Notes from a Software Architect, on a small island: April 2008

Monday, April 21, 2008

When do you measure software complexity?

In my day job, I often have to review a number of software artefacts, including software architectures, designs and code, which form part of a large system. As the system is to last for many years (it is not uncommon for some of its components to last for more than 10 years), I have always been keen to ensure that the artefacts are understandable by more than the creator and the reviewer, and are capable of being maintained for many years. One measure I have considered to assist with this is software complexity, particularly as one of the big issues in software maintenance is that, as code evolves from its original form and is modified by more engineers, its complexity is likely to increase unless it is actively monitored.

While assessing the complexity and understandability of architecture and design still requires some manual effort (and relies on experience), one measure I have used for code is the cyclomatic complexity metric. It has been around for over 30 years and calculates the complexity of a program (strictly, of a method or function within a program) by measuring the number of linearly independent paths through its source code. The advantage of this metric is that it is applicable to many languages and relatively easy to calculate. There are recommended threshold values, with any module scoring over 20 being considered for refactoring.
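For a single function with one entry and one exit, McCabe's control-flow-graph definition, M = E - N + 2P (edges, nodes and connected components), works out as the number of decision points plus one. The following is a minimal sketch of that counting approach for Python source, using only the standard `ast` module. Which syntax nodes count as decisions is an assumption on my part (real tools differ in the details), and `shipping_band` is a hypothetical sample function; the threshold of 20 is the one mentioned above.

```python
import ast

# Node types counted as one decision point each in this sketch.
_DECISIONS = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity of a function: 1 + its decision points.

    Nested functions are counted as part of the enclosing function here.
    """
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, _DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' short-circuits, adding two extra paths.
            complexity += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Each 'for' clause and each 'if' filter in a comprehension branches.
            complexity += 1 + len(node.ifs)
    return complexity


def report(source: str, threshold: int = 20) -> list[tuple[str, int, bool]]:
    """Return (function name, complexity, exceeds threshold) for each function."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            results.append((node.name, score, score > threshold))
    return results


# Hypothetical sample: two ifs, one 'and', one loop and one nested if.
SAMPLE = """
def shipping_band(weight, express):
    if weight <= 0:
        raise ValueError("weight must be positive")
    elif weight < 2 and not express:
        return "small"
    for limit in (5, 10, 20):
        if weight < limit:
            return "band " + str(limit)
    return "freight"
"""

print(report(SAMPLE))  # prints: [('shipping_band', 6, False)]
```

Counting decisions rather than building the full control-flow graph is the usual shortcut, and it is what makes the metric cheap enough to run at any of the points discussed below.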

As with any metric, the key is when the metric is applied. I see that there are several choices:

Calculate the metric continually as you code. The advantage of this is that you are immediately aware when you exceed the agreed threshold and can correct it straight away. However, this approach becomes very invasive as code is edited.
Calculate the metric as the code is compiled with code exceeding a particular threshold failing to compile (and hence failing to build the application). This is a simple way of ensuring that the complexity is not exceeded but can become frustrating if you are trying to come up with a quick fix to solve a problem.
Calculate the metric when you check in the code to a configuration management system, with any modules which exceed the agreed threshold being rejected. This is less invasive than calculating during coding or compiling (assuming that you don't check anything into your configuration management system that doesn't compile) and can be integrated with a number of other software metrics which can be calculated from the configuration management system.
Calculate the metric as part of the testing phase. This is probably the most common approach but is also the least appropriate. Depending on the testing strategy, it is possible that the code will already have been 'proven' to work so there is a degree of reluctance to change 'working' code.
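The check-in option can be sketched as a small script that a version control hook (for example a Git pre-commit hook, or the equivalent in another configuration management system) might run against the files being committed. This is a sketch under assumptions: the files are Python, the threshold is 20, and complexity is approximated crudely by counting branching constructs, which is rougher than a full McCabe calculation.

```python
import ast
import sys

THRESHOLD = 20  # agreed threshold; 20 follows the text above

# Crude set of branching constructs, each counted once, for this sketch.
_BRANCHES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)


def violations(source: str, filename: str = "<string>", threshold: int = THRESHOLD) -> list[str]:
    """Return a message for every function whose approximate complexity exceeds the threshold."""
    messages = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1 + sum(isinstance(n, _BRANCHES) for n in ast.walk(node))
            if score > threshold:
                messages.append(
                    f"{filename}:{node.lineno} {node.name} has complexity {score} (> {threshold})"
                )
    return messages


def main(paths: list[str]) -> int:
    """Check each file; a non-zero result tells the hook to reject the check-in."""
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            problems.extend(violations(handle.read(), path))
    for message in problems:
        print(message, file=sys.stderr)
    return 1 if problems else 0
```

Wired into a hook, the script would end with `sys.exit(main(sys.argv[1:]))`, so that a non-zero exit status rejects the check-in and the reported file, line and function name tell the engineer where to look.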

Once the metric has been calculated and any transgressions identified, the code needs to be examined to find the cause of the complexity. In many cases this is a poorly abstracted method or large monolithic code, but in some instances the natural coding style will lead to the threshold being exceeded. This is particularly true where a switch statement is used, since each case adds one to the measure. There needs to be a (manual) judgement on whether the code should be refactored, bearing in mind that one of the benefits of examining complexity is identifying the amount of testing that may be required. A method with a complexity measure of 20 could easily be refactored into two smaller methods, each with a measure of 5, which will be far easier to test, particularly if you have a strategy of 100% code coverage.
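To illustrate why a switch statement inflates the measure, here is a hypothetical Python example (Python has no classic `switch`, so an `if`/`elif` chain stands in for one). Each case in the chain is a decision point, so the measure grows with every case added, whereas a table-driven version moves the cases into data and keeps the function at a complexity of 1. The message codes and function names are invented for the sketch.

```python
# Hypothetical example: a switch-like chain mapping message codes to actions.
def describe_chain(code: str) -> str:
    # Each 'if'/'elif' is one decision point: complexity = 1 + 4 = 5 here,
    # and it grows by one for every new case added.
    if code == "A":
        return "acknowledge"
    elif code == "R":
        return "reject"
    elif code == "H":
        return "hold"
    elif code == "C":
        return "cancel"
    return "unknown"


# Table-driven alternative: the cases become data rather than branches.
_DESCRIPTIONS = {
    "A": "acknowledge",
    "R": "reject",
    "H": "hold",
    "C": "cancel",
}


def describe_table(code: str) -> str:
    # A single lookup with a default; there is no branch per case, so the
    # complexity stays at 1 however many codes are added.
    return _DESCRIPTIONS.get(code, "unknown")
```

Whether such a refactoring is worthwhile is still the manual judgement described above: the table entries still need checking, so some of the testing effort moves into the data rather than disappearing.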

Tuesday, April 15, 2008

Getting the badge

I have just come across Certified Open, an organisation which is trying to help measure and encourage competition in the provision of hardware, software and services. The aim, as I understand it, is to allow purchasers to evaluate various products and determine which ones will result in commercial lock-in, which may or may not be desirable depending on the intended use and evolution of the product.

The assessment has a set of questions against which the product can be scored. Depending on the final score, the product will be awarded Gold, Silver or Bronze, provided a score of 50% or more is achieved. It would appear that each question has a number of options, each with an appropriate weighting, which together produce the final score. Now, I have no issue with a vendor performing an assessment of their own product, as it will no doubt help identify areas of weakness which may merit further work. What I am unsure about at this stage is whether there is also an independent review of the assessment, as some of the questions are clearly subjective and it is easy to view your own product in a good light, which may result in a flattering score. The assessment also says nothing about how suitable the product is when used as part of an overall system or solution, or how the product may be evolved. Interoperability is also key; just conforming to a well-defined standard may not be enough if several versions of that standard exist.
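The scoring scheme described above can be sketched as a weighted sum expressed as a percentage. Only the 50% minimum comes from the description; the questions, option weightings and the Gold/Silver cut-offs below are hypothetical placeholders, since I don't know the actual Certified Open weightings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    """An assessment question whose options carry (hypothetical) weightings."""
    text: str
    options: dict[str, int]  # option label -> points awarded

    def max_points(self) -> int:
        return max(self.options.values())


def percentage(questions: list[Question], answers: dict[str, str]) -> float:
    """Points earned by the chosen options, as a percentage of the maximum available."""
    earned = sum(q.options[answers[q.text]] for q in questions)
    possible = sum(q.max_points() for q in questions)
    return 100.0 * earned / possible


def badge(score: float) -> Optional[str]:
    """No badge below 50% (from the text); the 70% and 85% cut-offs are hypothetical."""
    if score >= 85:
        return "Gold"
    if score >= 70:
        return "Silver"
    if score >= 50:
        return "Bronze"
    return None


# Hypothetical questions and a self-assessed set of answers.
QUESTIONS = [
    Question("Published interfaces?", {"none": 0, "partial": 1, "full": 3}),
    Question("Data export in open formats?", {"no": 0, "yes": 2}),
    Question("Support from more than one supplier?", {"no": 0, "yes": 1}),
]
ANSWERS = {
    "Published interfaces?": "partial",
    "Data export in open formats?": "yes",
    "Support from more than one supplier?": "no",
}

score = percentage(QUESTIONS, ANSWERS)
print(score, badge(score))  # prints: 50.0 Bronze
```

The subjectivity concern shows up directly: if the vendor judges its interfaces "full" rather than "partial", the same product scores about 83% and moves up to Silver under these made-up cut-offs, which is why an independent review of the answers matters.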

The approach to assessment is very similar to that adopted by the US Navy for the Open Architecture Assessment Tool (OAAT) in evaluating products as part of a naval defence system. However, the OAAT covers both the technical aspects of a product and its management and evolution. That assessment is clearly aimed at a specific market which is looking to protect its investment in complex products.

I wonder how useful these assessments can be.

Some thoughts:

Should the assessments be performed by an independent authority?
Will the assessments encourage speculative product development if 'getting the badge' is seen as a key product discriminator in a particular market?
Are there any other examples of product assessments which have resulted in products being developed specifically to meet assessment criteria?

Any comments?

Wednesday, April 2, 2008

SPA2008 - some final thoughts

It is now two weeks since the end of the SPA 2008 Conference and I have finally finished writing my notes. (Yippee!) So what are my final thoughts, and what am I going to take away to explore over the next few weeks and months?

I thought the conference was excellently run, with a good mix of software practitioners attending. I was spoilt for choice in selecting the sessions to attend, but some good notes summarising the sessions I didn't attend are appearing, as are a number of articles on various blogs. The dialogue during and between sessions was also stimulating, and I hope to continue it when time permits.

Technically, I am particularly interested in following up these topics:

Python, Django and other frameworks

Erlang, particularly as part of a multi-language development probably with C, C++ or Java

Further development of metric dashboards (which is work in progress) now that I have some further input to consider

A closer look at DSM, DSLs and domain-specific code generation

Decaying architecture and legacy code bases, and attempting to detect when to intervene

I also learnt a lot about myself. The conference confirmed that, generally, the work I do is pretty well in line with what the rest of the software world in the UK is currently doing (or trying to do!). There are differences, but these appeared to be related to the domains, size of developments and team dynamics.

Describing Software Architectures

My last session at the SPA 2008 Conference was a workshop to explore the approach to describing a software architecture.

Nick Rozanski, Eoin Woods and Andy Longshaw asserted that any architecture requires a lot of selling, as many different stakeholders and customers are interested in it. Communicating the architecture is obviously key, and a group session explored what should be produced in an architectural description document. The varied experience of the groups produced a diverse set of content, with the more experienced architects recognising the importance of a clear baseline of requirements from which the architecture can be derived, and of confirming an understanding of how the delivered system is to be used. Clearly, the project size and life cycle approach have an impact on the production of the architecture; projects following a traditional waterfall life cycle (tending to be part of larger programmes) placed an increased emphasis on the requirements, whereas the more agile projects placed much less emphasis on this and were keen to start producing some tangible software. This discrepancy also appeared in another SPA 2008 session that I attended.

There was also an interesting exploration of the tools and techniques used to produce the content of an architecture document. A recurring message was that the document must be in a format suitable for all of its readership, so there was a clear preference for standard office tools such as MS-Word, PowerPoint or web pages on an intranet, rather than specific architecture tools, e.g. UML modelling tools, which may not be available to all readers (one way to mitigate this is to export such content into a more accessible format, e.g. HTML). There was a recommendation that the tools used for the architecture should form part of an integrated toolset, with good traceability between the requirements, architecture and design. The availability and use of such integrated toolsets appeared to be limited to larger programmes, particularly where the software was only a part of the eventual solution.

Once an architecture is produced, it needs to be publicised and communicated to everyone who is impacted by or interested in it. While there was general agreement that the architecture must be peer-reviewed before this wider communication takes place, the approach to communicating (or socialising, as Andy called it) the architecture varied from simple one-to-one chats to major presentations. What was clear was that the architecture must continue to live and be actively maintained, remembering that the original effort of selling the architecture also applies to any subsequent changes.

Overall, I felt the workshop confirmed that the approach to software architecture that I follow continues to be appropriate, and I picked up some good ideas to try on future architectures.

A summary of the output produced by the workshop is available on the SPA 2008 Conference wiki.

Tuesday, April 1, 2008

Automated Testing Experience

Keith Braithwaite presented his recent experience of automated testing, particularly using Test Driven Development, at the SPA 2008 Conference.

The audience, a mixture of youth and experience, was already familiar with automated testing tools such as xUnit, Selenium, CruiseControl, jMock, DbUnit and Build-o-Matic (funny how all of these tools are open source).

Keith outlined some of the problems that he saw with manual testing such as:

Takes a long time...

Not consistently applied

Not particularly effective

Requires scheduling of people

Keith advocated automated testing as a better way, with a clear view that 'checked examples' need to be performed automatically. Before describing his approach, he went back to the basics of software engineering with regard to how system requirements are specified: in natural language, which is often vague and ambiguous. He argued that unless the customer spoke the same language as the engineers, there would always be problems in building systems. One of the major problems is specifying the boundaries or constraints of the system to be developed; writing these rules precisely was often very difficult, with multiple exceptions to the norm. However, the customer could often give examples of 'what the system should do'. These weren't the rules, but they were sufficient for the reference results to be calculated by hand. There was no guarantee that the reference results were correct, but by ensuring that there were multiple contributors to the production of this data, the probability of erroneous data was reduced.

With reference data available, a test framework was developed using the FIT library, which was then used to compare the results from the system automatically with the reference results. By the end, several hundred scenarios (or examples) were in use, with the test data available before the features were developed, thereby facilitating incremental development. Keith stated that using reference data and automatically comparing the system against it detected problems early and also revealed issues with the design (if it couldn't be instrumented, then the design was questionable).
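The checked-example idea can be sketched as a table of customer-supplied examples, each with a hand-calculated reference result, compared automatically against the system's output. This is a plain Python stand-in, not the FIT library's API (FIT drives fixtures from HTML tables), and the `calculate_fee` rule and its figures are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CheckedExample:
    """A customer-supplied example with a hand-calculated reference result."""
    description: str
    amount: Decimal
    expected_fee: Decimal


# Hypothetical system under test: 2% fee, minimum 1.00, capped at 50.00.
def calculate_fee(amount: Decimal) -> Decimal:
    fee = (amount * Decimal("0.02")).quantize(Decimal("0.01"))
    return min(max(fee, Decimal("1.00")), Decimal("50.00"))


# Reference data agreed with the customer before the feature was built.
EXAMPLES = [
    CheckedExample("small payment hits the minimum", Decimal("10.00"), Decimal("1.00")),
    CheckedExample("typical payment", Decimal("250.00"), Decimal("5.00")),
    CheckedExample("large payment hits the cap", Decimal("10000.00"), Decimal("50.00")),
]


def run_checked_examples(examples, system) -> list[str]:
    """Compare the system's output with every reference result; return the mismatches."""
    failures = []
    for ex in examples:
        actual = system(ex.amount)
        if actual != ex.expected_fee:
            failures.append(f"{ex.description}: expected {ex.expected_fee}, got {actual}")
    return failures


print(run_checked_examples(EXAMPLES, calculate_fee))  # prints: []
```

Because the examples can exist before `calculate_fee` is written, the same table can drive incremental development: the list of mismatches shrinks as each feature is implemented.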

Keith stated that the delivered system was defect-free with no failures being reported.

Is Defect-Free possible?

Developers make mistakes and errors, so...

The system contains defects, so...

The user experiences failures.

So, if the user doesn't experience failures, does this mean the system is without defects? Probably not, but this is hard to verify. I would agree, adding that it is very difficult to claim that any development is defect-free, only that the defects haven't revealed themselves during testing and normal (and presumably intended) operation of the system. It also depends on the impact of a failure, as clearly some systems have a more severe impact than others. By adopting Keith's approach of developing reference data with the customer, the potential for misunderstanding the system before delivery is reduced.

Keith summarised by saying that the tests weren't really tests at all, merely gauges of how well the system is working for the users. The reference data were 'checked examples', which were a very powerful way of ensuring that the customer and developers worked together.