Monday, March 31, 2008

The Economics of Legacy Code

The final keynote for the SPA 2008 Conference was given by Michael Feathers from Object Mentor on the economics of managing legacy code.

Michael set the scene by mentioning some classic references: Freakonomics on behavioural economics, The Social Atom on how people behave in groups, and the Big Ball of Mud, which describes a common software architecture (the haphazardly or casually structured system) and poses the challenge 'do we believe it or do we ignore it?'.

He compared a Normal distribution curve with a Power Law curve, stating that power laws appear in many aspects of life where preferences can be expressed. He then cited that if the distribution of method sizes across a project is plotted, it follows a power law and not a normal distribution. Intuitively you would expect uniformity with a clean design, but he suggested that this might be unnatural (maybe this is boutique design?). Personally, I think that unless the development process is actively managed there will always be differences in the approaches taken by developers, based primarily on previous experience; this is a demonstration of the creative process of development.
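As a rough illustration (mine, not Michael's), here is a small Python sketch of how such a distribution could be measured on a Python codebase so the shape can be inspected; it assumes a recent Python (3.8+ for end_lineno) and Python source files, but the same idea applies to any language with a parser.

    # Rough sketch (not from the talk): measure function/method sizes across a
    # Python codebase and bucket them, to see whether the distribution looks
    # more like a power law (many tiny methods, a long tail of huge ones)
    # than a normal distribution. Requires Python 3.8+ for end_lineno.
    import ast, pathlib
    from collections import Counter

    def method_sizes(root="."):
        sizes = []
        for path in pathlib.Path(root).rglob("*.py"):
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    sizes.append(node.end_lineno - node.lineno + 1)
        return sizes

    if __name__ == "__main__":
        histogram = Counter(method_sizes())
        for size in sorted(histogram):
            print("%4d lines: %d methods" % (size, histogram[size]))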

There are some obvious reasons why organisations have difficulties in maintaining code:

Too Many Ways of Solving the Same Problem

As an example, Michael cited Perl, where there is often little consistency between two programs because the language offers multiple ways of doing the same thing. This clearly leads to difficulties in maintaining other people's work.

The Organisation

Another law, Conway's Law, states that software reflects the organisational structure that produced it; it is often easy to see that different teams have worked on a project (see later for a personal view on this). Michael posed the question 'Can organisations align with the software to be produced?' and hence produce more maintainable code. Maybe...

  • Public bodies have a short-term view as they control expenditure
  • Organisations in the 'competitive milieu' have a number of options:
    1. Software impacts the bottom line such that some changes may make the software better and hence benefit the organisation
    2. Hyper-competitive organisations, where the software becomes a mess as the organisation strives to keep ahead of the competition
    3. Monopoly organisations become lethargic asking 'why does it matter?'

Within a development team with dedicated people assigned, it is highly unlikely that everyone will have an equal workload. Michael suggested that development should be subject to 'collective' ownership. This is one way of recognising that there are never enough good people to go round. This led to a discussion on professionalism, which stirred up some interesting views.

My comments
  1. I believe that code should be seen by multiple developers with continuous review. Pair programming achieves this with two, but other practices can also lead to multiple views of the code and to a reduced tendency to claim code as a personal asset rather than a team asset.
  2. Ensuring code is maintainable is not normally the primary goal of any developer; the goal is to get the task in hand done, hopefully on time. However, if good development practices are defined at the start of the development and managed consistently, then there is a greater chance of achieving maintainable code. Given the increasing tendency to have off-shored development teams, managing the consistency of the development almost certainly mandates good tooling. As another session at SPA2008 showed, if this isn't done then the architecture and design will decay until they can no longer be maintained.

Saturday, March 29, 2008

Smelly Software Architectures

At the SPA 2008 Conference, Mark Dalgarno led an interesting workshop which explored the deterioration of software architecture over time. Mark outlined a number of different conditions which could indicate that the software architecture (assuming there was one to start with!) is decaying from the as-intended architecture. There was an interesting debate about the importance of architecture, with a clear distinction between the architectural minimalism of those practising agile techniques and the views of those engaged in large and complex systems in which software is only a small part of the overall solution.

I firmly believe that a key principle is that any architecture should be resilient to change and should be an enduring artifact. It should also be flexible and scalable, although this can depend on the expected lifetime of the architecture (architectures that are only expected to last for a year or two clearly don't need to exhibit the same attributes as an architecture which is intended to last for 10 years or more). This also requires that architecture and design are seen as distinct activities - I have seen many examples where the architecture has been assumed and the detailed design and implementation have proceeded without any thought about the architectural options or future maintenance requirements.

How an architecture can be assessed led to an interesting discussion, with metrics (e.g. dependency counts) or consideration of potential change scenarios being suggested, although the more agile developments considered this to be unnecessary. Example scenarios were offered, such as a change in the operating system version or a database upgrade, and the resilience of the architecture to that change. However, it is clearly unachievable to think of every change scenario that could occur. Examples of changes that weren't originally anticipated, but were then implemented with significant pain, included the addition of error messages in multiple languages. The impact of a project driven by time pressures (time-driven coding) was also considered likely to lead to a less enduring solution and a less robust architecture.

The economics of architectural decay was considered, with a view that 'rewriting is considered harmful', particularly for solutions which are deployed widely and where there is no acceptable alternative to maintaining what is currently implemented. While I sympathise with this view, any maintenance regime should always consider UUD (upgrade, update and disposal) of an in-service solution, and recognise that there may be times when multiple versions of the same solution, operating on potentially different architectures, have to be maintained in parallel.

While the session didn't offer any solutions to a difficult problem, it did pose some interesting thoughts which will require some further consideration for future software architectures.

Friday, March 28, 2008

Hello World in another language

The SPA 2008 Conference provided me with an opportunity to see the functional language Erlang for the first time. This isn't a new language (see the panel session), having been in use for over 20 years at Ericsson for applications such as PABX systems.

As with all programming languages, the Erlang version of 'Hello World!' was demonstrated, but the more interesting examples were those which highlighted the features of Erlang that make it stand out from other languages. It is clearly particularly well suited to massively concurrent applications as it supports 'cheap' process (thread) creation, with data being shared by messages. A simple demonstration calculating Fibonacci numbers showed this, and also showed how crucial the implementation of a multi-threaded, parallel application is: ensuring that the implementation was tail recursive produced an application that returned a result in a few seconds; not doing this meant the result took much longer.
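The demo was in Erlang, but the underlying point about the shape of the recursion can be sketched in any language. Here is a rough Python illustration (mine, not from the session): the naive two-branch recursion does an exponential amount of work, while the accumulator ('tail recursive') form is linear - the sort of difference that turns 'much longer' into 'a few seconds'.

    # Illustrative only: the recursion shape, not Erlang itself.

    def fib_naive(n):
        """Two-branch recursion: an exponential number of calls."""
        if n < 2:
            return n
        return fib_naive(n - 1) + fib_naive(n - 2)

    def fib_tail(n, a=0, b=1):
        """Accumulator ('tail recursive') form: a linear number of calls.
        An Erlang compiler turns this into a loop; Python doesn't eliminate
        tail calls, but the amount of work is still linear."""
        if n == 0:
            return a
        return fib_tail(n - 1, b, a + b)

    if __name__ == "__main__":
        import timeit
        print(timeit.timeit(lambda: fib_naive(30), number=1))  # noticeably slower
        print(timeit.timeit(lambda: fib_tail(30), number=1))   # effectively instant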

With the move to multi-core architectures, efficient use of the cores through parallel applications becomes much more important. In the classic article The Free Lunch is Over, Herb Sutter outlined that concurrency is the biggest change to impact software development since the adoption of OO. There is a fundamental change required in how programs are written, impacting good principles such as modularity, to ensure that applications can take full advantage of the available processing resource. Erlang's approach has been to support parallel processing from the start and to make concurrency easy through the use of the Actor Model (contrast this with the approach taken by Haskell, which uses Software Transactional Memory). Erlang's approach is extremely efficient and lightweight, and message passing is asynchronous; there is no guarantee on the ordering of messages.
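To make the Actor Model idea concrete, here is a minimal sketch in Python rather than Erlang (all names are my own): each 'actor' owns a mailbox and other code interacts with it only by sending asynchronous messages, never by touching shared state. Erlang's processes are far cheaper than operating-system threads, so treat this purely as an illustration of the messaging style.

    import threading, queue

    # A minimal actor-style sketch (not Erlang): each 'actor' owns a mailbox
    # and shares data only by receiving messages, never via shared state.

    class Actor:
        def __init__(self, handler):
            self.mailbox = queue.Queue()
            self.handler = handler
            threading.Thread(target=self._run, daemon=True).start()

        def send(self, msg):          # asynchronous: send returns immediately
            self.mailbox.put(msg)

        def _run(self):
            while True:
                msg = self.mailbox.get()
                if msg is None:       # conventional shutdown message
                    break
                self.handler(msg)

    if __name__ == "__main__":
        results = queue.Queue()
        squarer = Actor(lambda n: results.put(n * n))
        for i in range(5):
            squarer.send(i)
        print([results.get() for _ in range(5)])  # [0, 1, 4, 9, 16]
        squarer.send(None)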

Erlang clearly has a very different approach to any concurrent programming I have done in other languages such as C, C++, Ada and Java. In the short session I couldn't see many benefits beyond this, but maybe that is just encouraging me to go away and experiment. I would be particularly interested in seeing how Erlang could be used in a mixed-language application, say Erlang, C++ and Java - I have already found an example which discussed the use of Erlang and Java.

Erlang also got me thinking. Many years ago, shortly after I graduated from university, one of my friends enrolled for a PhD in which he was to develop mathematical algorithms (specifically for ordinary differential equations) to run efficiently in a multi-processor environment. His language was FORTRAN (there were other languages around, but FORTRAN was the dominant scientific language at the time) and he managed to get a number of the algorithms operating efficiently in multi-processor configurations. I wonder, if Erlang had been around at the time, whether it could have been used and whether it would have made his job any easier.

Thursday, March 27, 2008

Metrics that are useful



As an experiment, I ran one of the BoF sessions at the SPA 2008 Conference on the topic of 'Metrics which are useful'. An interesting discussion ensued in the group consisting of academics, software practitioners and quality specialists.

The following is a summary of my notes:

Why Capture Metrics and what are you trying to achieve?
  1. Purpose of metric capture depends on customer and business

  2. Provide 'Bird's eye view of projects'

  3. Used to improve quality

  4. Used to provide evidence to support quality measure e.g. CMMI

Some thoughts on metrics (good and bad)
  1. SLOC (source lines of code). Easy to calculate once agreement has been reached on what constitutes 'a line of code', but not considered a good measure (a rough counting sketch appears after this list). Not particularly appropriate when the system includes COTS components as part of the solution. SLOC can change depending on language choice. It gives no incentive to encourage reuse (see later), abstractions etc. and can lead to excessive cut 'n' paste.

  2. Coupling/Cohesion of interfaces as a mechanism for showing well-designed modules which can be reused

  3. Use metrics to monitor 'right first time' during integration. The key issue is how and what to measure. This can also be used as a measure of achievement by the Project Manager.
    Monitoring reuse is also worthwhile but difficult to measure or demonstrate; a potential measure could be the number of hours saved. Reuse is dependent on expertise, team, business processes and functionality.

  4. Measuring quality of code by examining use (or not!) of framework primitives and higher levels of abstractions.

  5. Number of dependencies (Java) – the more a class is used, the greater the chance that bugs could have been found. Also consider number of interfaces used by component.

  6. Number of tests passed. Not a good measure as it says nothing about requirements achievement. Better measure would be number of requirements passed – each test would have to reference the requirement(s) which are being tested (in part).

  7. Measuring source code changes between different phases, e.g. unit test and functional test.
    Code coverage and the number of tests (written and completed) are not particularly useful measures; code coverage for TDD is always 100%.

  8. Measuring the capabilities delivered can be a useful metric, particularly when adapted to meet business needs.

  9. Key Performance Indicators (KPIs) are often used as measures of performance across a diverse set of projects (i.e. not just software).
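
As a footnote to the SLOC discussion above, here is one possible counting convention sketched in Python (the rules are entirely my own choice, which is exactly the problem): count non-blank lines that are not pure comments.

    # One possible SLOC convention (purely illustrative): count non-blank
    # lines that are not pure comments. The hard part, as noted above, is
    # agreeing what counts as 'a line of code' in the first place.
    import pathlib

    def sloc(root=".", suffix=".py"):
        total = 0
        for path in pathlib.Path(root).rglob("*" + suffix):
            for line in path.read_text(encoding="utf-8").splitlines():
                stripped = line.strip()
                if stripped and not stripped.startswith("#"):
                    total += 1
        return total

    if __name__ == "__main__":
        print("SLOC:", sloc("."))
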
Using Metrics
  1. What to do with the data once calculated/presented? Metrics must be presented in an easy-to-understand form (examples include traffic-light reports with appropriate thresholds for each colour, and graphs (are these always clear?)).

  2. How frequently should the data be reviewed and corrective action instigated? The period should probably be a function of the size of the project/development and the anticipated development time. For small projects it may be appropriate to measure daily (probably as part of an overnight build); for other projects it may be more appropriate to report weekly or monthly, depending on the likely changes between each report.

  3. The cost of measuring/calculating the metrics should be negligible.

  4. Metrics are always a snapshot. Examine the trends/dynamics rather than the absolute values and take appropriate action if a trend is going in the wrong direction. It doesn't matter if you have 1000 bugs this week; what matters is that next week you have fewer than 1000 bugs! Any thresholds need to be appropriate to the project and reviewed and revised continuously. New developments may be able to start with a threshold of no compilation warnings when all code is new; this threshold might not be appropriate for a legacy system. A toy sketch of such a trend-based traffic-light report follows this list.
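
As a toy illustration of the 'trend, not snapshot' point above (all thresholds invented by me), a traffic-light classification of this week's bug count against last week's might look like this:

    # A toy traffic-light report (thresholds invented): classify this week's
    # bug count against the trend rather than the absolute value.
    def traffic_light(previous, current, red_growth=0.10):
        """Green if the count is falling, amber if roughly flat,
        red if it has grown by more than red_growth (10% by default)."""
        if current < previous:
            return "GREEN"
        if current <= previous * (1 + red_growth):
            return "AMBER"
        return "RED"

    print(traffic_light(1000, 950))   # GREEN - fewer bugs than last week
    print(traffic_light(1000, 1050))  # AMBER - slight growth
    print(traffic_light(1000, 1200))  # RED   - trend going the wrong way
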
Some Recommendations
  1. Measure the business value and not the code. Measuring something that is significant to the business is more important. Examples include how many times the software has been used.

  2. Measurement must be understood by EVERYONE (and described in appropriate language), easy to calculate (i.e. not subjective) and must explain what it means to the business. The types of metrics to be selected depend on the type of organisation (and the business structure) and frequently change!

  3. Metrics should be used to encourage good practice (e.g. reuse, abstractions, frameworks) and not used to punish offenders!

  4. Starting a project from scratch is ideal to set good practices. However, regardless of where metrics are introduced, the key practice is to monitor the dynamic nature of the project.

What's the point of Software Architecture?


At the SPA 2008 conference, I attended a session entitled 'Does Architecture Help?' in which Eric Nelson from Microsoft identified 9 observations on architecture that he has encountered from his work with Independent Software Vendors (ISVs). He noted that there was no correlation between the absence of an architecture and project success, and that there are many different ways (read: architectures) of solving the same problem. Although there is no perfect architecture, there are clearly examples of bad architectures, often revealed by symptoms such as change being very expensive, which have nevertheless delivered a great service, possibly because the massive advances in hardware technology have been 'the get out of jail card'.

The rise of agile developments has clearly impacted the importance of architecture, with perhaps the removal of the formality of this activity. Clearly the availability of technology can have significant impacts on software architecture. While there is always the temptation to try and accommodate the latest technology in your next project, I think this is a risky strategy as architectures normally need to be based on firm foundations (there are always exceptions). With a good architecture (not a perfect one!), it is possible to keep the technology choice separate from the architecture. In my experience of Model Driven Architecture (MDA), it is possible to separate the technology from the architecture through the use of separate models:
  • A Platform Independent Model (PIM), which is independent of the technology choice

  • A Platform Specific Model (PSM), in which the technology choice is made
The slides for the session are available from Eric's blog.

I discovered something when I downloaded the slides: a new file extension, .PPTX. My computer didn't recognise the extension (I only had Microsoft PowerPoint Viewer 2003 installed). A hunt around Microsoft's website revealed a free download for Microsoft PowerPoint Viewer 2007, which adds the capability to view .PPTX files, which I discovered are PowerPoint files stored in Microsoft's new XML file format.

Wednesday, March 26, 2008

Are we nearly there?

The first panel at the SPA2008 Conference addressed the interesting topic 'Is Software Practice Advancing?', with particular reference to the last 15 years. In summary, the panel said 'Yes, just!' after considering programming languages, software design and project management.

There haven't been many advances in programming languages over the last 15 years (new languages such as C# and JavaScript were cited), although some languages have become more popular (e.g. Haskell, Ruby) and others are now only used in maintenance tasks rather than new developments. Tooling for languages has improved with toolsets such as Eclipse, Visual Studio and IntelliJ, along with better compilers, although it is debatable whether this has produced any significant change (reduction?) in development costs. There is also an increasing number of open-source tools which are free, which is a noticeable change for development. Programming is still unpredictable and the risk of things going wrong hasn't changed. While languages haven't changed much, there have been some significant improvements in libraries and frameworks, which lead to some gains in the development life-cycle.

COMMENT: I would say that there are an increasing number of languages which are now available and supported, due to the convergence onto two development platforms (Unix, in all its flavours, and Microsoft) as the use of proprietary systems has diminished. However, the days of the coder may be numbered as an increasing number of tools can now auto-generate significant amounts of code.

Software design also hasn't changed significantly in the last 15 years, although the debates about OO notations were finally resolved with the release of UML. Maybe what has changed is the way software is now assembled, as there is now acceptance that using open-source components and 'gluing them together' is more than adequate for many applications and demonstrates good software re-use. More complex (and better?) systems are now being produced although the design skill level hasn't changed; it may actually have decreased as design becomes unfashionable and the need to 'write code and get something working' results in formal design processes being side-stepped. The production of software architecture has improved and is recognised as being of value to all developments.

COMMENT: The rise and maturity of open source is now having an increasing influence on software developments - the development and adoption of standards has probably helped this. The rise of the web has probably contributed to the loss of design formality. There is still no substitute for experience when designing systems. The use of tools to support the design process, particularly for large systems which are adopting model-driven developments, could lead to some changes in the perception of design and result in changes in effort profiles for software developments as the emphasis shifts to design and integration rather than coding.

The final section of interest covered project management, where it was remarked that although some techniques have improved, the practice is still poor. Projects have got bigger and bigger and the probability of failure has only marginally decreased over the last 15 years (still a >70% chance of failure). The widespread adoption of agile and iterative life cycles over the traditional waterfall life cycle has not been as dramatic as the noise made by the agile community would have us believe.

COMMENT: There is no precise definition of what project management is, as it can cover a vast array of activities depending on the size of the development. Some approaches have been made (e.g. OGC guidance, PRINCE, DSDM) to try and formalise the activities, but these don't adequately address badly specified and procured systems, which are often the root causes of project failures.

Tuesday, March 25, 2008

How DSM can improve productivity

At the SPA2008 conference, I attended a session on Domain Specific Modelling given by Juha-Pekka Tolvanen from MetaCase. I increasingly believe that the use of full code generation can lead to significant improvements in productivity and, if appropriately managed, code quality. I was therefore interested in Juha-Pekka's assertion that the productivity increase is achieved not by using particular high-level languages such as C#, Java or Python, or design approaches such as UML, but by increasing the level of abstraction. There have been some attempts, through the use of frameworks, patterns and libraries, which can help raise the abstraction level provided that the artefacts are used appropriately.

However, Domain Specific Modelling (DSM), which means creating a language for a specific purpose together with automatic code generation, can lead to bigger gains in productivity when compared with normal development practices (claims of up to 30 times the productivity were made), with the generated code containing 50% fewer bugs than manually written code. The productivity gains need to be weighed against the cost of creating the specific DSM and the number of times the DSM is to be used.

The cost of the DSM development (typically one or two developers together with a number of domain experts) and the number of applications over time can then determine whether a DSM will deliver the required productivity gains; a rough break-even sketch follows.
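The session didn't give a formula, but the break-even reasoning can be sketched very simply; all of the numbers below are invented purely for illustration.

    # Illustrative break-even sketch (all numbers invented, not from the talk):
    # a DSM pays off once the per-application saving outweighs the cost of
    # building and maintaining the modelling language and its code generators.

    def dsm_worth_it(dsm_build_cost_days, apps_per_year, years,
                     manual_days_per_app, dsm_days_per_app):
        saving = apps_per_year * years * (manual_days_per_app - dsm_days_per_app)
        return saving - dsm_build_cost_days  # positive => DSM pays for itself

    if __name__ == "__main__":
        # e.g. a 120-day DSM effort, 10 apps/year for 3 years,
        # 40 days each by hand vs 10 days each with the DSM
        print(dsm_worth_it(120, 10, 3, 40, 10))  # 780 days saved overall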

The examples which were cited clearly demonstrated the power that DSM has in producing production quality code quickly and efficiently. It is clear that for some large development projects, there would need to be many different Domain specific models created to support the development, and that it is highly unlikely that a whole development could be completed entirely using DSM. Clearly some further research is required to determine the categories of applications for which DSM should be considered - off to visit the DSM Forum.

Monday, March 24, 2008

Learning another language beginning with P

I arrived early at the SPA2008 conference so that I could attend a fast-paced (not my words!) workshop given by Nick Efford on Python and how it could be used to develop web applications. Having experience in using the other two P languages for the web (Perl and PHP), I was keen to see what made Python a better choice than the other two.

Nick started by covering the fundamentals of the language, explaining some of the specific features that differ from other languages, including dynamic typing and the use of layout as a way of implying structure (e.g. no braces around if statements; you just ensure that the if body is indented). Clearly one of the strengths of Python is the ability to efficiently handle and operate on sets of data as tuples or lists.
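As a small aide-memoire (my own snippet, not Nick's material), the following shows the indentation-based structure and the list/tuple handling mentioned above:

    # A few of the features mentioned, sketched from memory rather than
    # taken from the workshop material.

    temperatures = [12, 17, 9, 21]          # a list
    reading = ("2008-03-24", 17)            # a tuple: immutable, fixed shape

    warm = [t for t in temperatures if t > 15]   # operate on whole collections

    if warm:                                 # structure comes from indentation,
        print("warm days:", warm)            # not braces
    else:
        print("no warm days")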

After an explanation of OO Python, a demonstration of the neat way that documentation can be created for classes and methods (embedding a method/class description in """ docstrings) and a quick overview of the standard libraries provided with Python, Nick moved on to the alternative implementations of Python that are available, including Jython (a Java version which currently lags behind the 'native' Python) and IronPython (for .NET), and to web frameworks such as CherryPy and Django.
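The docstring convention is worth a tiny example of its own (again my own sketch, with invented names): the string embedded in triple quotes becomes the documentation that help() reports.

    # Sketch of the docstring convention mentioned above (names are my own).

    class TemperatureLog:
        """Keeps a simple list of temperature readings."""

        def __init__(self):
            self.readings = []

        def add(self, value):
            """Record a single reading in degrees Celsius."""
            self.readings.append(value)

    # help(TemperatureLog) or TemperatureLog.add.__doc__ will show the
    # embedded documentation without any extra tooling.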

Nick is obviously a fan of Django and demonstrated very quickly how a web-based application could be created using the Django framework, although I couldn't get it working on my Vista laptop. One neat feature of Django that particularly impressed me was the admin interface, which is generated automatically when you create a project. I also like the approach to documenting Django through the publication of both a paper book and a web-based book. Clearly I need a project in order to experience the power and efficiency of developing web-based applications using Django for real.

In summary, I felt that Nick had done a very good job of highlighting the advantages of Python and left me keen to experiment further, particularly with Django. At the end of the session I knew a lot more about Python than before, and I feel I will soon be able to add another language beginning with P to my kitbag, knowing that there will be some applications where Python will make an excellent implementation language. It is also clear that within the SPA community Python is a must-know language, as there were many sessions at the conference where Python was cited as the language of choice to implement certain tasks.

Sunday, March 23, 2008

SPA 2008


I have just returned from attending the BCS Software Practice Advancement annual conference (SPA2008). It was my first visit to this conference and I found it a very interesting and stimulating few days. As with many conference programmes, there is always too much to try and attend, so I tried to stick with the sessions about architecture or about languages I knew nothing about (in this case Python and Erlang). I even got the chance to try my hand at running a BoF session on 'Metrics which are useful', which resulted in an interesting discussion on a number of metrics which weren't very useful! After each session there were always valuable discussions with fellow attendees; in many cases there were further opportunities to explore the topic or to perform some personal research.

Over the next few days, I will post my summaries of the various sessions and keynotes that I attended together with some further thoughts.