Sunday, January 1, 2012

TDD for embedded development


I recently attended an interesting talk on test-driven development (TDD) for embedded systems, given by Zulke UK for the BCS SPA specialist group.

[Image: Robotshop Rover]
The talk was the result of some experiments using an Arduino board, a Bluetooth interface and an Android tablet. The embedded platform was a Robotshop Rover with a number of sensors and controllers. The sensors were used to guide the robot along a track marked by a solid black line; the motors were used to control the robot's direction and speed.

Although the standard Arduino development environment isn't a full IDE and is limited in its functionality, it does come with some good code examples. An alternative environment is the Eclipse CDT with the AVR plugin added to handle the download of the Arduino image to the target platform. To support development using TDD, CppUTest was used as the test framework. CppUTest is recommended as a framework suitable for embedded development (see James Grenning's book Test-Driven Development for Embedded C) and appeared to the presenters to be more effective than CppUnit. It was noted by members of the audience that there are few tools which integrate well with continuous integration platforms such as Jenkins.
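
To give a flavour of the framework, here is a minimal CppUTest sketch of the kind of host-side test the presenters might have written; the clampSpeed() function and its 8-bit PWM range are hypothetical details invented purely for illustration, not the presenters' actual code.

```cpp
// A minimal CppUTest sketch, built for the host (x86) project rather than the
// Arduino target. clampSpeed() is a hypothetical function used for illustration.
#include "CppUTest/TestHarness.h"
#include "CppUTest/CommandLineTestRunner.h"

// Unit under test: keep a requested motor speed within the range the hardware accepts.
static int clampSpeed(int requested) {
    if (requested < 0)   return 0;
    if (requested > 255) return 255;   // assumed 8-bit PWM range on the motor driver
    return requested;
}

TEST_GROUP(SpeedTests) {};

TEST(SpeedTests, SpeedWithinRangeIsUnchanged)
{
    CHECK_EQUAL(100, clampSpeed(100));
}

TEST(SpeedTests, SpeedAboveMaximumIsClamped)
{
    CHECK_EQUAL(255, clampSpeed(1000));
}

int main(int argc, char** argv)
{
    return CommandLineTestRunner::RunAllTests(argc, argv);
}
```

Because the tests are built for the host, they run in milliseconds from the command line (or from a CI job) without ever touching the Arduino hardware.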

An overview of TDD was given, along with its application to an embedded target environment:

1/ TDD needs to be able to run tests on both the development and target environments. This requires that two projects are created, one with the Arduino as the target environment and another targeted at an x86 environment.

2/ The cycle of test->code->refactor needs to be followed with the tests being chosen from a backlog. Code shouldn't be written unless it is to satisfy an existing test.

3/ The cycle should be choose test -> write test -> run test -> fail! The initial failure may simply be a compiler or linker error; if the test doesn't fail at all, that normally indicates it has been incorrectly written.

4/ Limited design is required although normal good software engineering practice should still be adopted (no monolithic functions etc).

5/ Mock interfaces should be used to unit test the sensor interfaces; see the sketch after this list. This allows the logic to be tested and debugged first, before loading onto the target.

6/ If a state machine is required, some design is essential before any test cases can be identified and written.
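
As an illustration of point 5, here is a minimal sketch of how a sensor dependency can be faked for host-side testing. The interface and class names are invented for illustration, and this uses a hand-rolled fake rather than CppUTest's mocking support; the presenters' actual code may have looked quite different.

```cpp
// Hand-rolled fake for host-side unit testing (all names are hypothetical).
// The production build would supply an implementation that reads the real hardware.
class LineSensor {
public:
    virtual ~LineSensor() {}
    virtual int read() = 0;               // raw reflectance reading from the track
};

class FakeLineSensor : public LineSensor {
public:
    explicit FakeLineSensor(int value) : value_(value) {}
    int read() override { return value_; } // returns a canned value chosen by the test
private:
    int value_;
};

// Logic under test: the steering decision depends only on the abstract interface,
// so it can be exercised on the development machine before touching the target.
int steeringCorrection(LineSensor& sensor) {
    const int reading = sensor.read();
    if (reading < 300) return -1;          // steer left
    if (reading > 700) return +1;          // steer right
    return 0;                              // stay on the line
}
```

A test simply constructs a FakeLineSensor with the reading it wants to simulate and checks the correction returned, so the control logic is fully debugged before it is ever downloaded to the robot.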

The exercise in TDD with the Arduino target resulted in no logic errors once the code was exercised on the target. However, the behaviour of the robot, in particular the speed of the motors, required some further development and enhancement of the codebase. Given that Arduino boards are typically focused on the school market, I was surprised that C++ was the chosen development language. However, the C++ used on the Arduino is a cut-down version which removes many of the complexities of the full language.

Whilst the associated Android development (on a Motorola XOOM tablet) appeared to be successful in terms of developing a simple user interface to send commands via Bluetooth to control the robot (e.g. stop, start, ...), the development of the application revealed some shortcomings. Although the Android development kit works very well with Eclipse (the code is a mixture of Java and XML) and allows on-target debugging, TDD is less appropriate given the extensive use of callbacks (e.g. onClick, ...). GUI applications cannot be adequately tested within the development environment; fortunately the Android debugger is excellent at pinpointing issues (typically null pointer exceptions). Android emulators can help to a limited extent but are not a sufficient replacement for an actual device. Android development for tablets is still evolving as the platform moves from a phone, with relatively simple applications, to potentially far more complex applications. The launch of the Android 4.0 development kit (aka Ice Cream Sandwich) will clearly accelerate the development of more complex applications, which will necessitate the employment of sound software engineering principles in delivering quality products.

In summary, the session demonstrated successfully that TDD can be applied in an embedded environment and that, through the use of appropriate open source tools, software development for the expanding Android market can follow tried and tested techniques.

Thursday, November 17, 2011

Open Source Software - the legal implications


I attended the recent BCS Manchester/Society for Computers and Law event on the use of Open Source Software and its legal implications. It was given by Dai Davis, an information technology lawyer who is also a chartered engineer, a very unusual combination, but he clearly knew his material.

Dai started by explaining the basic aspects of copyright law and the implications they have for software. It was clear that some of the original purposes of copyright law (to prevent copying) are applicable to software, but that the period for which copyright lasts (70 years after the author's death) clearly makes little sense for software. However, a number of points caught my eye:
  • Copyright protects the manifestation and not the subject matter. This means that look and feel is not normally subject to copyright although fonts are.
  • Copyright infringement also includes translating the material. Translation in the software case includes compiling source code as well as rewriting the source code into another language.
  • Copyright protects against copying some or all of the material. The amount copied does not normally matter.
  • Moral rights do not extend to software but do apply to documentation
  • Copyright infringement is both a civil and criminal offence with a maximum of 10 years imprisonment and an unlimited fine.
Dai then explained that the creator of the material is its first owner and holds the copyright. Misunderstanding this is the major cause of disputes about copyright. Clearly there are exceptions: if the material is created in the course of employment, the copyright rests with the employer, and the contract under which the material is being created may 'assign' the copyright to the purchaser.

All software licences grant the purchaser permission to use the software; otherwise the purchaser would be in breach of copyright. Licences can be restrictive, e.g. by time or by number of concurrent users, and all licences are transferable according to EU law.

Copyright of Open Source Software is no different from normal copyright of software, but the approach to licensing is very different:
  • Nearly all OSS requires no payment to acquire
  • 'Free' relates to the absence of restrictions on use (non-OSS can impose restrictions)
  • Open access to source and usage is required (not normally available with non-OSS)
However, the licences are very difficult to enforce, mainly because there has been no loss in monetary terms. There has never been a successful prosecution in the UK, although there are a number of examples in Germany (where litigation is cheaper than in the UK) and an example in the US (Jacobsen v Katzer in 2009), where a 'token' settlement of $100,000 was awarded.

Whilst there may be little prospect of getting sued for the use of Open Source Software, the biggest issue often arises when a business is sold and OSS is found within a product - this often affects the eventual purchase price of the company. Many businesses don't know where Open Source Software is being used and included within their own products because it is very difficult to police and manage.

A video of the session was filmed by students from Manchester Metropolitan University, with the resulting video being made available via the BCS Manchester website.

Wednesday, July 27, 2011

How good are your requirements?

I read an interesting post which was trying to determine how much detail a requirement should contain. As with any question like this, it all depends on a number of factors and it is not possible to give a rule which can be religiously followed. Experience over time will determine what works for you - there is always a cost if the balance isn't right. Too few requirements and there is likely to be a very difficult verification phase; too many requirements and it will prolong the development and testing phases.

The first rule of requirements is to ask your customer 'why?'. If they can't explain simply why the requirement is needed, then it isn't essential and it should be discounted. If the response is acceptable, the next question is 'How am I going to validate that I have satisfied the requirement?'. If you can't agree on how the requirement is to be validated, then the requirement clearly needs clarifying, maybe with some constraints explicitly included in the requirement wording. The language of requirements matters: the differences between 'shall', 'should', 'will' and 'may' are VERY important. Any requirement containing phrases such as 'such as' or 'for example' should always be rejected, as it is open-ended and can never be fully satisfied.

As an example, I encountered a requirement recently which included the phrase 'any printer shall be supported'. After discussions (and any customer that doesn't engage in discussions is a clear indication of a very difficult customer!), it became very clear why the requirement had been written in that way - a third party would be supplying the as yet unspecified printer. However, the discussion did enable the requirement wording to become slightly more achievable by rephrasing it as 'a network-connected printer supporting PostScript'. At least I would have a chance of testing this (and have constrained it to exclude printers with parallel or USB interfaces).

The hardest part of requirements is not accepting the requirement; it is the validation at the end. How often have you encountered 'This isn't what I wanted'? The only way to avoid this is to remain in constant contact with your customer to validate any assumptions throughout the development, so that there aren't any surprises at the end.

Other than experience, I have yet to find a reliable and objective way of assessing the 'goodness' of a requirement set. It would clearly be possible to perform some analysis of the language used, looking for ambiguous phrases for example, but would this be a sufficient measure?
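
As a rough illustration of the kind of automated language check mentioned above, here is a small sketch that flags open-ended phrases in requirement text. The phrase list and the sample requirements are my own assumptions, not a recognised standard, and real checking would need far more than simple substring matching.

```cpp
// Sketch: flag requirements containing open-ended phrases (assumed phrase list).
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

int main() {
    // Phrases that usually make a requirement untestable.
    const std::vector<std::string> ambiguous = {
        "such as", "for example", "etc", "and so on", "as appropriate", "user friendly"
    };

    const std::vector<std::string> requirements = {
        "The system shall support any printer, for example laser printers.",
        "The system shall print to a network-connected printer supporting PostScript."
    };

    for (const auto& req : requirements) {
        const std::string lower = toLower(req);
        for (const auto& phrase : ambiguous) {
            if (lower.find(phrase) != std::string::npos) {
                std::cout << "Possible open-ended requirement (\"" << phrase << "\"): "
                          << req << "\n";
            }
        }
    }
    return 0;
}
```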

Saturday, June 4, 2011

Computer Forensics

BCS Manchester recently hosted an interesting evening on the growing importance of computer forensics. The session was led by Sam Raincock, an experienced expert in performing forensic analyses. Whilst the session did not reveal the secrets behind a successful analysis (or give hints on how to make an analysis more difficult), it did explore some of the approaches (in general terms) that can be used in establishing evidence. Whilst a typical forensic investigation does include a significant amount of technical work, this only accounts for about 20% of an analysis; the remaining time is spent communicating with lawyers and writing reports. As in all legal cases, it is crucial to review and present all the evidence to piece together a coherent case rather than rely on circumstantial evidence.

While computer forensics is primarily the examination of any device that permanently stores data (the range of devices is ever-expanding, from traditional hard disc drives, CD-ROMs and USB memory sticks to mobile phones and cameras), it also includes reviewing system functionality in its goal of establishing what happened and who did it. It is used in a variety of cases, including criminal, civil and fraud cases.

Edmond Locard, an early pioneer of forensic science, stated that 'every contact leaves a trace'. This is very true of computer usage: every file that is created, every web page that is browsed and every document that is printed is recorded somewhere, and each person's pattern of usage is unique.

Some key points that I took away from the session included:
  1. Never assume anything
  2. All humans are unpredictable, and different
  3. Personnel cause more damage than they discover
  4. Do not assume that common sense prevails
  5. The IT department are not forensically trained and don't necessarily understand the value of every piece of data
  6. Forensics is not about data recovery
  7. Ownership of data must be established
A forensic examination looks at where the offence was allegedly committed, how the event occurred and who performed the activity. A typical examination can normally be performed on a single device (once a forensic image has been taken) by an appropriate expert, without normally needing to consult outside agencies (e.g. internet service providers) to obtain specific information. The examination will review data such as cookies, the various system logs and network connections (IP addresses, type of connection, particularly whether it was local, remote, fixed or wireless). The usage patterns of a computer will reveal a significant amount, as every human has particular behaviour traits. The various system logs that reside on a computer or within a network can reveal significant and valuable data; these logs should be actively monitored as they can often be the first sign that something unusual is happening which may merit investigation. The sooner something is detected, the greater the chance of limiting the damage (or increasing the evidence in order to establish a conviction). In the case of an incident being detected within a business, the primary aim is to return the business to normal as quickly as possible. This is where policies are vitally important; it is equally important that they are actively used, policed and maintained.

Whilst there is no formal qualification required to become a forensic expert (an inquiring mind would probably be useful), it is clearly a growing and important aspect of computing. There are many challenges in the continually evolving usage of computers; the growing importance of the cloud will clearly require different techniques from those employed when examining a physical item such as a laptop. The session left me wondering what traits my computer usage would reveal about me, but also wanting to find out more about what is being recorded without my knowledge.

Monday, October 11, 2010

Open Source Document and Content Management

BCS Manchester recently hosted a meeting on open source content management and document management in the public sector. The speaker, Graham Oakes, explained that the catalyst for this had been an article in the Guardian about the use of commercial software by Birmingham City Council in its attempt at building websites. They had already spent £2.8M and the question was asked: 'Why not use Open Source Software instead?' Graham stated that there is an awful lot of misinformation around; in particular, the use of OSS does not mean it costs nothing! This led the BCS Open Source Software Group to run a conference in January 2010 looking at the use and adoption of OSS in the public sector.

After briefly outlining what OSS is (source code owned by the community, from which new works can be derived), Graham outlined a typical software stack found in many organisations. At every level, there are good open source solutions available. Content Management normally fits in at the application layer below portals and webservers. Content Management has a number of very strong options including Plone, Hippo, TYPO3, Drupal, Joomla, eZ and Umbraco. Many of these have already been adopted by the public sector in developing websites, for example by a number of police forces. Document Management is not as well developed as Content Management and there are fewer options.

Gartner and Forrester enterprise software reviews both report that OSS should be adopted and that it is becoming more amenable to use in the UK, but it is still necessary to consider the full life-cycle costs. OSS should be considered equivalent to proprietary software – the public sector should now consider, and even contribute to, OSS projects. However, the UK is some way behind other European nations (specifically the Netherlands, Germany, Italy and Denmark), with the OSOSS project in the Netherlands urging public administration to use more OSS and open standards.

Advantages

The key advantages of adopting OSS within the public sector were identified as

  • The main reason for adoption is low up-front cost. The initial cost of ownership is low, but the organisation still needs to follow its normal software selection processes and consider the risks, the requirements for the software, etc. OSS should be considered no differently to commercial software. It is still necessary to look at the total cost of ownership (TCO), and an organisation may still need to involve a system integrator in order to deploy effectively.
  • OSS applications do not constrain the design. The public sector can use them to start small and think big.
  • Much OSS is easier to work with because access to the source code (only useful if you have the skills to use it!) is always available. This can provide an additional option to documentation. OSS is also increasingly important with the cloud, as proprietary licence models don't adapt readily.
  • OSS helps the public sector to demonstrate openness (a commitment to visibility and being open to the public).

Apart from the last one, these advantages are not particularly specific to the public sector.

Risks

Of course, there is always a downside, commonly called risks in procurement circles. The risks identified included

  • Getting over the perception that everything is free. This misunderstands the true costs of OSS. Most of the costs of using software are not in the licences – they are in the content created and managed by the software. Typically less than 10% of project costs go on technology and licences, and migration costs must always be considered as these can often be many times more expensive than the base software costs.
  • Misperception around content management and the reusability of OSS. Just because some OSS can be used securely does not mean that all OSS is secure.
  • The public sector is used to working with large organisations. OSS needs a different approach, which the public sector may not readily adopt due to the mismatch of scale. OSS developers often move at a far faster pace than the public sector does (or can). This can be difficult at the procurement stage as the existing procurement model (a level playing field) is broken. The procurement approach needs to recognise that OSS is no less secure than proprietary solutions.
  • Unreasoned decisions still dictate the major procurement decisions. OSS might not be a perfect match to the requirements but may provide a suitable solution.

Conclusions

Graham presented a set of conclusions which, if I am honest, are no different from those for proprietary software.

  • Remember that not all OSS is the same; there are different quality levels and capabilities.
  • Always choose carefully – consider the usage scenarios.
  • Look beyond licensing costs to user adoption, change management and migration.
  • It is still the team, rather than the technology, that creates the success. Choose the right team!
  • OSS supports evolutionary delivery (try before you buy), which encourages innovation and supports agile and lean practices. However, this is good practice for all software.
  • Licence fees for software bring costs forward and commit a project for the duration (unless trial licences are available). OSS does not carry this commitment and it is (relatively) easy to change OSS without excessive up-front costs.

So would OSS have solved Birmingham’s problem? No. The problem was not the cost of licences; it was not understanding the problem well enough. OSS would have helped to examine the problem in the small before the initial financial commitment was made, which might have produced a more realistic budget.

Tuesday, July 20, 2010

Lessons in measurement and data analysis

Recently I attended a very interesting and entertaining lecture by Peter Comer, from Abellio Solutions, to the BCS Quality Management Specialist Interest Group (NW) on lessons learnt in measurement and data analysis following a recent quality audit of an organisation’s quality management system (QMS).

The talk started by highlighting the requirements for measurement in ISO9001 (section 8). Key aspects that were highlighted included

  • Measure process within QMS to show conformity with and effectiveness of QMS
  • Monitoring and measurement of processes, products and customer satisfaction with QMS
  • Handle and control defects with products and services
  • Analyse data to determine suitability and effectiveness of QMS
  • Continual improvements through corrective and preventative actions

It was noted that everyone has KPIs (Key Performance Indicators) to measure the effectiveness of products and services, although every organisation will use them slightly differently.

Peter outlined the context of the audit, which was an internal audit in preparation for a forthcoming external audit. The audit was of a medium-sized organisation with a small software group working in the transport domain. A number of minor non-conformances were found, which were relatively straightforward to correct. However, after the audit an interesting discussion ensued regarding the software development process: the organisation was finding more bugs in bespoke software development than anticipated, and they were a lot harder to fix. Initial suggestions included:

  • Look at risk management practices. However, the organisation had already done this by reviewing an old (2002) downloaded paper looking at risk characteristics.
  • Look at alternative approaches to software development.

It was the approach to risk which intrigued Peter, and the quality of the paper was immediately questioned. Had it been peer-reviewed? Was it still current and relevant?

Peter then critiqued the paper. The paper proposed a number of characteristics supplemented by designators; it was quickly observed that there was considerable overlap between the designators. The analysis of the data was drawn from a number of different sources, although there was no indication of what the counting rules were (or whether they were rigorous and consistent). The designators were not representative of all the risk factors that may affect a development and said nothing about their relevance to the size of the development. The characteristics focused on cultural issues rather than technical issues - risk characteristics should cover both. Simply counting risk occurrences does not demonstrate the impact that a risk could have on a project.

Turning to the conclusions, Peter considered whether they were valid. If you analysed the data in a different way, would the conclusions be different? Can we be assured that the data was analysed by someone with prior experience of software development? It was observed that the designators were shaped to the criteria, which is appropriate, but one size doesn't fit all. Only by analysing the data in a number of different ways can its significance be established. Doing so can also show whether the data is unbalanced, which can in turn lead to skewed results. In the paper under review, it was clear that qualitative data was being used quantitatively.

Peter concluded by stating that ignoring simple objective measures can lead to the wrong corrective approach, one which might not be appropriate to the process and product, because 'you don't know what you don't know'. It is essential to formally define what to count (this is a metric), with the aim of making the data objective. Whatever the collection method, it must be stated so that it can be applied consistently.
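
As a trivial sketch of what 'formally defining what to count' might look like in code, here is a counting rule made explicit; the defect record fields and the rule itself (count confirmed, non-duplicate defects per release) are assumptions for illustration only, not anything proposed in the talk.

```cpp
// Sketch: a metric is only meaningful if the counting rule is explicit.
// The rule here (count confirmed, non-duplicate defects per release) is an assumption.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Defect {
    std::string release;
    bool confirmed;    // triaged and reproduced
    bool duplicate;    // already reported elsewhere
};

// Counting rule: a defect counts towards a release only if confirmed and not a duplicate.
std::map<std::string, int> defectsPerRelease(const std::vector<Defect>& defects) {
    std::map<std::string, int> counts;
    for (const auto& d : defects) {
        if (d.confirmed && !d.duplicate) {
            ++counts[d.release];
        }
    }
    return counts;
}

int main() {
    const std::vector<Defect> defects = {
        {"1.0", true, false}, {"1.0", false, false}, {"1.1", true, true}, {"1.1", true, false}
    };
    for (const auto& entry : defectsPerRelease(defects)) {
        std::cout << entry.first << ": " << entry.second << " counted defects\n";
    }
    return 0;
}
```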

The talk was very informative and left much food for thought. I have always aimed to automate the collection process to try and make it consistent. However, this achieves nothing if the data is interpreted incorrectly or inconsistently. It is also difficult to know whether you are collecting the right data, but that is what experience is for!

Wednesday, March 17, 2010

Creating Intelligent Machines



I have just attended the excellent IET/BCS 2010 Turing Lecture, 'Embracing Uncertainty: The New Machine Intelligence', at the University of Manchester, which was given this year by Professor Chris Bishop, who is Chief Research Scientist at Microsoft Research in Cambridge and also holds the Chair of Computer Science at the University of Edinburgh. The lecture allowed Chris to share his undoubted passion for machine learning, and although a number of mathematical aspects were mentioned during the talk, Chris managed to ensure everyone was able to understand the key concepts being described.

Chris started by explaining that his interest is in creating a framework for building intelligence into computers, something which has been a goal for many researchers for many years. This is now becoming increasingly important due to the vast amounts of data now available for analysis. With the amount of data doubling every 18 months, there is an increasing need to move away from purely algorithmic ways of reviewing the data towards solutions based on learning from the data. This has traditionally been the goal of machine (or artificial) intelligence and, despite Marvin Minsky's claim in 1967 in 'Computation: Finite and Infinite Machines' that "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved", the problem still does not have a satisfactory solution for many classes of problem.

A quick summary of the history of artificial intelligence showed that expert systems, which were good at certain applications but required significant investment in capturing and defining the rules, and neural networks, which provide a statistical learning approach but have difficulty capturing the necessary domain knowledge within the model, were not adequate for today's class of problems. An alternative approach that could integrate domain knowledge with statistical learning was required, and Chris's approach combined three elements:
  1. Bayesian Learning, which uses probability distributions to quantify the uncertainty in the data. The distributions are amended once 'real data' is applied to the model, which results in a reduction in the uncertainty (a toy illustration is sketched after this list).
  2. Probabilistic Graphical Models, which enable domain knowledge to be captured in directed graphs, with each node having a probability distribution.
  3. Efficient inference, which keeps the computation tractable.
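
As a concrete (if much simplified) illustration of point 1, here is a Beta-Bernoulli update: a prior distribution over an unknown success probability is narrowed as each observation arrives. This is my own toy example, not something taken from the lecture.

```cpp
// Toy Bayesian learning example: a Beta prior over an unknown success probability
// is updated with observed outcomes, and its uncertainty (variance) shrinks.
#include <iostream>
#include <vector>

int main() {
    double alpha = 1.0, beta = 1.0;          // Beta(1,1) = uniform prior
    const std::vector<bool> observations = { true, true, false, true, true, false, true };

    for (bool success : observations) {
        if (success) alpha += 1.0; else beta += 1.0;   // conjugate (Bayesian) update

        const double mean = alpha / (alpha + beta);
        const double var  = (alpha * beta) /
                            ((alpha + beta) * (alpha + beta) * (alpha + beta + 1.0));
        std::cout << "mean = " << mean << ", variance = " << var << "\n";
    }
    return 0;
}
```
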
To explain the approach, Chris sensibly used real-life case studies to demonstrate the application of the theory in three very diverse applications.

His first example was of a Bayesian ranking system used to produce a global ranking from noisy partial rankings. The conventional approach is to use the Elo rating system, a method for calculating the relative skill levels of players in two-player games. The Elo system cannot handle team games or more than two players. As part of the launch of the Xbox 360 Live online playing service, Microsoft developed the TrueSkill algorithm to match opponents of similar skill levels. TrueSkill converges far faster than Elo by managing the uncertainty in a more efficient way; it also operates quickly, so that users can find suitable opponents within a few seconds out of a user population of many millions. Further details on TrueSkill(TM) are available at http://research.microsoft.com/en-us/projects/trueskill/
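
TrueSkill itself models each player's skill as a Gaussian and is well beyond a short snippet, but for comparison here is the classic Elo update it improves upon; the K-factor of 32 and the example ratings are common conventions of my own choosing, not figures from the talk.

```cpp
// Classic two-player Elo update, shown only as the baseline that TrueSkill improves on.
#include <cmath>
#include <iostream>

// Expected score of player A against player B under the Elo model.
double expectedScore(double ratingA, double ratingB) {
    return 1.0 / (1.0 + std::pow(10.0, (ratingB - ratingA) / 400.0));
}

// scoreA is 1.0 for a win, 0.5 for a draw, 0.0 for a loss; K controls the update size.
void updateElo(double& ratingA, double& ratingB, double scoreA, double K = 32.0) {
    const double expectedA = expectedScore(ratingA, ratingB);
    ratingA += K * (scoreA - expectedA);
    ratingB += K * ((1.0 - scoreA) - (1.0 - expectedA));
}

int main() {
    double a = 1600.0, b = 1400.0;
    updateElo(a, b, 1.0);               // the stronger player wins: only a small change
    std::cout << "A: " << a << ", B: " << b << "\n";
    return 0;
}
```

Because Elo tracks only a single number per player, it has no notion of how uncertain that number is, which is precisely the gap the Gaussian skill model in TrueSkill addresses.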

The next example was a website serving adverts and how to determine which advert to show, based on the probability of it being clicked and the value of a click. The proposed approach was to use Gaussian probabilities to assign a weight to a number of features, which are then used to determine the ranking. However, it is important to ensure that the system continually learns and re-evaluates the ranking so that the solution accurately reflects the dynamics of the adverts; if it did not, it would be very difficult for a new advert to ever be served.
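
A heavily simplified sketch of the ranking idea described above: score each advert by its predicted click probability multiplied by the value of a click, then serve the highest-scoring one. The adverts and numbers are invented, and a real system would keep re-learning the click probabilities online rather than treating them as fixed.

```cpp
// Sketch: choose the advert with the highest expected value (pClick * clickValue).
// The figures are invented; a real system continually re-estimates pClick.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Advert {
    std::string name;
    double pClick;       // estimated probability of a click
    double clickValue;   // revenue if clicked
    double expectedValue() const { return pClick * clickValue; }
};

int main() {
    std::vector<Advert> adverts = {
        {"advert-a", 0.020, 0.50},
        {"advert-b", 0.005, 3.00},
        {"advert-c", 0.010, 1.20}
    };

    const auto best = std::max_element(adverts.begin(), adverts.end(),
        [](const Advert& x, const Advert& y) { return x.expectedValue() < y.expectedValue(); });

    std::cout << "Serve " << best->name
              << " (expected value " << best->expectedValue() << ")\n";
    return 0;
}
```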

The final example was the Manchester Asthma and Allergy Study which is working with a comprehensive data set acquired over 11 years. The data set is continually being augmented with new types of data (recently genetic data has been added) and the study has been successful at establishing the important variables and features and their relationships. By defining a highly structured model of the domain knowledge, it has been possible to assign each variable a probability distribution. By placing the data at the heart of the study and applying some machine learning techniques, a number of key observations are now being reported which might not have been apparent if more traditional statistical techniques had been used.

As a closing remark, Chris promoted a product from Microsoft Research, Infer.NET, which provides a framework for further experimentation in developing Bayesian models for a variety of machine learning problems.

As is now traditional with the Turing Lecture, it is presented at several locations around the country. A webcast of the version presented at the IET in London is available on the IET TV channel.