
Wednesday, November 27, 2013

Programming a million core machine

I have just attended an excellent talk by Steve Furber, Professor of Computer Engineering at the University of Manchester, on the challenges of programming a million-core machine as part of the SpiNNaker project.

The SpiNNaker project has been in existence for around 15 years and has been attempting to answer two fundamental questions:
  • How does the brain do what it does? Can massively parallel computing accelerate our understanding of the brain?
  • How can our (increasing) understanding of the brain help us create more efficient, parallel and fault-tolerant computation?
The comparison of a parallel computer with a brain is not accidental, since the brain shares many of the required attributes: it is massively parallel, richly interconnected, extremely power-efficient, needs only low-speed communications, is adaptable and fault-tolerant, and is capable of learning autonomously. The challenge for computing as Moore's law progresses is that there will eventually come a time when further increases in speed are not possible; moreover, as processing speed has increased, energy efficiency has become an increasingly important characteristic to address. The future is therefore parallel, but the approach to handling this is far from clear. The SpiNNaker project was established to attempt to model a brain (around 1% of a human brain) using approximately one million mobile-phone chips with efficient asynchronous interconnections, whilst also examining approaches to developing efficient parallel applications.

The project is built on 3 core principles:
  • The topology is virtualised and is as generic as possible. The physical and logical connectivity are decoupled.
  • There is no global synchronisation between the processing elements.
  • Energy frugality, such that the cost of a processor is effectively zero (removing the need for load balancing) and the energy usage of each processor is minimised.
[As an aside, energy-efficient computing is of growing interest: in many systems, the energy required to complete a computation is now a key factor in the operational cost of running a program.]

The SpiNNaker project has designed a node which contains two chips: one chip is used for processing and consists of 18 ARM processors (one hosts the operating system, 16 are used for application execution and one is spare); the other chip provides memory (SDRAM). The nodes are connected in a 2D mesh for simplicity and cost. 48 nodes are assembled onto a PCB, so that 864 processors are available per board. The processors only support integer computation. The major innovation in the design is the interconnectivity within a node and between nodes on a board. A simple packet-switched network is used to send very small packets around; each node has a router which efficiently forwards packets either within the node or to a neighbouring node. Ultimately, 24 PCBs are housed within a single 19” rack and five racks within a cabinet, so that each cabinet has 120 PCBs, which equates to 5,760 nodes or 103,680 processors. Ten cabinets would therefore provide over one million processors and would require around 10 kW. A host machine (running Linux) is connected via Ethernet to the cabinet (and optionally to each board).
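A quick back-of-the-envelope sketch in Python shows how those figures multiply up to just over a million cores; it uses only the numbers quoted above, and the variable names are my own:

    # Scaling of the SpiNNaker machine, using the figures quoted in the talk.
    processors_per_node = 18            # 1 monitor + 16 application + 1 spare ARM core
    nodes_per_board = 48
    boards_per_rack = 24
    racks_per_cabinet = 5
    cabinets = 10

    processors_per_board = processors_per_node * nodes_per_board        # 864
    boards_per_cabinet = boards_per_rack * racks_per_cabinet             # 120
    nodes_per_cabinet = boards_per_cabinet * nodes_per_board             # 5,760
    processors_per_cabinet = nodes_per_cabinet * processors_per_node     # 103,680
    total_processors = cabinets * processors_per_cabinet                 # 1,036,800

    print(processors_per_board, processors_per_cabinet, total_processors)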

Networking (and its efficiency) is the key challenge in emulating neurons. The approach taken by SpiNNaker is to capture a simple spike (representing a neuron firing) within a small packet (40 bits) and then multicast this data around the machine; each neuron is allocated a unique identifier, giving a theoretical limit of around 4 billion neurons which can be modelled. By using a 3-stage associative memory holding some simple routing information, the destination of each event can be determined. If the table does not contain an entry, the packet is simply passed through to the next router. This approach is ideally suited to a static network or a (very) slowly changing network. It struck me that this simple approach could be very useful for efficient communication across the internet and perhaps for meeting the challenge of the 'Internet of Things'.
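A minimal sketch of how such key-based multicast routing might behave, assuming a plain dictionary in place of the real three-stage associative memory; the table structure and names are my own illustration, not the actual SpiNNaker hardware interface:

    # Illustrative multicast routing: a spike packet carries only the source
    # neuron's key; the router looks the key up and copies the packet to the
    # listed links. An unmatched key falls through to the default route.
    def route_packet(routing_table, key, default_link):
        """Return the set of output links for a spike packet with this key."""
        if key in routing_table:
            return routing_table[key]      # explicit multicast entry
        return {default_link}              # pass straight through to the next router

    # Hypothetical table: neuron 0x2A fans out to two links and a local core;
    # every other key is forwarded unchanged.
    table = {0x2A: {"north", "east", "local_core_3"}}
    print(route_packet(table, 0x2A, default_link="east"))   # {'north', 'east', 'local_core_3'}
    print(route_packet(table, 0x7F, default_link="east"))   # {'east'}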

Developing applications for SpiNNaker requires that the problem is split into two parts: one part describes the connectivity graph between nodes; the other handles the conventional computing cycle of compile, link and deploy. Whilst the performance in terms of raw bandwidth is impressive (250 Gbps across 1024 links), it is the packet throughput which is exceptional, at over 10 billion packets/second.

The programming approach is an event-driven paradigm which discourages conventional single-threaded execution. Each node runs a single application, with applications (written in C) communicating via an API with SARK (the SpiNNaker Application Runtime Kernel) which is hosted on the processor. The event model effectively maps to interrupt handlers on the processor, with three key events handled by each application (a minimal sketch of this dispatch pattern follows the list):
  • A new packet (highest priority)
  • A (DMA) memory transfer
  • A timer event (typically 1 millisecond)
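The sketch below mimics that event model purely for illustration: callbacks are registered with a priority, and a tiny dispatcher always services the most urgent pending event first. It is written in Python for brevity; the real applications are written in C against SARK, and the names and queue here are my own invention, not the actual API.

    import heapq

    # Toy event-driven kernel mirroring the interrupt-priority scheme above.
    class ToyEventKernel:
        def __init__(self):
            self.callbacks = {}        # event type -> (priority, handler); 0 = highest
            self.pending = []          # heap of (priority, seq, event, payload)
            self.seq = 0

        def register(self, event, handler, priority):
            self.callbacks[event] = (priority, handler)

        def raise_event(self, event, payload=None):
            priority, _ = self.callbacks[event]
            heapq.heappush(self.pending, (priority, self.seq, event, payload))
            self.seq += 1

        def run(self):
            while self.pending:
                _, _, event, payload = heapq.heappop(self.pending)
                self.callbacks[event][1](payload)

    kernel = ToyEventKernel()
    kernel.register("packet_received", lambda p: print("spike from neuron", p), priority=0)
    kernel.register("dma_done",        lambda p: print("DMA transfer complete"), priority=1)
    kernel.register("timer_tick",      lambda p: print("1 ms timer tick"),       priority=2)

    kernel.raise_event("timer_tick")
    kernel.raise_event("packet_received", 0x2A)   # serviced first despite arriving later
    kernel.run()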
As most applications for SpiNNaker have been brain models, they have typically been written in PyNN (a Python neural-network description language), which is then translated into code that can be hosted by SpiNNaker. The efficiency of the interconnect means that brain simulations can now be executed in real time, a significant improvement over conventional supercomputing.

In concluding, it is clear that whilst the focus has been on addressing the 'science' challenges, there are also insights into future computing in terms of improved inter-processor connectivity, improved energy utilisation and a flexible platform. Whilst commercial exploitation has not been a major driving force for this project, I am confident that some of the approaches and ideas will find their way into mainstream computing, in much the same way that, 50 years ago, Manchester developed the paging algorithm which is now commonplace in all computing platforms.

The slides are available here.

Tuesday, February 19, 2013

Technologists are good for business

This evening's BCS/IET Turing Lecture, given by Suranga Chandratillake, founder and former chief strategy officer of Blinkx, at Manchester University was an interesting talk linking the technical excellence of an engineer with the needs of an entrepreneur. His premise was that his undergraduate course in Computer Science at Cambridge University had provided him with many of the skills he needed for a successful business career - it was just that he wasn't aware he had them.

Suranga first compared the stages that an inventor and an entrepreneur go through with the evolution of an idea. The inventor moves from feeling he wants to challenge the world, to a flash of inspiration, to the stage where the invention is tangible. Compare this to an entrepreneur, who starts by thinking 'I need money' (because ideas are not enough), moves to the stage where the product or service is a saleable item, and arrives at the point where he is making a profit. The UK is very good at educating and nurturing many great technologists to create and innovate; unfortunately it is not always good at exploiting these ideas, mainly because many of the skills that allow an entrepreneur to exploit technical ideas are not well developed.

He described how he was offered the opportunity to be the founder CEO of Blinkx, a startup spun out from Autonomy. He was (very!) reluctant to take on this role because he felt that, as essentially a technologist, he didn't have the necessary skills to fulfil it. He struck a deal with Mike Lynch, CEO of Autonomy, that if he needed help with business functions such as finance, HR, sales and marketing, Mike would help him out. What amazed me was that the skills he needed for finance, marketing and sales were all taught on his undergraduate course; it was just that they weren't expressed in business terms. For example, determining the most effective marketing approach to use (e.g. PR, web-page banner ads or search adverts) requires the application of some simple probabilistic modelling, a second-year course. I felt he stretched the analogy a bit far when he compared an HR organisation to a system architecture; however, I think many aspects of HR (particularly recruitment) can be covered in parts of undergraduate courses, particularly with the increasing amount of team-working forming part of the curriculum.

Suranga summarised that the attributes of a technologist (being quantitative, rigorous and analytical) had actually prepared him perfectly for business in a technical organisation. He stated that it is a fallacy that technologists do not understand business; it is just that they assume they don't have the skills. This is a mental block rather than a lack of ability.

I found the talk provided much food for thought. Clearly the business environment that Suranga operated in was not typical of many companies, but it was illuminating to see how he was able to relate it back to his undergraduate course. The opportunity to work in a small company with a unique technology (as Blinkx was) is clearly not going to be available to everyone. However, provided the opportunities are there, I am sure many more technologists should feel empowered to exploit technology to create viable and thriving businesses.


The BCS/IET Turing Lecture 2013: What they didn't teach me: building a technology company and taking it to market
Suranga Chandratillake
The IET Prestige Lecture Series 2013, Turing Lecture, Savoy Place, London, 18 February 2013

Wednesday, March 17, 2010

Creating Intelligent Machines

I have just attended the excellent IET/BCS 2010 Turing Lecture, 'Embracing Uncertainty: The New Machine Intelligence', at the University of Manchester, given this year by Professor Chris Bishop, Chief Research Scientist at Microsoft Research in Cambridge and also Chair of Computer Science at the University of Edinburgh. The lecture allowed Chris to share his undoubted passion for machine learning, and although a number of mathematical aspects were mentioned during the talk, Chris managed to ensure everyone was able to understand the key concepts being described.

Chris started by explaining that his interest is in building a framework for adding intelligence to computers, something which has been a goal of many researchers for decades. This is becoming increasingly important due to the vast amounts of data now available for analysis. With the amount of data doubling every 18 months, there is an increasing need to move away from purely algorithmic ways of reviewing the data towards solutions based on learning from the data. This has traditionally been the goal of machine (or artificial) intelligence, and despite what Marvin Minsky wrote in 1967 in 'Computation: Finite and Infinite Machines', that "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved", the problem still does not have a satisfactory solution for many classes of problem.

A quick summary of the history of artificial intelligence showed that neither expert systems, which were good at certain applications but required significant investment in capturing and defining the rules, nor neural networks, which provide a statistical learning approach but have difficulty in capturing the necessary domain knowledge within the model, were adequate for today's class of problems. An alternative approach which could integrate domain knowledge with statistical learning was required, and Chris's solution combines three elements:
  1. Bayesian learning, which uses probability distributions to quantify the uncertainty in the data. The distributions are updated once 'real data' is applied to the model, which results in a reduction in the uncertainty (a toy example of this updating follows the list).
  2. Probabilistic graphical models, which enable domain knowledge to be captured in directed graphs, with each node having a probability distribution.
  3. Efficient inference algorithms, which keep the computation over these models tractable.
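As a toy illustration of the first element (my own example, not one from the talk), a Beta distribution over an unknown probability narrows as observations are folded in, which is exactly the reduction in uncertainty described above:

    # Conjugate Beta-Bernoulli updating: start with a vague prior over an
    # unknown probability, add observed successes/failures, and watch the
    # posterior variance shrink as data accumulates.
    def beta_update(alpha, beta, successes, failures):
        return alpha + successes, beta + failures

    def beta_mean_and_var(alpha, beta):
        mean = alpha / (alpha + beta)
        var = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))
        return mean, var

    alpha, beta = 1.0, 1.0                     # uniform prior: maximal uncertainty
    print(beta_mean_and_var(alpha, beta))      # (0.5, 0.0833...)

    alpha, beta = beta_update(alpha, beta, successes=30, failures=70)
    print(beta_mean_and_var(alpha, beta))      # mean ~0.30, far smaller variance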
To explain the approach, Chris sensibly used real-life case studies to demonstrate the application of the theory in three very diverse domains.

His first example was a Bayesian ranking system used to produce a global ranking from noisy partial rankings. The conventional approach is to use the Elo rating system, a method for calculating the relative skill levels of players in two-player games. The Elo system cannot handle team games or more than two players. As part of the launch of the Xbox 360 Live online playing service, Microsoft developed the TrueSkill algorithm to match opponents of similar skill levels. TrueSkill converges far faster than Elo by managing the uncertainty in a more efficient way; it also operates quickly enough that users can find suitable opponents within a few seconds from a user population of many millions. Further details on TrueSkill(TM) are available at http://research.microsoft.com/en-us/projects/trueskill/
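For comparison, the classic two-player Elo update that TrueSkill improves upon fits in a few lines; it tracks only a single point estimate per player, with no explicit uncertainty, which is one reason it cannot cope with teams or converge quickly (this is textbook Elo, not Microsoft's TrueSkill code):

    # Classic Elo: after a game the winner gains points from the loser in
    # proportion to how surprising the result was.
    def elo_expected_score(rating_a, rating_b):
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    def elo_update(rating_a, rating_b, score_a, k=32):
        """score_a is 1 for a win by A, 0 for a loss, 0.5 for a draw."""
        expected_a = elo_expected_score(rating_a, rating_b)
        new_a = rating_a + k * (score_a - expected_a)
        new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
        return new_a, new_b

    print(elo_update(1500, 1700, score_a=1))   # an upset moves both ratings a lot
    print(elo_update(1500, 1700, score_a=0))   # the expected result moves them a little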

The next example was a website serving adverts, and how to determine which advert to show based on the probability of it being clicked and the value of a click. The approach was to use Gaussian distributions to assign a weight to each of a number of features, which are then used to determine the ranking. However, it is important that the system continually learns and re-evaluates the ranking so that it accurately reflects the dynamics of the adverts; if this were not the case, it would be very difficult for a new advert ever to be served.
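A minimal sketch of that ranking idea, under my own simplifying assumptions: adverts are ordered by expected value (predicted click probability times value per click), but the click probability is sampled from a Gaussian belief rather than taken at its mean, so uncertain new adverts still get served occasionally and the system keeps learning. The numbers and names are invented; this is not the algorithm presented in the talk, just one way to realise the behaviour it describes.

    import random

    adverts = [
        # (name, mean click probability, std dev of that belief, value per click)
        ("established_ad", 0.030, 0.002, 1.00),
        ("new_ad",         0.020, 0.015, 1.20),
    ]

    def choose_advert(adverts):
        def sampled_value(ad):
            name, p_mean, p_std, value = ad
            p_sample = max(0.0, random.gauss(p_mean, p_std))   # sample from the belief
            return p_sample * value                            # expected revenue if shown
        return max(adverts, key=sampled_value)[0]

    random.seed(0)
    picks = [choose_advert(adverts) for _ in range(1000)]
    print(picks.count("established_ad"), picks.count("new_ad"))  # the new advert still gets shown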

The final example was the Manchester Asthma and Allergy Study which is working with a comprehensive data set acquired over 11 years. The data set is continually being augmented with new types of data (recently genetic data has been added) and the study has been successful at establishing the important variables and features and their relationships. By defining a highly structured model of the domain knowledge, it has been possible to assign each variable a probability distribution. By placing the data at the heart of the study and applying some machine learning techniques, a number of key observations are now being reported which might not have been apparent if more traditional statistical techniques had been used.

As a closing remark, Chris promoted Infer.NET, a framework from Microsoft Research for further experimentation in developing Bayesian models for a variety of machine learning problems.

As is now traditional with the Turing Lecture, it is presented at several locations around the country. A webcast of the version presented at the IET in London is available on the IET TV channel.

Friday, June 20, 2008

The Relentless March of the MicroChip

I have just returned from the inaugural Kilburn Lecture, which concluded a day celebrating 60 years since the first stored-program computer (the Manchester 'baby'). The excellent lecture, given by Professor Steve Furber, offered a historical perspective on the major innovations which have originated from Manchester University over the last 60 years as computing technology has developed, together with a personal view of the developments in which he has been directly involved.

Over the last 60 years, the technology has changed from the vacuum tubes used in the Manchester 'baby' and the Ferranti Mk1 (the first commercial computer), through the transistor, which although invented at Bell Labs in 1947 wasn't adopted for computers until the 1960s when the Atlas computer, the fastest of its time, was developed, to the integrated circuits used in the MU5 (a forerunner of the ICL 2900 series), Dataflow and Amulet systems. The Atlas computer also introduced the concept of the single-level store, more commonly referred to as virtual memory; one of the attendees at the lecture, Professor Dai Edwards, remarked that he still receives payments for this invention. Each decade has seen the level of complexity increase as the number of transistors has grown, a trend which shows no sign of slowing down.

Steve also provided a personal perspective on his involvement in microprocessor design, starting with the BBC Micro of 1982 whilst at Acorn. While the BBC Micro was primarily built from off-the-shelf components, including the 6502 microprocessor, Steve designed two simple bespoke chips which helped reduce the chip count by 40. The success of the BBC Micro and its design led to Acorn developing further bespoke chips, resulting in the Acorn RISC Machine (ARM) being released in 1985. This is probably one of the most significant microprocessor developments of the last 25 years, as derivatives of this processor have become a key component in mobile phone technology. The ARM had a simple design, was small, had low power consumption and was really a System on a Chip (SoC). By 2008, over 10 billion ARM processors had been delivered, making it the most numerous processor in the world. When Steve moved to Manchester University in 1990, he continued to use ARM technology in the AMULET systems. Over the various generations of AMULET, the size of the chips didn't change, but the chips became more complex as the transistor spacing reduced from 1 micron in 1994 to 0.18 microns in 2003, which is smaller than the wavelength of visible light.

There was an interesting comparison of the changing energy requirements over the last 60 years. In 1948, the Manchester 'baby' required 3.5 kW of power to execute 700 instructions per second, which represents 5 joules/instruction. Contrast this with the ARM968 processor of 2008, which requires 20 mW of power and can execute 200 million instructions per second, which represents 0.0000000001 joules/instruction. That is a 50,000,000,000-fold improvement! There are few examples of such a dramatic improvement in energy efficiency. Steve did give a warning, though, that more efficient computing can lead to increasing overall power requirements.
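Just re-deriving those figures:

    # Energy per instruction, 1948 vs 2008, using the numbers quoted above.
    baby_j_per_instr = 3500.0 / 700          # 3.5 kW at 700 instructions/s  -> 5 J
    arm968_j_per_instr = 0.020 / 200e6       # 20 mW at 200 MIPS             -> 1e-10 J

    improvement = baby_j_per_instr / arm968_j_per_instr
    print(baby_j_per_instr, arm968_j_per_instr, improvement)   # 5.0, 1e-10, 5e10 (~50 billion times)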

No lecture on the development of processors can ignore Gordon Moore's seminal article (colloquially known as Moore's Law), and this lecture was no different. The original paper, published in 1965, predicted the exponential increase in the number of transistors per chip only until 1975. However, the prediction still holds today and has become a self-fulfilling prophecy, serving as a key input into planning next-generation microprocessor designs (see the International Technology Roadmap for Semiconductors). This progress has been achieved through shrinking transistor sizes, cheaper components and reduced power consumption. Steve cited the current generation of microSD cards, which provide 12 GB of flash memory with over 50 million transistors in the size of a fingernail, as a demonstration of how much progress has been made since the original use of transistors in computers in systems such as Atlas. However, exponential progress cannot go on indefinitely and there are now physical limits which will constrain it. As components have increased in complexity, their reliability and lifetime (in many cases less than 12 months for some items) have reduced, as the tolerances on such a great number of components are very fine. There is also a recognition that the cost of design needed to achieve advances in technology is becoming uneconomic, as it is increasing at 37% per year. Steve considers that Moore's Law may survive for another 5-10 years with current technology, but there will need to be advances in alternative technologies for it to continue beyond this (and there are no signs of this happening at the moment).

The current generation of dual-core/multi-core microprocessors has tried to address these constraints by providing more processor cores per chip rather than a single faster processor. Moore's Law can also apply to multi-core systems; however, there is an increasing problem with how application software makes use of such processors. As single processors increased in performance, there was little change to the way software was developed. However, with the advance of multi-core technology, general-purpose parallelism is required in order to maximise the available processing capacity. This is one of the 'holy grails' of computing and it is becoming an increasingly important problem to solve. The design of such multi-core processors also needs to be carefully considered: it appears preferable to have lots of cheap (and simple) cores rather than a small number of faster (and complex) cores, due to the significant difference in power efficiency.

Steve finished by giving us a glimpse into the future as he showed how microprocessor design is converging with biology. The human brain has many attributes which are desirable in a complex network of computers: tolerance of component failure (e.g. loss of a neuron), adaptability, massive parallelism, good connectivity and power efficiency. Steve's current project, SpiNNaker, is trying to build a system which can perform real-time simulation of biological neural systems mapped onto a computer architecture. The project is using thousands of ARM processors and is trying to meet one of the UKCRC's Grand Challenges, in which the architecture of the mind and brain is modelled by computer.