I have just attended an
excellent talk by Steve Furber, Professor of Computer Engineering at
the University of Manchester, on the challenges of programming a
million-core machine as part of the SpiNNaker project.
The
SpiNNaker project has been in existence for around 15 years and has
been attempting to answer two fundamental questions:
- How does the brain do what it does? Can massively parallel computing accelerate our understanding of the brain?
- How can our (increasing) understanding of the brain help us create more efficient, parallel and fault-tolerant computation?
The
comparison of a parallel computer with a brain is not accidental, since the brain shares many of the required attributes: it is massively
parallel, has a vast number of interconnections, is remarkably
power-efficient, requires only low-speed communications, is
adaptable and fault-tolerant, and is capable of learning autonomously. The
challenge for computing as Moore's law progresses is that there will
eventually come a time when further increases in speed are not
possible; and as processing speed has increased, energy efficiency has become
an increasingly important characteristic to address. The future is
therefore parallel, but the approach to exploiting this parallelism is far from clear.
The SpiNNaker project has been established to attempt to model a
brain (around 1% of a human brain) using approximately 1 million
ARM processor cores of the kind found in mobile phones, linked by efficient asynchronous interconnections, whilst also examining how efficient parallel applications should be developed.
The
project is built on 3 core principles:
- The topology is virtualised and is as generic as possible. The physical and logical connectivity are decoupled.
- There is no global synchronisation between the processing elements.
- Energy frugality, such that the cost of a processor is effectively zero (removing the need for load balancing) and the energy usage of each processor is minimised.
[As
an aside, energy-efficient computing is an area of growing interest: in many
systems, the energy required to complete a computation is now the key
factor in operational cost.]
The
SpiNNaker project has designed a node which contains two chips; one
chip is used for processing and consists of 18 ARM processors (1
hosts the operating system, 16 are used for application execution and
1 is spare) and the other chip is for memory (SDRAM). The nodes are
connected in a 2D-mesh due to simplicity and cost. 48 nodes are
assembled onto a PCB such that 864 processors are available per
board. The processors support only integer arithmetic. The major
innovation in the design is the interconnectivity within a node and
between nodes on a board: a simple packet-switched network is used to
send very small packets around, and each node has a router which
efficiently forwards packets either within the node or on to a
neighbouring node. Ultimately, 24 PCBs are housed within a single 19”
rack, five racks are housed within a cabinet, and so each
cabinet holds 120 PCBs, which equates to 5,760 nodes or 103,680
processors. Ten cabinets would therefore provide over 1 million
processors and would require around 10 kW. A host machine (running
Linux) is connected via Ethernet to the cabinet (and optionally each
board).
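As a quick sanity check on those numbers, the scaling arithmetic can be reproduced in a few lines of C. The per-node, per-board, per-rack and per-cabinet figures are those quoted in the talk; the little program itself is mine:

```c
#include <stdio.h>

int main(void)
{
    const int cores_per_node    = 18;  /* 1 OS core + 16 application + 1 spare */
    const int nodes_per_board   = 48;  /* nodes per PCB                        */
    const int boards_per_rack   = 24;  /* PCBs per 19" rack                    */
    const int racks_per_cabinet = 5;
    const int cabinets          = 10;

    int  cores_per_board   = cores_per_node * nodes_per_board;                      /*     864   */
    int  nodes_per_cabinet = nodes_per_board * boards_per_rack * racks_per_cabinet; /*   5,760   */
    int  cores_per_cabinet = nodes_per_cabinet * cores_per_node;                    /* 103,680   */
    long total_cores       = (long)cores_per_cabinet * cabinets;                    /* 1,036,800 */

    printf("cores per board:   %d\n",  cores_per_board);
    printf("nodes per cabinet: %d\n",  nodes_per_cabinet);
    printf("cores per cabinet: %d\n",  cores_per_cabinet);
    printf("total cores:       %ld\n", total_cores);
    return 0;
}
```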
Networking
(and its efficiency) is the key challenge in emulating neurons. The
approach taken by SpiNNaker is to capture a simple spike (representing a
neuron's communication) in a small packet (40 bits) and then
multicast this data around the machine; each neuron is allocated a unique
identifier, giving a theoretical limit of around 4 billion neurons which
can be modelled. Using a three-stage associative memory holding
some simple routing information, the destinations of each event can be
determined. If the table does not contain an entry, the packet is
simply passed through to the next router. This approach is ideally
suited to a static network or a (very) slowly changing network. It
struck me that this simple approach could be very useful for efficient
communication across the internet, and perhaps for meeting
the challenge of the 'Internet of Things'.
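To make the table-driven multicast idea concrete, here is a minimal sketch of the lookup in C. The key/mask matching scheme and the field names are my own illustration rather than the actual SpiNNaker router format; the behaviour taken from the talk is simply that a matching entry tells the router where to copy the packet, while a miss causes the packet to be passed straight through.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* One associative routing entry: a packet key matches if
 * (key & mask) == match. Layout is illustrative only.              */
typedef struct {
    uint32_t match;
    uint32_t mask;
    uint32_t out_links;  /* bitmap of links / local cores to copy the packet to */
} route_entry_t;

/* Look up the neuron identifier carried in a packet. Returns true and
 * fills *out_links on a hit; on a miss the caller default-routes the
 * packet onwards to the next router unchanged.                       */
static bool route_lookup(const route_entry_t *table, int n_entries,
                         uint32_t key, uint32_t *out_links)
{
    for (int i = 0; i < n_entries; i++) {
        if ((key & table[i].mask) == table[i].match) {
            *out_links = table[i].out_links;
            return true;
        }
    }
    return false;
}

int main(void)
{
    /* Example: packets whose key starts 0x0001xxxx go out on links 0 and 2. */
    route_entry_t table[] = {
        { 0x00010000u, 0xFFFF0000u, (1u << 0) | (1u << 2) },
    };
    uint32_t links;
    if (route_lookup(table, 1, 0x00010042u, &links))
        printf("hit: copy packet to link bitmap 0x%x\n", (unsigned)links);
    else
        printf("miss: default-route the packet straight through\n");
    return 0;
}
```

On the real machine the lookup is performed in the router hardware rather than in a software loop, which is what keeps the per-packet cost low enough for billions of packets to be handled every second.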
Developing
applications for SpiNNaker requires that the problem is split into
two parts; one part handles the connectivity graph between nodes; the
other part handles the conventional computing cycle with compile/link
and deploy. Whilst the performance in terms of raw bandwidth is
impressive (250 Gbps across 1,024 links), it is the packet throughput which is
exceptional, at over 10 billion packets per second.
The
programming approach is to use an event-driven programming paradigm,
which discourages single-threaded execution. Each node runs a single
application, with the applications (written in C) communicating via an
API with SARK (the SpiNNaker Application Runtime Kernel), which is
hosted on the processor. The event model maps naturally onto
interrupt handlers on the processor, with three key events handled by each
application:
- A new packet arriving (highest priority)
- A (DMA) memory transfer completing
- A timer event (typically every 1 millisecond)
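The shape of such an application is sketched below in C. The registration calls, event names and host-side stubs are hypothetical placeholders rather than the real SARK interface; what the sketch shows is the structure described in the talk: one handler per event, a priority attached to each handler, and no main loop of the application's own.

```c
/* Sketch of an event-driven SpiNNaker-style application.
 * app_register_handler(), app_run() and the EVT_* names are hypothetical
 * placeholders standing in for the SARK-level API; the stubs at the
 * bottom exist only so the sketch compiles and runs on a host machine. */
#include <stdint.h>
#include <stdio.h>

enum { EVT_PACKET_RECEIVED, EVT_DMA_DONE, EVT_TIMER_TICK, EVT_COUNT };

void app_register_handler(int event, void (*handler)(uint32_t, uint32_t), int priority);
void app_run(void);

/* --- application code: one handler per event, no main loop of our own --- */

static void on_packet(uint32_t key, uint32_t payload)
{
    /* A spike has arrived; 'key' identifies the source neuron.
       Update the state of the neurons simulated on this core.          */
    printf("packet: neuron %u fired\n", (unsigned)key);
    (void)payload;
}

static void on_dma_done(uint32_t tag, uint32_t unused)
{
    /* A block of synaptic data has been copied in from the node's SDRAM. */
    printf("dma: transfer %u complete\n", (unsigned)tag);
    (void)unused;
}

static void on_timer(uint32_t tick, uint32_t unused)
{
    /* Fired roughly every millisecond: advance the neuron model one time
       step and multicast spikes for any neurons that fired.            */
    printf("timer: tick %u\n", (unsigned)tick);
    (void)unused;
}

void c_main(void)
{
    app_register_handler(EVT_PACKET_RECEIVED, on_packet,   0);  /* highest priority */
    app_register_handler(EVT_DMA_DONE,        on_dma_done, 1);
    app_register_handler(EVT_TIMER_TICK,      on_timer,    2);
    app_run();   /* hand control to the runtime's event dispatcher */
}

/* --- host-side stand-ins for the runtime, for illustration only --- */

static void (*handlers[EVT_COUNT])(uint32_t, uint32_t);

void app_register_handler(int event, void (*handler)(uint32_t, uint32_t), int priority)
{
    (void)priority;              /* a real kernel would use this for pre-emption */
    handlers[event] = handler;
}

void app_run(void)
{
    /* Pretend one of each event arrives, in priority order. */
    handlers[EVT_PACKET_RECEIVED](7u, 0u);
    handlers[EVT_DMA_DONE](1u, 0u);
    handlers[EVT_TIMER_TICK](1u, 0u);
}

int main(void) { c_main(); return 0; }
```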
As
most applications for SpiNNaker have been models of the brain, most of
them have been written in PyNN (a Python-based neural network
description language), which is then translated into code that can be hosted by SpiNNaker.
The efficiency of the interconnections means that brain simulations
can now be executed in real time, a significant improvement over
conventional supercomputing.
In
conclusion, whilst the focus has been on addressing
the 'science' challenges, there are clear insights here into future
computing in terms of improved inter-processor connectivity, improved
energy utilisation and a flexible platform. Whilst commercial
exploitation has not been a major driving force for this project, I
am confident that some of the approaches and ideas will find their way
into mainstream computing, in much the same way that the paging
algorithm developed in Manchester some 50 years ago is now commonplace in
all computing platforms.
The slides are available here.