Advanced Processor Technologies group overview

APT mission
“To explore novel architectures and techniques that will enable the effective exploitation of the billion transistor chips of the near-future”

APT group
• Focus:
  – Moore’s Law will soon deliver billion-transistor chips
  – how do we make best use of a billion transistors?
    • parallel processing
    • systems-on-chip
    • novel architectures
    • …?

Strategy/Vision
• Industry shift to multicore processors
  – directly addressed by our CMP (chip multiprocessor) work
• Power/heat is performance-limiting
  – asynchronous and low-power design have growing importance
• Timing closure is a critical problem
  – acceptance of mixed timing and GALS (globally asynchronous, locally synchronous) design
• Design automation is vital
  – async automation must be competitive

Strategy/Vision
• Can university groups design state-of-the-art digital silicon?
  – probably not in conventional processors
  – few academic groups still fab digital chips
• Is trying to take designs through to fabrication still a good idea?
  – we believe so, because ‘reality’ matters!
  – but the game is very tough indeed

Many-core Architecture and Software: Mikel Lujan

Buying a single-core processor is difficult!
• Multi-cores bring fundamental changes for Computer Science [applications, programming languages, compilers, runtime systems (OS), computer architecture]

Active projects
• Managed Runtime Environments and Low-Power Many-core Architectures
  – DOME: Delaying and Overcoming Microprocessor Errors
• TeraFlux
  – On the search for a “good” parallel computational model
• AXLE
  – Accelerating Analytics of Big Data

Managed Runtime Environments
• Java and .NET are examples of managed runtime environments (JVM, CLR)
• Key elements: JIT compilation and control of memory allocation
• Research opportunities:
  – Scaling MREs for many-core architectures (GPUs)
  – Hardware acceleration of MREs
  – Using MREs for low-power computing
  – Using MREs for dealing with faults and transistor wear-out -> DOME

TeraFlux Project
• A major focus of current ‘general purpose’ many-core research
• Three major goals:
  – To define the hardware architecture of a highly extensible, general-purpose multicore system
  – To develop a simple-to-use parallel programming approach based on programming with
    • side-effect-free computations + transactions
  – How do we simulate/prototype many-core architectures?

Starting Assumptions
• Requiring strongly consistent shared memory is a major impediment to extensibility
• The efficient scheduling of control-flow-based threads is hard
• The major complexity in parallel programming is the handling of shared state (locks, etc.)
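TeraFlux pairs side-effect-free computations with transactions, avoiding explicit locks on shared state. The Python sketch below illustrates optimistic concurrency on a single shared variable in a deliberately minimal way. The `TVar` class and `atomically` helper are hypothetical names, and this is not the TeraFlux programming model, which the slides do not describe.

The idea is simple:
- a pure function computes a new value from a private snapshot;
- the commit succeeds only if no other transaction committed in the meantime;
- otherwise the transaction retries.

```python
import threading


class TVar:
    """Hypothetical transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# One global lock guards only the short snapshot and validate-and-commit steps;
# the user's computation itself runs without holding any lock.
_commit_lock = threading.Lock()


def atomically(tvar, fn):
    """Apply the side-effect-free function fn to tvar, retrying on conflict."""
    while True:
        with _commit_lock:
            snapshot, seen_version = tvar.value, tvar.version
        new_value = fn(snapshot)  # pure computation on a private snapshot
        with _commit_lock:
            if tvar.version == seen_version:  # nobody committed in between
                tvar.value = new_value
                tvar.version += 1
                return new_value
        # Another transaction committed first: loop and recompute from fresh state.


def demo(n_threads=4, n_increments=1000):
    """Increment a shared counter from several threads without user-visible locks."""
    counter = TVar(0)

    def worker():
        for _ in range(n_increments):
            atomically(counter, lambda x: x + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(demo())
```

A real transactional memory tracks read and write sets across many variables. This sketch only shows the validate-then-commit idea for one variable.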

Simulate/Prototype many-core architectures
• Designing a chip is expensive and time-consuming
• Computer architects build software models to simulate new architectures
• Simulation can be slow (months to run one application)
• How can we accelerate this process?
• Research opportunities:
  – New modelling techniques
  – FPGA prototyping

AXLE & Big Data
• Collaboration with Dr. Gavin Brown (MLO group)
• The amount of data generated in scientific experiments or on the social web keeps growing!
• Graph-based data -> complex computation
• How can we make sense of this data deluge?
  – New learning techniques capable of working at scale
  – Redesign architectures (clusters/data centres) and software for low-power analytics
  – Accelerate software (JIT adaptation) for data processing
  – Hardware acceleration for low-power learning algorithms

For more background info
• "Future Multi-core Computing" (COMP 6062 b)
  – Learn by directed reading and group discussions of research papers
  – Practice parallel programming in the labs
• Watch out for the organised ARM & Intel school seminars in Nov and Dec

Communication Architectures: Javier Navaridas

Interconnection Networks
• On-chip networks
  – Tile-based systems
  – Heterogeneous systems
• High-performance computing networks
  – Massively Parallel Processing systems
  – Compute clusters
  – Datacentres

Topics
• Topologies
  – Routing
  – Wiring
  – Fault resilience
  – Deadlock avoidance
• Router microarchitecture
  – Congestion control
  – Quality of Service
  – Fault tolerance
• Scheduling and resource management
  – Task placement
• System and workload modelling
  – Analytical modelling
  – Simulation
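Routing and deadlock avoidance are closely linked in on-chip networks. Dimension-order (XY) routing on a 2-D mesh is the textbook example of a routing function that avoids deadlock by construction: every packet first travels along X and only then along Y. The sketch below computes the routers a packet visits between two tiles. The coordinate convention and the function name `xy_route` are assumptions for illustration, not taken from any specific project here.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited from src to dst, inclusive.

    Packets are routed fully in the X dimension first, then in Y.
    Forbidding Y-to-X turns removes every cycle from the channel
    dependency graph, which is why XY routing on a mesh is deadlock-free.
    """
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:  # X phase: step one router east or west
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:  # Y phase: step one router north or south
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

The price of this simplicity is that the route is fixed. A faulty or congested link on the X-then-Y path cannot be avoided, which is one reason fault resilience and adaptive routing remain research topics.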

Virtualization: Alasdair Rawsthorne

Unifying System and Process Virtualization
[Figure: software stacks (Application, Dynamic Runtime, Operating System, Hypervisor/VMM, Optimizing VMM, CPU) for four cases: Unvirtualized; System Virtualization (e.g. Xen, VMware, VirtualBox); Process Virtualization (e.g. JVM, Rosetta, DynamoRIO, Valgrind); Unified Virtualization]
• Potential benefits: performance, power, design time, security
• Impacts design of future compilers, OS, CPU and runtimes
alasdair.rawsthorne@manchester.ac.uk

Neural Systems Engineering: Steve Furber, Jim Garside, Dave Lester

The SpiNNaker project
• Multi-core CPU node
  – 18 ARM968 processors
  – to model large-scale systems of spiking neurons
  – in biological real time
• Scalable up to systems with 10,000s of nodes
  – over a million processors
  – >10^8 MIPS total

Current status…
• Full 18-core chip: arrived 20 May 2011
• Test card: 4 chips, 72 processors
  – Cards can be linked together
• Neuron models: LIF, Izhikevich, MLP
• Synapse models: STDP, NMDA
• Networks: PyNN -> SpiNNaker, various small tools to build router tables, etc.
• 48-chip 10^3 machine
…and the next steps:
• 500-chip 10^4 machine (Q4 2012)
• 5,000-chip 10^5 machine (H1 2013)
• 50,000-chip 10^6 machine (H2 2013)
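The leaky integrate-and-fire (LIF) model, the first of the neuron models listed, is simple enough to sketch in a few lines. The forward-Euler version below counts spikes for a constant input. Its resting potential, reset, threshold and membrane time constant are illustrative textbook-style assumptions, not SpiNNaker's parameters or code.

```python
def simulate_lif(r_i_mv, t_ms=200.0, dt_ms=0.1,
                 v_rest=-65.0, v_reset=-70.0, v_thresh=-50.0, tau_m=10.0):
    """Return spike times (ms) of a LIF neuron driven by a constant input.

    r_i_mv is the input expressed as R*I in millivolts. The membrane obeys
        tau_m * dV/dt = -(V - v_rest) + R*I
    and is reset to v_reset whenever it reaches v_thresh.
    """
    v = v_rest
    spike_times = []
    for step in range(int(t_ms / dt_ms)):
        v += dt_ms / tau_m * (-(v - v_rest) + r_i_mv)  # forward Euler step
        if v >= v_thresh:
            spike_times.append(step * dt_ms)
            v = v_reset
    return spike_times


if __name__ == "__main__":
    for drive in (10.0, 20.0, 30.0):
        print(drive, len(simulate_lif(drive)))
```

With these parameters the neuron settles at v_rest + R*I. A 10 mV input therefore stays below threshold and never fires, while larger inputs cross the threshold and fire at increasing rates.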

PhD projects
• Recent:
  – SpiNNaker monitoring
  – PyNN -> SpiNNaker
  – Real-time neural learning algorithms
  – Modelling the rat barrel cortex
  – Technology scaling on SpiNNaker
  – Error correction with CRC
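For readers unfamiliar with cyclic redundancy checks, here is a minimal sketch of how a CRC can support error correction as well as detection. It uses the common reflected CRC-32 (polynomial 0xEDB88320, the variant used by Ethernet and zlib) computed bit by bit. It then repairs a single flipped bit by brute-force search. The slides do not say which CRC or correction scheme the SpiNNaker project uses, so treat both functions as illustrations of the technique only.

```python
def crc32(data: bytes) -> int:
    """Bitwise reflected CRC-32 (polynomial 0xEDB88320); matches zlib.crc32."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def correct_single_bit(data: bytes, expected_crc: int):
    """Return data with at most one flipped bit repaired, or None if that fails.

    Brute force: flip each bit in turn and keep the first candidate whose
    CRC matches. A real implementation would instead map the CRC syndrome
    directly to the bit position.
    """
    if crc32(data) == expected_crc:
        return data
    buf = bytearray(data)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)
        if crc32(bytes(buf)) == expected_crc:
            return bytes(buf)
        buf[i // 8] ^= 1 << (i % 8)  # undo the flip and try the next bit
    return None


if __name__ == "__main__":
    print(hex(crc32(b"123456789")))
```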

Technology Scaling
• 90 nm SpiNNaker CPU node
  – SP library is faster
  – requires 128k DTCM
  – LL library better overall?
(work by Eustace Painkras, UoM PhD)

PyNN -> SpiNNaker
[Figure panels: LIF; Izhikevich]
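The Izhikevich model, shown alongside LIF, reproduces richer firing patterns using two state variables. The sketch below uses the published regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8) with a forward-Euler step. The 0.5 ms step size and the input currents are illustrative assumptions, and this is not the SpiNNaker implementation.

```python
def simulate_izhikevich(current, t_ms=1000.0, dt_ms=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Return spike times (ms) of an Izhikevich neuron under constant input.

    Model (Izhikevich, 2003):
        dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        du/dt = a (b v - u)
        if v >= 30 mV: v <- c, u <- u + d
    The default parameters give a regular-spiking cortical neuron.
    """
    v = c
    u = b * v
    spikes = []
    for step in range(int(t_ms / dt_ms)):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt_ms * dv
        u += dt_ms * du
        if v >= 30.0:  # spike: reset membrane and bump recovery variable
            spikes.append(step * dt_ms)
            v = c
            u += d
    return spikes


if __name__ == "__main__":
    print(len(simulate_izhikevich(0.0)), len(simulate_izhikevich(10.0)))
```

With these parameters the neuron has a stable resting state near -70 mV when there is no input. A constant input of 10 removes that equilibrium and the neuron fires repeatedly.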

PhD projects
• Future:
  – System software
    • run-time fault tolerance, scaling, …
  – SpiNNaker 2 architecture exploration
  – Neural network models
    • learning algorithms, rewiring
  – Robotics using SpiNNaker
  – Non-neural algorithms
    • graphics, physics modelling, …
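Spike-timing-dependent plasticity (STDP) is a common learning rule for spiking networks and also appears among the synapse models above. Below is a sketch of the classic pair-based rule with an exponential window:
- a presynaptic spike shortly before a postsynaptic one strengthens the synapse;
- the reverse order weakens it.

All pre/post pairs are summed (all-to-all pairing). The amplitudes, time constants and weight bounds are illustrative assumptions, not values from the SpiNNaker work.

```python
import math


def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012,
            tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:  # pre before post: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:  # post before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(weight, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Add the contributions of all pre/post pairs, then clip to [w_min, w_max]."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            weight += stdp_dw(t_pre, t_post)
    return min(max(weight, w_min), w_max)


if __name__ == "__main__":
    print(apply_stdp(0.5, pre_spikes=[10.0, 50.0], post_spikes=[15.0, 45.0]))
```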

Emerging Technologies for Integrated Circuits and Systems
Let’s do some hard(ware) work
Vasilis Pavlidis, www.cs.man.ac.uk/~pavlidiv

3-D Integration Opportunities
• Integrate disparate technologies/components
[Figure: a 2-D global wire of 20 mm compared with a 3-D global wire of 12 mm]
• The same total area for the two circuits
• R_TSV = 170 mΩ, C_TSV = 2 fF
• *RC values for 65 nm; delay improvement: 54%
* “ASU Predictive Technology Model.” [Online]. Available: http://www.eas.asu.edu/~ptm/
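A rough way to see why shortening a global wire helps: the delay of an unrepeated distributed RC wire grows with the square of its length. Its Elmore delay is about 0.5·r·c·L², where r and c are the per-unit-length resistance and capacitance.

The Python sketch below does two things:
- it applies that idealised scaling to the 20 mm and 12 mm wires on the slide;
- it evaluates the TSV time constant from the quoted R_TSV and C_TSV.

It ignores drivers, loads and repeaters, so it is not expected to reproduce the slide's 54% figure. It only illustrates the trend.

```python
def rc_delay_ratio(l_short_mm, l_long_mm):
    """Ratio of unrepeated distributed-RC wire delays, which scale as length**2."""
    return (l_short_mm / l_long_mm) ** 2


def tsv_time_constant_s(r_ohm=0.170, c_farad=2e-15):
    """RC time constant of one TSV (defaults: 170 mOhm and 2 fF, as on the slide)."""
    return r_ohm * c_farad


if __name__ == "__main__":
    ratio = rc_delay_ratio(12.0, 20.0)
    print(f"idealised wire-delay reduction: {(1 - ratio) * 100:.0f}%")
    print(f"TSV RC time constant: {tsv_time_constant_s():.2e} s")
```

The TSV time constant comes out well below a femtosecond. This suggests that, in this example, the vertical via itself adds negligible RC delay compared with the planar wire it replaces.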

Three-Dimensional (3-D) Integrated Circuits and Systems
• Develop design methodologies for 3-D ICs
• New models are required to consider the third physical dimension
• Diverse technologies
  – SiP, interposer, TSVs
• Many challenges exist down the road!
  – Be the first to address them
• Opportunities to tape out do exist!
  – CMP/Tezzaron: cmp.imag.fr
  – Xilinx Virtex-7 FPGA
  – Cadence PDK: 3-D Encounter

A New Circuit Design Paradigm (Safe Projects)
• (Re-)design and assess SpiNNaker-based 3-D architectures
  – Power, area, performance, cost/yield
  – Interposer and TSV technologies
• Research methodology
  – Use available resources
  – Differentiate only where required
• Other topics
  – Can resonance improve the energy efficiency of GALS-based architectures?
  – Design for manufacturability of 2-D/3-D GALS systems
    • Considering process, voltage, and temperature (PVT) variations
    • PVT behavior is substantially different in 3-D systems
• Develop/extend CAD tools for the physical design of 3-D systems
  – Special focus on interposer technologies

3-D Integration as a System Integration Approach (High-Return Projects)
• Heterogeneous 3-D integration
  – Preached a lot but not explored (at all)!
• Memory on logic is a single application
• Develop techniques and methods for “Mix-and-Match” systems
  – How do you model…?
  – How do you evaluate…?
  – How do you integrate…?
  – How do you manufacture…?
• Interdisciplinary research is a prerequisite for such systems
• Rather application-driven
• The physical proximity of diverse systems may not come for free!

PhD Guidelines
• A PhD is NOT an end in itself but a means to an end!
• Persistence, persistence!
• Manage rejection
• Be there early!
• Citations are worth more than publications
• Presentation and writing skills

Asynchronous Logic Design Tools: [Doug Edwards,] Jim Garside, Steve Furber, Alasdair Rawsthorne

Previous Projects
• Balsa
  – world-leading public asynchronous synthesis tool
  – used for complete microprocessors
• SEDATE
  – delay-insensitive datapath synthesis
• GALSA
  – framework for heterogeneous GALS
• …

GAELS
• Globally Asynchronous Elastic Logic Synthesis
  – modern SoCs comprise numerous, semi-autonomous subsystems
  – shrinking transistors have hard-to-predict variations
• Address using Elastic Logic
  – new, delay-tolerant paradigm
  – new project!
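Elastic logic replaces fixed-cycle timing assumptions with a handshake between stages. Data moves from one stage to the next only when the sender is valid and the receiver is ready, so stalls propagate without losing or duplicating data.

The cycle-level Python sketch below models a pipeline of single-slot elastic buffers under that valid/ready convention. Three choices are illustrative assumptions and not the GAELS design:
- the stage count;
- the random stall pattern at the sink;
- the function name `run_elastic_pipeline`.

```python
import random


def run_elastic_pipeline(items, n_stages=3, sink_ready_prob=0.5, seed=1):
    """Push items through single-slot elastic stages; return (received, cycles).

    Each stage holds at most one token (None = empty, i.e. valid is low).
    A token advances when its stage is valid and the next stage is ready,
    meaning empty or itself advancing in the same cycle. The sink is randomly
    not ready to model back-pressure. sink_ready_prob must be > 0.
    """
    rng = random.Random(seed)
    stages = [None] * n_stages
    source = list(items)
    received = []
    cycles = 0
    while source or any(s is not None for s in stages):
        cycles += 1
        sink_ready = rng.random() < sink_ready_prob
        # Compute handshakes from the sink backwards: ready depends on downstream.
        moves = [False] * n_stages
        next_ready = sink_ready
        for i in reversed(range(n_stages)):
            moves[i] = stages[i] is not None and next_ready
            next_ready = stages[i] is None or moves[i]
        source_moves = bool(source) and next_ready
        # Apply all transfers for this cycle, again starting at the sink end.
        if moves[-1]:
            received.append(stages[-1])
            stages[-1] = None
        for i in reversed(range(n_stages - 1)):
            if moves[i]:
                stages[i + 1] = stages[i]
                stages[i] = None
        if source_moves:
            stages[0] = source.pop(0)
    return received, cycles


if __name__ == "__main__":
    print(run_elastic_pipeline(list(range(10)), sink_ready_prob=0.3))
```

However often the sink stalls, the output is the input sequence in order. Only the number of cycles changes, which is the property elastic designs rely on.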

Reconfigurable Processing: Jim Garside

Current Computing
• Energy use is a problem
• Software
  – offers processing flexibility
  – highly inefficient
  – big overheads
• Hardware
  – limited programmability
  – greater efficiency
  – expensive to develop

A Solution?
• Compile an algorithm into a mixture of hardware and software
  – how to partition the 'code'?
  – dynamic adaptation
• Existing solutions tend towards static partitioning
  – require wide skills from developers
  – sacrifice potential flexibility
  – intolerant of differing hardware

Dynamic Reconfiguration
• Keep algorithm in common 'object' format
• Identify, 'compile' and run repeating sections in available hardware
• Adapt to facilities of any given chip
  – allow for future portability
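Identifying repeating sections resembles how JIT compilers find hot code. The approach counts how often each loop back-edge (a taken branch to an earlier address) executes. Loops whose count reaches a threshold are flagged as candidates for moving into hardware.

The sketch below applies this to a toy instruction list. The mini-ISA, `EXAMPLE_PROGRAM`, the threshold and `find_hot_loops` are all hypothetical. The slides do not describe how the real tool flow detects critical loops.

```python
from collections import Counter


def find_hot_loops(program, threshold=100, max_steps=1_000_000):
    """Interpret a toy program; return {(loop_start, branch_pc): count} for hot loops.

    Hypothetical mini-ISA:
        ("set", reg, value)   reg <- value
        ("add", reg, imm)     reg <- reg + imm
        ("bnz", reg, target)  jump to target if reg != 0
        ("halt",)
    """
    regs = Counter()  # registers not yet written read as 0
    back_edges = Counter()
    pc = 0
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "halt":
            break
        if op[0] == "set":
            regs[op[1]] = op[2]
        elif op[0] == "add":
            regs[op[1]] += op[2]
        elif op[0] == "bnz" and regs[op[1]] != 0:
            if op[2] <= pc:  # taken backward branch: one loop iteration
                back_edges[(op[2], pc)] += 1
            pc = op[2]
            continue
        pc += 1
    return {loop: n for loop, n in back_edges.items() if n >= threshold}


EXAMPLE_PROGRAM = [
    ("set", "i", 500),  # 0
    ("add", "acc", 3),  # 1: body of a long-running loop
    ("add", "i", -1),   # 2
    ("bnz", "i", 1),    # 3: back-edge to 1
    ("set", "j", 5),    # 4
    ("add", "j", -1),   # 5: body of a short loop
    ("bnz", "j", 5),    # 6: back-edge to 5
    ("halt",),          # 7
]

if __name__ == "__main__":
    print(find_hot_loops(EXAMPLE_PROGRAM))
```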

To date…
• Can identify critical loops and recompile them to hardware
  – using pre-existing code
• Developing tool flow
• Have reasonable reconfigurable hardware architecture
Results
• Promising
  – not 'earth shattering'

Future
• Want:
  – Means of expressing algorithms allowing easy compilation into software or hardware
  – Extract/exploit sensible parallelism
    • 'fine grain' for hardware
    • 'coarse grain' (?) for software
  – Get (some of) the available speed/power efficiency

Mobile Systems Architecture: Nick Filer, with help from Barry Cheetham

Nick Filer
• Interests:
  – Wireless networks of all types. Mainly:
    • Ad-hoc
    • Voice over IP
    • Sensors (data collection)
    • Pocket networks (e.g. mobile phones, PDAs)
    • Information dissemination
  – Supported by:
    • Simulation, analysis, software generation tools
  – eLearning tools for science

Current Interest 1
• Pocket Networks
  – Based on clusters of mobile users
  – Person-to-person transport
  – Which applications are useful and will work, and when and how?
    • Voice?
    • Video?
    • Delay-tolerant text messages?

Current Interest 2
• Low-power Wireless Sensor Networks
  – Algorithms for reduced power usage, mainly getting it low by design
  – Intelligent transport/routing protocols driving low-power packet routing
  – Smart dust:
    • Current cost $100+; needs to be cheaper
    • Ultra-low power (NEW): processor, memory, design
    • Nano scale, e.g. for use down oil wells!
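Much of the "low by design" saving in sensor networks comes from duty cycling. The radio and processor sleep most of the time and wake only briefly. Average power is then the time-weighted mix of active and sleep power, as in the sketch below. The power figures and battery capacity are placeholder assumptions, not measurements from this work.

```python
def average_power_mw(duty_cycle, p_active_mw, p_sleep_mw):
    """Time-weighted average power of a node that is active a fraction duty_cycle of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * p_active_mw + (1.0 - duty_cycle) * p_sleep_mw


def lifetime_hours(battery_mwh, duty_cycle, p_active_mw, p_sleep_mw):
    """Battery lifetime in hours, ignoring self-discharge and voltage effects."""
    return battery_mwh / average_power_mw(duty_cycle, p_active_mw, p_sleep_mw)


if __name__ == "__main__":
    # Placeholder numbers: 60 mW awake, 0.03 mW asleep, awake 1% of the time.
    print(average_power_mw(0.01, 60.0, 0.03))
```

With numbers like these, sleep power makes up a noticeable share of the average. Cutting the active fraction therefore only pays off until sleep consumption dominates, which is why ultra-low-power processors and memory matter as well.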

Current Interest 3
• Hand-over in mobile wireless networks
  – A pretty much solved problem (even if not always ideal) for mobile phones
  – Close to solutions for Wi-Fi, WiMAX, Bluetooth, Zigbee, etc. Still lots to learn, though.
  – Currently a 3-layer hierarchy: infrastructure, Wide Area, Personal Area
  – What happens with more layers?
    • Macro scale to nano scale?
    • Fixed infrastructure interacting with mobile autonomous agents?
    • Just how inefficient are these mechanisms currently?
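A common building block in hand-over schemes is a hysteresis rule. The device switches to a neighbouring station only when that station's signal beats the serving one by a margin for several consecutive measurements, which avoids "ping-pong" hand-overs. The sketch below is a simplified, hypothetical version of such a rule; the margin, the time-to-trigger count and the input format are assumptions. It is not a particular standard's algorithm.

```python
def handover_decisions(samples, margin_db=3.0, time_to_trigger=3):
    """Return the serving station index after each measurement.

    samples is a list of measurements; each is a list of received signal
    strengths (dBm), one per station. The device starts on the strongest
    station in the first sample. It hands over only after another station
    has been strongest and at least margin_db above the serving station for
    time_to_trigger consecutive measurements.
    """
    serving = max(range(len(samples[0])), key=lambda i: samples[0][i])
    candidate, streak = None, 0
    history = []
    for rssi in samples:
        best = max(range(len(rssi)), key=lambda i: rssi[i])
        if best != serving and rssi[best] >= rssi[serving] + margin_db:
            streak = streak + 1 if best == candidate else 1
            candidate = best
            if streak >= time_to_trigger:
                serving, candidate, streak = best, None, 0
        else:
            candidate, streak = None, 0  # condition broken: restart the count
        history.append(serving)
    return history


if __name__ == "__main__":
    walk = [[-60, -70], [-65, -64], [-66, -62], [-67, -61], [-68, -60], [-69, -59]]
    print(handover_decisions(walk))
```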

Current Interest 4
• Information dissemination in mobile ad-hoc networks
  – P2P technologies
  – P2P optimization for task, availability, handover, low energy, access latency…
  – P2P to aid DNS-like queries (information retrieval) in mobile, changing-topology networks
  – Delay-tolerant P2P; opportunistic communications
    • e.g. send 100,000 sensors down an oil well and get 1 back: what does it know? Its own data, or others' data too?
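Epidemic (gossip) protocols are one classic way to spread information through changing-topology networks without fixed infrastructure. In each round, every node holding a message pushes it to a few randomly met peers. The sketch below simulates push gossip over a fully mixing population, an idealisation of mobile contacts. The fan-out, population size and function name are illustrative assumptions.

```python
import random


def push_gossip_rounds(n_nodes, fanout=1, seed=0, max_rounds=10_000):
    """Rounds until every node has the message under random push gossip.

    In each round every informed node contacts `fanout` nodes chosen uniformly
    at random (possibly itself or an already informed node) and passes the
    message on. Returns (rounds, informed count after each round).
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the message
    history = [1]
    rounds = 0
    while len(informed) < n_nodes and rounds < max_rounds:
        rounds += 1
        newly = set()
        for _node in informed:
            for _ in range(fanout):
                newly.add(rng.randrange(n_nodes))
        informed |= newly
        history.append(len(informed))
    return rounds, history


if __name__ == "__main__":
    rounds, history = push_gossip_rounds(1000, seed=1)
    print(rounds, history)
```

The number of rounds grows roughly logarithmically with population size in this idealised setting. Real opportunistic networks meet peers far less uniformly, which is part of what makes them interesting to study.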

Current Interest 5 (joint with Barry Cheetham)
• Real-time distributed systems (sound and video)
  – Internet choir
    • Very tight audio constraints (max 50 ms)
    • Demands of latency & bandwidth
  – Singing together
    • A less constrained internet choir, but synchronization is very difficult
  – Broadcast simulcasts
    • Mixed video and sound from various locations
    • Broadcast over multiple media types with different delay (etc.) characteristics
  – Major obstacles:
    • Media types and standards, protocols, congestion, error handling, signal processing, links to hand-over problems
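The 50 ms ceiling for an internet choir can be read as a latency budget shared by every stage of the path. The sketch below treats 50 ms as a one-way budget, which is an assumption. It subtracts propagation delay in optical fibre (roughly 5 µs per km, since light travels at about 2×10^8 m/s in glass) and any named per-stage delays, and reports the slack left. The stage names and values in the usage line are hypothetical.

```python
FIBRE_DELAY_MS_PER_KM = 0.005  # ~5 us per km: light travels at roughly 2e8 m/s in fibre


def latency_slack_ms(distance_km, budget_ms=50.0, **other_delays_ms):
    """Return the remaining one-way latency budget in ms (negative = exceeded).

    other_delays_ms holds named per-stage delays in milliseconds supplied by
    the caller, e.g. capture buffering, codec or jitter buffering.
    """
    propagation = distance_km * FIBRE_DELAY_MS_PER_KM
    return budget_ms - propagation - sum(other_delays_ms.values())


if __name__ == "__main__":
    # Hypothetical 5,000 km path with illustrative buffering and codec delays.
    print(latency_slack_ms(5000, capture_buffer=10.0, codec=5.0, jitter_buffer=15.0))
```

Even before any queuing, a few thousand kilometres of fibre consumes half the budget. This is why the slide lists latency alongside bandwidth as a core demand.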

Current Interest 6
• Support for adaptable network stacks
  – Writing or changing software is time-consuming, error-prone, …
  – Models can capture the semantics of software: purpose, usage, transformation knowledge…
  – Hence: use models to generate implementations
    • Use in teaching/learning, simulation, network stack implementation

Current Interest 7 (joint with Barry Cheetham)
• eLearning for Complex Systems
  – Most eLearning tools you have seen are not much more than Content Management Systems
  – There is currently little or no evidence that they improve student grades!
  – We have ongoing work looking at improving understanding of wireless systems
  – Also interested in science teaching for awkward adolescents

Arithmetic and Control Theory: Dave Lester

Arithmetic and Control Theory
• Exact Arithmetic
  – NASA/Boeing
• Correctness of Control Theory Applications
  – Airbus
• Formalisation and Mechanisation of Probabilistic Reasoning
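Exact arithmetic avoids the rounding error that binary floating point accumulates. As a small illustration, the sketch below sums one tenth ten times in two ways: with floats and with exact rationals from Python's standard `fractions` module. Rational arithmetic is a simpler stand-in chosen for this example; the slides do not say which exact-arithmetic technique the NASA/Boeing work uses.

```python
from fractions import Fraction


def sum_tenths_float(n=10):
    """Add 0.1 n times using binary floating point (accumulates rounding error)."""
    total = 0.0
    for _ in range(n):
        total += 0.1
    return total


def sum_tenths_exact(n=10):
    """Add 1/10 n times using exact rational arithmetic."""
    total = Fraction(0)
    for _ in range(n):
        total += Fraction(1, 10)
    return total


if __name__ == "__main__":
    print(sum_tenths_float(), sum_tenths_float() == 1.0)
    print(sum_tenths_exact(), sum_tenths_exact() == 1)
```

The floating-point sum comes out as 0.9999999999999999 rather than 1.0, while the rational sum is exactly 1. In safety-critical control software, such discrepancies can change the outcome of comparisons.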