Advanced Processor Technologies group overview

APT mission
“To explore novel architectures and techniques that will enable the effective exploitation of the billion transistor chips of the near-future”

APT group
• Focus:
  – Moore’s Law will soon deliver billion transistor chips
  – how do we make best use of a billion transistors?
    • parallel processing
    • systems-on-chip
    • novel architectures
    • …?

Strategy/Vision
• Industry shift to multicore processors
  – directly addressed by our CMP work
• Power/heat is performance-limiting
  – asynchronous and low-power design have growing importance
• Timing closure is a critical problem
  – acceptance of mixed timing and GALS
• Design automation is vital
  – async automation must be competitive

Strategy/Vision
• Can university groups design state-of-the-art digital silicon?
  – probably not in conventional processors
  – few academic groups still fab digital chips
• Is trying to take designs through to fabrication still a good idea?
  – we believe so, because ‘reality’ matters!
  – but the game is very tough indeed

Many-core Architecture and Software
Mikel Lujan

Buying a single-core processor is difficult!
Multi-cores bring fundamental changes for Computer Science [applications, programming languages, compilers, runtime systems (OS), computer architecture]

Active projects
• Managed Runtime Environments and Low-Power Many-core Architectures
  – DOME: Delaying and Overcoming Microprocessor Errors
• TeraFlux
  – On the search for a “good” parallel computational model
• AXLE
  – Accelerating Analytics of Big Data

Managed Runtime Environments
• Java and .NET are examples of managed runtime environments (JVM, CLR)
• Key elements: JIT compilation and control of memory allocation
• Research opportunities:
  – Scaling MREs for many-core architectures (GPUs)
  – Hardware acceleration of MREs
  – Use MREs for low-power computing
  – Use MREs for dealing with faults and transistor wearout -> DOME

TeraFlux Project
• Major focus of current ‘General Purpose’ Many-Core research.
• Three major goals
  – To define the hardware architecture of a highly extensible, general purpose multicore system
  – To develop a simple-to-use parallel programming approach based on programming with
    • side-effect-free computations + transactions
  – How do we simulate/prototype many-core architectures?

Starting Assumptions
• Requiring strongly consistent shared memory is a major impediment to extensibility
• The efficient scheduling of control-flow-based threads is hard
• The major complexity in parallel programming is the handling of shared state (locks etc.)

Simulate/Prototype many-core architectures
• Designing a chip is expensive and time consuming
• Computer architects build software models to simulate new architectures
• Simulation can be slow (months to run one application)
• How can we accelerate this process?
Research opportunities
  – New modelling techniques
  – FPGA prototyping

AXLE & Big Data
• Collaboration with Dr. Gavin Brown (MLO group)
• The amount of data generated in scientific experiments and on the social web keeps growing!
• Graph-based data -> complex computation
• How can we make sense of this data deluge?
  – New learning techniques capable of working at scale
  – Redesign architectures (clusters/data centres) and software for low-power analytics
  – Accelerate software (JIT adaptation) for data processing
  – Hardware acceleration for low-power learning algorithms

For more background info
• "Future Multi-core Computing" (COMP 6062 b)
  – Learn by directed reading and group discussions of research papers
  – Practice parallel programming in the labs
• Watch out for the organised ARM & Intel school seminars in Nov and Dec

Communication Architectures
Javier Navaridas

Interconnection Networks
• On-chip networks
  – Tile-based systems
  – Heterogeneous systems
• High performance computing networks
  – Massively Parallel Processing systems
  – Compute Clusters
  – Datacentres

Topics
• Topologies
  – Routing
  – Wiring
  – Fault resilience
  – Deadlock avoidance
• Router microarchitecture
  – Congestion control
  – Quality of Service
  – Fault tolerance
• Scheduling and resource management
  – Task placement
• System and workload modelling
  – Analytical modelling
  – Simulation
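As one concrete instance of the routing and deadlock-avoidance topics, the sketch below shows dimension-order (XY) routing on a 2-D mesh without wraparound links. It is a textbook illustration, not a description of the group's own routers; the function name and the coordinate convention are assumptions made for the example. A packet first travels along X until its column matches the destination, then along Y. Because no packet ever turns from Y back to X, cyclic channel dependencies cannot form on a mesh, which is why this scheme is deadlock-free.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst.

    Dimension-order routing: correct the X coordinate first, then Y.
    Coordinates are hypothetical mesh positions, one hop per step.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the column matches the destination.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along Y; no further X moves are ever made.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

The price of this simplicity is that the route is fixed, so it cannot steer around congestion or faulty links, which is where the fault-resilience and congestion-control topics above come in.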

Virtualization
Alasdair Rawsthorne

Unifying System and Process Virtualization
[Figure: software stacks compared for four cases: Unvirtualized; System Virtualization (e.g. Xen, VMware, VirtualBox); Process Virtualization (e.g. JVM, Rosetta, DynamoRIO, Valgrind); Unified Virtualization. Layer labels: Application, Operating System, Dynamic Runtime, Hypervisor/VMM, Optimizing VMM, CPU]
• Potential benefits: performance, power, design time, security
• Impacts design of future compilers, OS, CPU and runtimes
alasdair.rawsthorne@manchester.ac.uk

Neural Systems Engineering
Steve Furber, Jim Garside, Dave Lester

The SpiNNaker project
• Multi-core CPU node
  – 18 ARM968 processors
  – to model large-scale systems of spiking neurons
  – in biological real time
• Scalable up to systems with 10,000s of nodes
  – over a million processors
  – >10^8 MIPS total

Current status…
• Full 18-core chip: arrived 20 May 2011
• Test card: 4 chips, 72 processors
  – Cards can be linked together
• Neuron models: LIF, Izhikevich, MLP
• Synapse models: STDP, NMDA
• Networks: PyNN -> SpiNNaker, various small tools to build router tables, etc.
• 48-chip 10^3 machine
…and the next steps:
• 500-chip 10^4 machine (Q4 2012), 5,000-chip 10^5 machine (H1 2013), 50,000-chip 10^6 machine (H2 2013).
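The machine sizes listed above can be checked against the 18-core chip. The sketch below is only that arithmetic, assuming every chip contributes all 18 of its cores; the function name is illustrative and does not come from the project.

```python
# Cores per SpiNNaker chip, as stated for the full chip.
CORES_PER_CHIP = 18


def total_cores(chips: int) -> int:
    """Return the processor count for a machine built from `chips` chips."""
    return chips * CORES_PER_CHIP


if __name__ == "__main__":
    # Test card, current machine, and the three planned machines.
    for chips in (4, 48, 500, 5_000, 50_000):
        print(f"{chips:>6} chips -> {total_cores(chips):>7} cores")
```

Four chips give the 72 processors quoted for the test card. The 48-chip machine has 864 cores, and the 50,000-chip machine has 900,000, so each machine label is an order-of-magnitude core count rather than an exact figure.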

PhD projects
• Recent:
  – SpiNNaker monitoring
  – PyNN -> SpiNNaker
  – Real-time neural learning algorithms
  – Modelling the rat barrel cortex
  – Technology scaling on SpiNNaker
  – Error correction with CRC

Technology Scaling
• 90 nm SpiNNaker CPU node
• SP library is faster
• requires 128k DTCM
• LL library better overall?
(work by Eustace Painkras, UoM PhD)

PyNN -> SpiNN
• LIF
• Izhikevich
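To make the second model name concrete, here is a minimal forward-Euler sketch of the Izhikevich neuron in plain Python. It uses the published equations v' = 0.04v² + 5v + 140 − u + I and u' = a(bv − u), with reset v ← c, u ← u + d when v reaches 30 mV, and the commonly quoted "regular spiking" parameters. It is not SpiNNaker or PyNN code: the function name, time step and input current are assumptions chosen for illustration.

```python
def simulate_izhikevich(current, steps=2000, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron and return its spike times in ms.

    `current` is a constant input current (illustrative units);
    a, b, c, d are the regular-spiking parameter set.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spike_times = []
    for step in range(steps):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= 30.0:              # spike threshold reached
            spike_times.append(step * dt)
            v = c                  # reset membrane potential
            u += d                 # bump the recovery variable
    return spike_times


if __name__ == "__main__":
    spikes = simulate_izhikevich(current=10.0)
    print(f"{len(spikes)} spikes in {2000 * 0.5:.0f} ms")
```

With no input current the neuron settles to rest and never fires; with a constant positive drive it spikes repeatedly. The model needs only a handful of multiply-adds per time step, which is one reason it is popular on integer-only embedded cores.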

PhD projects
• Future:
  – System software
    • run-time fault-tolerance, scaling, …
  – SpiNNaker 2 architecture exploration
  – Neural network models
    • learning algorithms, rewiring
  – Robotics using SpiNNaker
  – Non-neural algorithms
    • graphics, physics modelling, …

Emerging Technologies for Integrated Circuits and Systems
Let’s do some hard(ware) work
Vasilis Pavlidis
www.cs.man.ac.uk/~pavlidiv

3-D Integration Opportunities
[Figure: a 2-D global wire of 20 mm compared with a 3-D global wire of 12 mm]
• Integrate disparate technologies/components
• The same total area for the two circuits
• R_TSV = 170 mΩ, C_TSV = 2 fF; RCs for 65 nm*; delay improvement: 54%
* “ASU Predictive Technology Model.” [Online]. Available: http://www.eas.asu.edu/~ptm/

Three-Dimensional (3-D) Integrated Circuits and Systems
• Develop design methodologies for 3-D ICs
• New models are required to consider the third physical dimension
• Diverse technologies
  – SiP, interposer, TSVs
• Many challenges exist down the road!!!
  – Be the first to address them
• Opportunities to tape-out do exist!
  – CMP/Tezzaron - cmp.imag.fr
  – Cadence PDK - 3-D Encounter
[Figure label: Xilinx FPGA Virtex 7]

A New Circuit Design Paradigm (Safe Projects)
• (Re-)Design and assess SpiNNaker-based 3-D architectures
  – Power, area, performance, cost/yield
  – Interposer and TSV technologies
• Research methodology
  – Use available resources
  – Differentiate only where required
• Other topics
  – Can resonance improve energy efficiency of GALS-based architectures?
  – Design for manufacturability for GALS systems 2-D/3-D
    • Considering process, voltage, and temperature (PVT) variations
    • PVT behavior is substantially different in 3-D systems
• Develop/extend CAD tools for the physical design of 3-D systems
  – Special focus on interposer technologies

3-D Integration as a System Integration Approach (High-Return Projects)
• Heterogeneous 3-D integration
  – Preached a lot but not explored (at all)!
    • Memory on logic is a single application
• Develop techniques and methods for “Mix-and-Match” systems
  – How do you model…?
  – How do you evaluate…?
  – How do you integrate…?
  – How do you manufacture…?
• Interdisciplinary research is a prerequisite for such systems
• Rather application driven
The physical proximity of diverse systems may not come for free!

PhD Guidelines
• A PhD is NOT an end in itself but a means to an end!
• Persistence, Persistence!
• Manage rejection
• Be there early!
• Citations count for more than publications
• Presentation and writing skills

Asynchronous Logic Design Tools
[Doug Edwards,] Jim Garside, Steve Furber, Alasdair Rawsthorne

Previous Projects
• Balsa
  – world-leading public asynchronous synthesis tool
  – used for complete microprocessors
• SEDATE
  – delay-insensitive datapath synthesis
• GALSA
  – framework for heterogeneous GALS
• ...

GAELS
• Globally Asynchronous Elastic Logic Synthesis
  – modern SoCs comprise numerous, semi-autonomous subsystems
  – shrinking transistors have hard-to-predict variations
• Address using Elastic Logic
  – new, delay-tolerant paradigm
  – new project!

Reconfigurable Processing
Jim Garside

Current Computing
• Energy use is a problem
• Software
  – offers processing flexibility
  – highly inefficient
  – big overheads
• Hardware
  – limited programmability
  – greater efficiency
  – expensive to develop

A Solution?
• Compile an algorithm into a mixture of hardware and software
  – how to partition the 'code'?
  – dynamic adaptation
• Existing solutions tend towards static partitioning
  – require wide skills from developers
  – sacrifice potential flexibility
  – intolerant of differing hardware

Dynamic Reconfiguration
• Keep algorithm in common 'object' format
• Identify, 'compile' and run repeating sections in available hardware
• Adapt to facilities of any given chip
  – allow for future portability

To date...
• Can identify critical loops and recompile them to hardware
  – using pre-existing code
• Developing tool flow
• Have reasonable reconfigurable hardware architecture
Results
• Promising
  – not 'earth shattering'

Future
• Want:
  • Means of expressing algorithms allowing easy compilation into software or hardware
  • Extract/exploit sensible parallelism
    – 'fine grain' for hardware
    – 'coarse grain' (?) for software
  • Get (some of) the available speed/power efficiency

Mobile Systems Architecture
Nick Filer with help from Barry Cheetham

Nick Filer
• Interests:
  – Wireless networks of all types. Mainly:
    • Ad-hoc,
    • Voice over IP,
    • Sensors (data collection),
    • Pocket networks (e.g. mobile phones, PDAs),
    • Information dissemination.
  – Supported by:
    • Simulation, analysis, software generation tools.
  – eLearning tools for science.

Current Interest - 1
• Pocket Networks
  – Based on clusters of mobile users.
  – Person-to-person transport.
  – Which applications are useful and will work, and when and how will they work?
    • Voice?
    • Video?
    • Delay-tolerant text messages?

Current Interest - 2
• Low-power Wireless Sensor Networks
  – Algorithms for reduced power usage, mainly keeping it low by design.
  – Intelligent transport/routing protocols driving low-power packet routing.
  – Smart dust:
    • Current cost $100+, needs to be cheaper.
    • Ultra-low power (NEW): processor, memory, design.
    • Nano scale, e.g. for use down oil wells!

Current Interest - 3
• Hand-over in mobile wireless networks.
  – Pretty much a solved problem (even if not always ideal) for mobile phones.
  – Close to solutions for WiFi, WiMAX, Bluetooth, Zigbee etc. Still lots to learn though.
  – Currently a 3-layer hierarchy: infrastructure, Wide Area, Personal Area.
  – What happens with more layers?
    • Macro scale to nano scale?
    • Fixed infrastructure interacting with mobile autonomous agents?
    • Just how inefficient are these mechanisms currently?

Current Interest - 4
• Information dissemination in mobile ad-hoc networks.
  – P2P technologies.
  – P2P optimization for task, availability, handover, low energy, access latency…
  – P2P to aid DNS-like queries (information retrieval) in mobile, changing-topology networks.
  – Delay-tolerant P2P. Opportunistic communications, e.g. send 100,000 sensors down an oil well, get 1 back: what does it know? Its own data, others' data?

Current Interest - 5 (joint with Barry Cheetham)
• Real-time distributed systems (sound and video)
  – Internet choir
    • Very tight audio constraints (max 50 ms)
    • Demands of latency & bandwidth
  – Singing together
    • Less constrained internet choir but synchronization very difficult.
  – Broadcast simulcasts
    • Mixed video and sound from various locations.
    • Broadcast over multiple media types with different delay etc. characteristics.
  – Major Obstacles:
    • Media types and standards, protocols, congestion, error handling, signal processing, links to hand-over problems…
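To give a feel for how tight a 50 ms audio constraint is, the sketch below works out a propagation-only latency budget. It rests on stated assumptions, not on figures from the slides: signals are taken to travel through optical fibre at roughly 2 × 10^8 m/s (about 200 km per millisecond), the 50 ms limit is treated as a one-way budget, and the processing overhead is a hypothetical parameter covering capture, encoding, queuing and playback.

```python
# Assumed signal speed in optical fibre: ~2e8 m/s, i.e. 200 km per ms.
FIBRE_SPEED_KM_PER_MS = 200.0


def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay over `distance_km` of fibre."""
    return distance_km / FIBRE_SPEED_KM_PER_MS


def max_distance_km(budget_ms: float, overhead_ms: float = 0.0) -> float:
    """Longest fibre path that fits the budget after fixed overheads.

    `overhead_ms` is a hypothetical allowance for audio capture,
    encoding, network queuing and playback buffering.
    """
    remaining = budget_ms - overhead_ms
    return max(remaining, 0.0) * FIBRE_SPEED_KM_PER_MS


if __name__ == "__main__":
    print(propagation_delay_ms(1000.0))   # delay over a 1,000 km path
    print(max_distance_km(50.0))          # propagation only
    print(max_distance_km(50.0, 30.0))    # with 30 ms of assumed overhead
```

Under these assumptions propagation alone would allow about 10,000 km, but every millisecond spent in buffering or processing removes 200 km of reach, which is why congestion and signal processing appear among the obstacles above.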

Current Interest - 6
• Support for adaptable network stacks
  – Writing or changing software is time consuming, error prone, …
  – Models can capture semantics of software: purpose, usage, transformation knowledge...
  – Hence: use models to generate implementations.
    • Use in teaching/learning, simulation, network stack implementation.
  – Support for adaptable network stacks

Current Interest - 7 (joint with Barry Cheetham)
• eLearning for Complex Systems
  – Most eLearning tools you have seen are not much more than Content Management Systems.
  – There is currently little or no evidence they improve student grades!
  – We have on-going work looking at improving understanding of wireless systems.
  – Also interested in science teaching for awkward adolescents.

Arithmetic and Control Theory
Dave Lester

Arithmetic and Control Theory
• Exact Arithmetic
  – NASA/Boeing
• Correctness of Control Theory Applications
  – Airbus
• Formalisation and Mechanisation of Probabilistic Reasoning