Advanced Processor Technologies group overview

APT group
• Mission:
  – Moore’s Law will soon deliver billion-transistor chips
  – how do we make best use of a billion transistors?
    • parallel processing
    • systems-on-chip
    • novel architectures
    • …?

Strategy/Vision
• Integrating focus on many-core systems
  – hardware, architecture, run-time systems, compilation, programming languages
  – general-purpose and special-purpose
  – homogeneous and heterogeneous

Strategy/Vision
• Underpinning technology themes:
  – energy-efficiency
  – reliability & fault-tolerance
  – FPGAs & reconfigurability
  – silicon design
  – 3D packaging and modelling
  – many-core interconnect
  – neural systems engineering

Major Funding
• ERC Advanced Grant: Biologically-Inspired Massively Parallel Computing
• EPSRC Programme Grant: PAMELA
• EU ICT Flagship: Human Brain Project
• Plus…
  – BABEL, BrainScaleS, PRiME, DOME, GAELS, INPUT, AXLE, Teraflux, AnyScale Apps …

Many-core Architecture and Software
Mikel Lujan, Antoniu Pop

Buying a single-core processor is difficult!
Multi-cores bring fundamental changes for Computer Science [applications, programming languages, compilers, runtime systems (OS), computer architecture]

Active projects
• Many-core systems
  – Teraflux: parallel computational model
    • Novel programming language
    • Novel many-core memory organization
  – Focus on hardware/software codesign: Managed Runtime Environments and Low-Power Many-core Architectures
    • DOME: Delaying and Overcoming Microprocessor Errors
    • PAMELA: Computer Vision & Data Centers
    • AnyScale Apps: Approximate Computing
• Big Data
  – AXLE
    • Accelerating Analytics of Big Data
  – RETHINK big: EU Roadmap for Big Data

OpenStream Project
• Make data flow programming easy
  – OpenMP-style annotations on C code
  – Visualization tool to debug performance issues: OSTV
• Efficient execution on many-cores
  – Code generation for existing (x86, ARM) and experimental architectures (Teraflux)
  – Optimizations (e.g., polyhedral compilation)
  – Runtime algorithms optimized for weak memory consistency models
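To make the data flow idea concrete, here is a minimal sketch of data-driven task execution. It is written in Python for brevity and is not OpenStream syntax (OpenStream annotates C code); the `Task` class, `run_dataflow` function, and stream names are hypothetical and exist only for this illustration. Each task declares the streams it reads and the stream it writes, and fires as soon as all of its inputs are available, regardless of the order in which tasks were declared.

```python
from collections import deque


class Task:
    """A unit of work that fires once all of its input streams hold a value."""

    def __init__(self, name, func, inputs, output):
        self.name = name
        self.func = func
        self.inputs = inputs
        self.output = output


def run_dataflow(tasks, initial):
    """Run tasks in data-driven order; return (stream values, firing order)."""
    streams = dict(initial)
    pending = deque(tasks)
    order = []
    stalled = 0  # consecutive tasks skipped because an input was missing
    while pending:
        task = pending.popleft()
        if all(name in streams for name in task.inputs):
            args = [streams[name] for name in task.inputs]
            streams[task.output] = task.func(*args)
            order.append(task.name)
            stalled = 0
        else:
            pending.append(task)
            stalled += 1
            if stalled >= len(pending):
                raise RuntimeError("unsatisfiable dependencies")
    return streams, order


# Tasks are deliberately declared out of dependency order.
tasks = [
    Task("sum", lambda a, b: a + b, ["squared", "doubled"], "total"),
    Task("square", lambda x: x * x, ["x"], "squared"),
    Task("double", lambda x: 2 * x, ["x"], "doubled"),
]
streams, order = run_dataflow(tasks, {"x": 3})
```

The scheduler here is sequential; in a real data flow runtime, "square" and "double" could run in parallel because neither depends on the other.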

Managed Runtime Environments
• Java, .NET are examples of managed runtime environments (JVM, CLR)
• Key elements: JIT compilation and control of memory allocation
• Research opportunities:
  – Scaling MREs for many-core architectures (GPUs)
  – Hardware acceleration of MREs
  – Use MREs for low-power computing
  – Use MREs for dealing with faults and transistor wearout -> DOME

Simulate/Prototype many-core architectures
• Designing a chip is expensive and time-consuming
• Computer architects build software models to simulate new architectures
• Simulation can be slow (months to run one application)
• How can we accelerate this process? Research opportunities
  – New modelling techniques
  – FPGA prototyping

AXLE & Big Data
• Collaboration with Dr. Javier Navaridas & Dr. Gavin Brown (MLO group)
• The amount of data generated in scientific experiments or on the social web keeps growing!
• Graph-based data -> complex computation
• How can we make sense of this data deluge?
  – New learning techniques capable of working at scale
  – Redesign architectures (clusters/data centres) and software for low-power analytics
  – Accelerate software (JIT adaptation) for data processing
  – Hardware acceleration for low-power learning algorithms

Communication Architectures
Javier Navaridas

Interconnection Networks
• On-chip networks
  – Tile-based systems
  – Heterogeneous systems
• High performance computing networks
  – Massively Parallel Processing systems
  – Compute (Super)Clusters
• Datacentre networks
  – Off-the-shelf equipment
  – High performance alternatives
• Performance Metrics
  – Throughput, Latency
  – Power, Area
  – Fault tolerance
  – Application running time

Topics of interest
• Topologies
  – Routing
  – Wiring
  – Fault resilience
  – Deadlock avoidance
• Router microarchitecture
  – Congestion control
  – Quality of Service
  – Fault tolerance
• Scheduling and resource management
  – Task placement
• System and workload modelling
  – Analytical modelling
  – Simulation
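As an illustration of how routing and deadlock avoidance interact, the sketch below implements dimension-order (XY) routing on a 2-D mesh. This is a standard textbook scheme chosen for the example, not necessarily one the group uses; the function name `xy_route` is hypothetical. A packet first travels along the X dimension until its column matches the destination, then along Y. Because no packet ever turns from Y back to X, the channel dependencies on a mesh cannot form a cycle, which is why the scheme is deadlock-free.

```python
def xy_route(src, dst):
    """Return the list of (x, y) hops from src to dst, X dimension first."""
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate; no turn back to X ever happens.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 3))
hops = len(route) - 1  # equals the Manhattan distance, so the route is minimal
```

The price of this simplicity is that the route is fixed per source/destination pair, so it cannot steer around congestion or faulty links, which connects to the fault resilience and congestion control topics listed above.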

INPUT Project
• In collaboration with Durham University
• Investigate how practical and theoretical aspects can be put together to improve efficiency and performance
• Main research questions
  – Can we design interconnection networks that are incrementally expandable?
  – Can distance properties be more accurately ascertained?
  – Can minimal routing algorithms be developed?
  – Can we embed theoretical properties into the router architecture?
  – Can we reflect component behaviour in theoretical analyses?
  – How well do theoretically advantageous networks perform under realistic conditions?
  – Can we describe specific traffic patterns arising from applications using graph theory?
  – Can we transform traffic pattern graphs so that they are ‘embeddable’ into a network?
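The "distance properties" in these questions are quantities such as the diameter and the average hop distance of a topology. A minimal sketch of how they can be measured by breadth-first search follows; the two topologies (a 16-node ring and a 4x4 torus) and all function names are choices made for this example, not taken from the project.

```python
from collections import deque


def ring(n):
    """Adjacency list of an n-node ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def torus(k):
    """Adjacency list of a k x k 2-D torus."""
    return {
        (x, y): [((x + 1) % k, y), ((x - 1) % k, y),
                 (x, (y + 1) % k), (x, (y - 1) % k)]
        for x in range(k) for y in range(k)
    }


def distances_from(graph, source):
    """Hop distance from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def diameter_and_mean(graph):
    """Return (diameter, mean distance over all ordered pairs of distinct nodes)."""
    longest, total, pairs = 0, 0, 0
    for source in graph:
        dist = distances_from(graph, source)
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1
    return longest, total / pairs


ring_diameter, ring_mean = diameter_and_mean(ring(16))
torus_diameter, torus_mean = diameter_and_mean(torus(4))
```

With the same 16 nodes, the torus halves both the diameter and the mean distance of the ring, at the cost of twice as many links per node. Exhaustive search like this only scales to small networks, which is one reason analytical distance results are valuable.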

High Performance Computing
Graham Riley

Graham Riley
• Interests:
  – Parallel Performance Analysis and Improvement
    • Techniques and methods
    • Scientific applications
  – Numerical algorithms and implementations
  – Flexible software coupling technologies
    • Flexible construction and deployment of complex multi-model software
  – The ‘Exascale’ challenge
    • Scalable software and many-core hardware
• Good links to Weather and Climate modellers
  – UK Met Office, European and US Centres

Current projects
• ‘GungHo’ (NERC)
  – Developing a new, highly scalable, dynamical core for the Met Office’s atmosphere model
• IS-ENES (EU FP7)
  – Scalable software infrastructures for Earth System Modelling
• ERMITAGE (EU FP7)
  – Coupling technology for Integrated Assessment modelling
    • Climate impact and mitigation
• PAMELA (EPSRC programme grant)
  – Mobile vision scene understanding application
  – From algorithms to specialized hardware via compiler and run-time systems

Neural Systems Engineering
Steve Furber, Jim Garside, Dave Lester

SpiNNaker project
• A million mobile phone processors in one computer
• Able to model about 1% of the human brain…
• …or 10 mice!
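A rough sanity check of these headline figures can be done with back-of-envelope arithmetic. The neuron counts below are commonly cited estimates and are assumptions of this sketch, not numbers from the slides; only the one-million-core figure comes from the slide.

```python
# Assumed figures (NOT from the slides): commonly cited neuron-count estimates.
HUMAN_NEURONS = 86e9   # human brain, assumption
MOUSE_NEURONS = 71e6   # mouse brain, assumption

# From the slide: "a million mobile phone processors in one computer".
CORES = 1_000_000

one_percent_human = 0.01 * HUMAN_NEURONS      # neurons in 1% of a human brain
ten_mice = 10 * MOUSE_NEURONS                 # neurons in ten mouse brains
neurons_per_core = one_percent_human / CORES  # load each core would carry
```

Under these assumptions, 1% of a human brain and ten mouse brains are the same order of magnitude (several hundred million neurons), and each core would need to simulate on the order of a thousand neurons.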

SpiNNaker chip
Multi-chip packaging by UNISEM Europe

SpiNNaker circuit boards

SpiNNaker applications
• A wide range of global collaborators
• Annual workshops:
  – Capo Caccia
  – Telluride

PhD projects
• Recent:
  – SpiNNaker monitoring
  – PyNN -> SpiNNaker
  – Real-time neural learning algorithms
  – Modelling the rat barrel cortex
  – Technology scaling on SpiNNaker
• Future:
  – System software
    • run-time fault-tolerance, scaling, …
  – SpiNNaker 2 architecture exploration
  – Neural network models
    • learning algorithms, rewiring
  – Robotics using SpiNNaker
  – Non-neural algorithms
    • graphics, physics modelling, …

3-D Integrated Circuits & Systems
Vasilis Pavlidis
pavlidis@cs.man.ac.uk
www.cs.man.ac.uk/~pavlidiv

3-D Integration Benefits
[Figure: 2-D global wire of 20 mm vs. 3-D global wire of 12 mm]
• Integrate disparate technologies/components
• The same total area for the two circuits
• Delay improvement for 3-D up to 54%*
• Architectural and physical design implications leading to several research questions
* “ASU Predictive Technology Model.” [Online]. Available: http://www.eas.asu.edu/~ptm/
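To see why shortening a global wire from 20 mm to 12 mm can give a delay improvement of this size, the sketch below bounds the gain with two idealised scaling laws. These laws are textbook approximations assumed for this example, and `delay_reduction` is a hypothetical helper. The slide's 54% figure comes from the cited technology model, whose setup is not given here. For an unrepeated distributed RC wire, delay grows with the square of the length; for an optimally repeated wire it grows roughly linearly.

```python
def delay_reduction(length_2d_mm, length_3d_mm, exponent):
    """Fractional delay reduction when wire delay scales as length**exponent."""
    return 1.0 - (length_3d_mm / length_2d_mm) ** exponent


# Wire lengths from the slide: 20 mm (2-D) versus 12 mm (3-D).
unrepeated = delay_reduction(20.0, 12.0, 2)  # distributed RC wire, delay ~ L^2
repeated = delay_reduction(20.0, 12.0, 1)    # repeated wire, delay ~ L
```

The two idealised cases give a 64% and a 40% reduction respectively, so the quoted "up to 54%" lies between them. The sketch ignores the delay of the vertical connection itself and any change in driver or repeater sizing.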

3-D Integration Design Technologies
• Through-silicon-via (TSV) based systems
  – Not mature for high volume manufacturing (HVM)
• Silicon interposers
  – HVM from Xilinx FPGAs
  – Glass interposers are being explored
• 3-D technologies and tools for prototyping are available
[Figure labels: TSV; Xilinx FPGA Virtex 7]

3-D Integration as a Circuit Design Paradigm
• (Re-)Design and assess SpiNNaker-based 3-D architectures
  – Power, area, and performance tradeoffs
  – Interposer and TSV technologies
• Research methodology
  – Use available resources
  – Differentiate only where required
• Reorganise SpiNNaker system at the chip level
  – Replace wire bonds with TSVs
• Reorganise SpiNNaker system at the core level
  – On-chip long wires replaced with short TSVs

3-D Integration as a System Integration Approach
• Heterogeneous 3-D integration
  – Preached a lot but hardly explored!
• Develop techniques and methods for “Mix-and-Match” systems
  – How do you model…?
  – How do you evaluate…?
  – How do you integrate…?
  – How do you manufacture…?
• The physical proximity of diverse systems may not come for free!
  – Interdisciplinary research is a prerequisite for such systems
  – Rather application driven

Asynchronous Logic Design Tools
[Doug Edwards,] Jim Garside, Steve Furber

Previous Projects
• Balsa
  – world-leading public asynchronous synthesis tool
  – used for complete microprocessors
• SEDATE
  – delay-insensitive datapath synthesis
• GALSA
  – framework for heterogeneous GALS
• ...

GAELS
• Globally Asynchronous Elastic Logic Synthesis
  – modern SoCs comprise numerous, semi-autonomous subsystems
  – shrinking transistors have hard-to-predict variations
• Address using Elastic Logic
  – new, delay-tolerant paradigm
  – new project!

Reconfigurable Processing
Dirk Koch, Jim Garside

Current Computing
• Energy use and design productivity are today's major concerns!
• Software
  – offers very good programmability
  – But: highly inefficient
• Hardware
  – limited programmability
  – greater efficiency
  – But: expensive to develop
• FPGAs: take the best from both

FPGAs State of the Art
• Modern FPGAs provide (e.g. XC6VSX475T)
  – 1000 32-bit multipliers
  – 500 MHz clock speed
  – 4.8 MB on-chip memory @ 5 TB/s (aggregated)
  – Less than 30 Watts power
• Allow very customized hardware (do more in fewer cycles): high-performance and low-power
• But: difficult to program (Verilog/VHDL)
• Compilers for C and Java, etc. are on the horizon
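The figures above can be combined into a rough peak-throughput estimate. The sketch below assumes, for illustration only, that every multiplier is fully pipelined and delivers one result per clock cycle and that the device draws the full 30 W; real designs would achieve less than this peak.

```python
# Figures from the slide (XC6VSX475T example).
multipliers = 1000             # 32-bit multipliers
clock_hz = 500e6               # 500 MHz
power_w = 30.0                 # "less than 30 Watts", taken as an upper bound
bandwidth_bytes_per_s = 5e12   # 5 TB/s aggregated on-chip memory bandwidth

# Assumption: one multiplication per multiplier per cycle.
peak_mults_per_s = multipliers * clock_hz
mults_per_joule = peak_mults_per_s / power_w
bytes_per_mult = bandwidth_bytes_per_s / peak_mults_per_s
```

Under these assumptions the device peaks at 5 x 10^11 multiplications per second, and the on-chip memory can supply 10 bytes per multiplication, enough for two 32-bit operands each cycle. That balance between arithmetic and local bandwidth is what lets customized pipelines stay busy.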

Change Hardware at Runtime
Example: Database Acceleration
Compose FPGA processing pipelines by stitching together SQL modules
Design goals:
• 512-bit datapath
• 300+x MHz (Virtex-6)
• Dozens of concurrently working accelerators
• 100x faster than x86 for some queries
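The idea of stitching query modules into a pipeline can be sketched in software with Python generators, where each stage consumes a stream of rows and passes results downstream. This is only an analogy for the hardware composition described above: the module names (`scan`, `select`, `project`, `aggregate_sum`) and the sample table are hypothetical. The last line computes the raw bound implied by the stated datapath width at an assumed 300 MHz clock, with one word transferred per cycle.

```python
def scan(rows):
    """Source module: stream rows out of a table."""
    for row in rows:
        yield row


def select(rows, predicate):
    """Filter module: pass on only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Projection module: keep only the named columns."""
    for row in rows:
        yield {column: row[column] for column in columns}


def aggregate_sum(rows, column):
    """Sink module: sum one column over the whole stream."""
    return sum(row[column] for row in rows)


table = [
    {"id": 1, "region": "north", "amount": 10},
    {"id": 2, "region": "south", "amount": 25},
    {"id": 3, "region": "north", "amount": 7},
]

# Equivalent of: SELECT SUM(amount) FROM table WHERE region = 'north'
pipeline = project(select(scan(table), lambda r: r["region"] == "north"), ["amount"])
total = aggregate_sum(pipeline, "amount")

# Raw datapath bound: 512 bits (64 bytes) per cycle at an assumed 300 MHz.
datapath_bytes_per_s = (512 // 8) * 300e6
```

A different query reuses the same modules in a different arrangement, which mirrors composing pipelines from a module library at runtime. The datapath bound works out to 19.2 GB/s per pipeline under the stated assumption.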

Research
• Methodologies and design tools
• Applications (video, database, embedded)
• Goal: allow “civilians” to program FPGAs (and to use dynamic reconfiguration)

Mobile Systems Architecture
Nick Filer, with help from Barry Cheetham

Nick Filer
• Interests:
  – Wireless networks of all types. Mainly:
    • Ad-hoc,
    • Voice over IP,
    • Sensors/Things (data collection), protocols...
    • Pocket networks (e.g. mobile phones, PDAs), ...
    • Information dissemination. Big data over networks.
  – Supported by:
    • Simulation, analysis, software generation tools.

Current Interests
• Support for adaptable network stacks
• Mobile pocket networks
• Low power wireless sensor networks
• Neighbour detection and handover in mobile wireless networks

DYVERSE Hybrid Dynamical Systems
Eva Navarro López