Advanced Processor Technologies group overview


APT group
  • Mission:
    – Moore’s Law will soon deliver billion-transistor chips
    – how do we make best use of a billion transistors?
      • parallel processing
      • systems-on-chip
      • novel architectures
      • …?

Strategy/Vision
  • Integrating focus on many-core systems
    – hardware, architecture, run-time systems, compilation, programming languages
    – general-purpose and special-purpose
    – homogeneous and heterogeneous

Strategy/Vision
  • Underpinning technology themes:
    – energy-efficiency
    – reliability & fault-tolerance
    – FPGAs & reconfigurability
    – silicon design
    – 3-D packaging and modelling
    – many-core interconnect
    – neural systems engineering

Major Funding
  • ERC Advanced Grant: Biologically-Inspired Massively Parallel Computing
  • EPSRC Programme Grant: PAMELA
  • EU ICT Flagship: Human Brain Project
  • Plus…
    – BABEL, BrainScaleS, PRiME, DOME, GAELS, INPUT, AXLE, Teraflux, AnyScale Apps …

Many-core Architecture and Software
Mikel Lujan, Antoniu Pop

Buying a single-core processor is difficult!
Multi-cores bring fundamental changes for Computer Science [applications, programming languages, compilers, runtime systems (OS), computer architecture]

Active projects
  • Many-core systems
    – Teraflux: parallel computational model
      • Novel programming language
      • Novel many-core memory organization
    – Focus on hardware/software codesign: Managed Runtime Environments and Low-Power Many-core Architectures
      • DOME: Delaying and Overcoming Microprocessor Errors
      • PAMELA: Computer Vision & Data Centers
      • AnyScale Apps: Approximate Computing
  • Big Data
    – AXLE
      • Accelerating Analytics of Big Data
    – RETHINK.big: EU Roadmap for Big Data

OpenStream Project
  • Make data flow programming easy
    – OpenMP-style annotations on C code
    – Visualization tool to debug performance issues: OSTV
  • Efficient execution on Many-Cores
    – Code generation for existing (x86, ARM) and experimental architectures (Teraflux)
    – Optimizations (e.g., polyhedral compilation)
    – Runtime algorithms optimized for weak memory consistency models

Managed Runtime Environments
  • Java and .Net are examples of managed runtime environments (JVM, CLR)
  • Key elements: JIT compilation and control of memory allocation
  • Research opportunities:
    – Scaling MREs for many-core architectures (GPUs)
    – Hardware acceleration of MREs
    – Use MREs for low-power computing
    – Use MREs for dealing with faults and transistor wearout -> DOME

Simulate/Prototype many-core architectures
  • Designing a chip is expensive and time-consuming
  • Computer architects build software models to simulate new architectures
  • Simulation can be slow (months to run one application)
  • How can we accelerate this process?
  • Research opportunities
    – New modelling techniques
    – FPGA prototyping

AXLE & Big Data
  • Collaboration with Dr. Javier Navaridas & Dr. Gavin Brown (MLO group)
  • Amount of data generated in scientific experiments or social web keeps growing!
  • Graph-based data -> complex computation
  • How can we make sense of this data deluge?
    – New learning techniques capable of working at scale
    – Redesign architectures (clusters/data centres) and software for low-power analytics
    – Accelerate software (JIT adaptation) for data processing
    – Hardware acceleration for low-power learning algorithms

Communication Architectures
Javier Navaridas

Interconnection Networks
  • On-chip networks
    – Tile-based systems
    – Heterogeneous systems
  • High performance computing networks
    – Massively Parallel Processing systems
    – Compute (Super)Clusters
  • Datacentre networks
    – Off-the-shelf equipment
    – High performance alternatives
  • Performance Metrics
    – Throughput, Latency
    – Power, Area
    – Fault tolerance
    – Application running time

Topics of interest
  • Topologies
    – Routing
    – Wiring
    – Fault resilience
    – Deadlock avoidance
  • Router microarchitecture
    – Congestion control
    – Quality of Service
    – Fault tolerance
  • Scheduling and resource management
    – Task placement
  • System and workload modelling
    – Analytical modelling
    – Simulation
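
To make the routing and deadlock-avoidance topics above concrete, here is a minimal sketch of dimension-order (XY) routing on a 2-D mesh. This is a standard textbook scheme, not something taken from the slides: it is minimal, and it avoids routing deadlock on a mesh because a packet never turns from the Y dimension back to X. The mesh topology, the coordinate convention and the function name `xy_route` are illustrative assumptions.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst, X dimension first."""
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate completely.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: only then move in Y; the packet never turns from Y back to X.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 3))
hops = len(route) - 1  # equals the Manhattan distance, so the route is minimal
```

The fixed turn order is what rules out cyclic channel dependencies; the price is that the scheme cannot route around a faulty link, which is where the fault-resilience topics come in.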

INPUT Project
  • In collaboration with Durham University
  • Investigate how practical and theoretical aspects can be put together to improve efficiency and performance
  • Main research questions
    – Can we design interconnection networks that are incrementally expandable?
    – Can distance properties be more accurately ascertained?
    – Can minimal routing algorithms be developed?
    – Can we embed theoretical properties into the router architecture?
    – Can we reflect component behaviour in theoretical analyses?
    – How well do theoretically advantageous networks perform under realistic conditions?
    – Can we describe specific traffic patterns arising from applications using graph theory?
    – Can we transform traffic pattern graphs so that they are ‘embeddable’ into a network?

High Performance Computing
Graham Riley

Graham Riley
  • Interests:
    – Parallel Performance Analysis and Improvement
      • Techniques and methods
      • Scientific applications
    – Numerical algorithms and implementations
    – Flexible software coupling technologies
      • Flexible construction and deployment of complex multi-model software
    – The ‘Exascale’ challenge
      • Scalable software and many-core hardware
  • Good links to Weather and Climate modellers
    – UK Met Office, European and US Centres

Current projects
  • ‘GungHo’ (NERC)
    – Developing a new, highly scalable, dynamical core for the Met Office’s atmosphere model
  • IS-ENES (EU FP7)
    – Scalable software infrastructures for Earth System Modelling
  • ERMITAGE (EU FP7)
    – Coupling technology for Integrated Assessment modelling
      • Climate impact and mitigation
  • PAMELA (EPSRC programme grant)
    – Mobile vision scene understanding application
    – From algorithms to specialized hardware via compiler and run-time systems

Neural Systems Engineering
Steve Furber, Jim Garside, Dave Lester

SpiNNaker project
  • A million mobile phone processors in one computer
  • Able to model about 1% of the human brain…
  • …or 10 mice!

SpiNNaker chip
Multi-chip packaging by UNISEM Europe

SpiNNaker circuit boards

SpiNNaker applications
  • A wide range of global collaborators
  • Annual workshops:
    – Capo Caccia
    – Telluride

PhD projects
  • Recent:
    – SpiNNaker monitoring
    – PyNN -> SpiNNaker
    – Real-time neural learning algorithms
    – Modelling the rat barrel cortex
    – Technology scaling on SpiNNaker
  • Future:
    – System software
      • run-time fault-tolerance, scaling, …
    – SpiNNaker 2 architecture exploration
    – Neural network models
      • learning algorithms, rewiring
    – Robotics using SpiNNaker
    – Non-neural algorithms
      • graphics, physics modelling, …

3-D Integrated Circuits & Systems
Vasilis Pavlidis
pavlidis@cs.man.ac.uk
www.cs.man.ac.uk/~pavlidiv

3-D Integration Benefits
2-D global wire of 20 mm; 3-D global wire of 12 mm
  • Integrate disparate technologies/components
  • The same total area for the two circuits
  • Delay improvement for 3-D up to 54%*
  • Architectural and physical design implications leading to several research questions
* “ASU Predictive Technology Model.” [Online]. Available: http://www.eas.asu.edu/~ptm/
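
As a rough sanity check on the wire lengths quoted above, the sketch below applies two textbook interconnect models that are assumptions and do not appear on the slide: delay proportional to length (an optimally repeated wire) and delay proportional to length squared (an unrepeated distributed RC wire). It only illustrates how shortening a global wire from 20 mm to 12 mm could translate into a delay saving; it is not how the quoted 54% was obtained.

```python
def delay_improvement(len_2d_mm, len_3d_mm, exponent):
    """Fractional delay reduction if wire delay scales as length**exponent."""
    return 1.0 - (len_3d_mm / len_2d_mm) ** exponent


# Wire lengths are from the slide; the scaling exponents are assumed models.
linear = delay_improvement(20.0, 12.0, 1)     # repeated wire: delay ~ length
quadratic = delay_improvement(20.0, 12.0, 2)  # unrepeated RC wire: delay ~ length^2

print(f"linear model: {linear:.0%}, quadratic model: {quadratic:.0%}")
```

Under these assumptions the two models give reductions of roughly 40% and 64%, so a figure of up to 54% sits between the two idealised cases.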

3-D Integration Design Technologies
  • Through-silicon-via (TSV) based systems
    – Not mature for high volume manufacturing (HVM)
  • Silicon interposers
    – HVM from Xilinx FPGAs
    – Glass interposers are being explored
  • 3-D technologies and tools for prototyping are available
Figures: TSV; Xilinx FPGA Virtex 7

3-D Integration as a Circuit Design Paradigm
  • (Re-)Design and assess SpiNNaker-based 3-D architectures
    – Power, area, and performance tradeoffs
    – Interposer and TSV technologies
  • Research methodology
    – Use available resources
    – Differentiate only where required
  • Reorganise SpiNNaker system at the chip level
    – Replace wire bonds with TSVs
  • Reorganise SpiNNaker system at the core level
    – On-chip long wires replaced with short TSVs

3-D Integration as a System Integration Approach
  • Heterogeneous 3-D integration
    – Preached a lot but hardly explored!
  • Develop techniques and methods for “Mix-and-Match” systems
    – How do you model…?
    – How do you evaluate…?
    – How do you integrate…?
    – How do you manufacture…?
  • The physical proximity of diverse systems may not come for free!
    – Interdisciplinary research is a prerequisite for such systems
    – Rather application driven

Asynchronous Logic Design Tools
[Doug Edwards,] Jim Garside, Steve Furber

Previous Projects
  • Balsa
    – world-leading public asynchronous synthesis tool
    – used for complete microprocessors
  • SEDATE
    – delay-insensitive datapath synthesis
  • GALSA
    – framework for heterogeneous GALS
  • ...

GAELS
  • Globally Asynchronous Elastic Logic Synthesis
    – modern SoCs comprise numerous, semi-autonomous subsystems
    – shrinking transistors have hard-to-predict variations
  • Address using Elastic Logic
    – new, delay-tolerant paradigm
    – new project!

Reconfigurable Processing
Dirk Koch, Jim Garside

Current Computing
  • Energy use and design productivity are today's major concerns!
  • Software
    – offers very good programmability
    – But: highly inefficient
  • Hardware
    – limited programmability
    – greater efficiency
    – But: expensive to develop
  • FPGAs: take the best from both

FPGAs State of the Art
  • Modern FPGAs (e.g. XC6VSX475T) provide
    – 1000 32-bit multipliers
    – 500 MHz clock speed
    – 4.8 MB on-chip memory @ 5 TB/s (aggregated)
    – Less than 30 Watts power
  • Allow very customized hardware (do more in fewer cycles): high-performance and low-power
  • But: difficult to program (Verilog/VHDL)
  • Compilers for C, Java, etc. are on the horizon
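
The headline figures above can be combined into rough peak numbers. The sketch below is back-of-the-envelope arithmetic only; it assumes that every multiplier completes one multiply per clock cycle, which is an idealisation not stated on the slide, and the variable names are illustrative.

```python
MULTIPLIERS = 1000           # 32-bit multipliers (from the slide)
CLOCK_HZ = 500e6             # 500 MHz clock (from the slide)
POWER_W = 30.0               # upper bound on power (from the slide)
MEM_BW_BYTES_PER_S = 5e12    # 5 TB/s aggregated on-chip bandwidth (from the slide)

# Assumption: every multiplier produces one result per clock cycle.
peak_mults_per_s = MULTIPLIERS * CLOCK_HZ
# Lower bound on efficiency, because the slide gives power as "less than 30 W".
mults_per_joule = peak_mults_per_s / POWER_W
# On-chip bytes that can be moved per clock cycle at the quoted bandwidth.
bytes_per_cycle = MEM_BW_BYTES_PER_S / CLOCK_HZ
```

Under that assumption the peak is 5e11 multiplies per second, at least about 1.7e10 multiplies per joule, with 10,000 bytes of on-chip memory traffic available per cycle, i.e. around 10 bytes per multiplier per cycle.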

Change Hardware at Runtime
Example: Database Acceleration
Compose FPGA processing pipelines by stitching together SQL modules
Design goals:
  • 512-bit datapath
  • 300+x MHz (Virtex-6)
  • Dozens of concurrently working accelerators
  • 100x faster than x86 for some queries
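
To put the datapath design goals in perspective, the following sketch computes the peak streaming rate of one pipeline. It assumes one full datapath word moves every clock cycle and uses the lower end of the quoted clock range (300 MHz); the function name and the decimal definition of GB are illustrative choices, not taken from the slides.

```python
def stream_throughput_gb_s(datapath_bits, clock_mhz):
    """Peak rate in GB/s (1 GB = 1e9 bytes) if one full word moves per cycle."""
    bytes_per_cycle = datapath_bits // 8
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


# 512-bit datapath at 300 MHz: 64 bytes per cycle.
per_pipeline = stream_throughput_gb_s(512, 300)
```

With these assumptions a single pipeline peaks at about 19.2 GB/s, so in practice the rate at which data can be delivered to the FPGA, rather than the datapath itself, is likely to decide how much of the quoted speed-up is reached.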

Research
  • Methodologies and design tools
  • Applications (video, database, embedded)
  • Goal: allow “civilians” to program FPGAs (and to use dynamic reconfiguration)

Mobile Systems Architecture
Nick Filer, with help from Barry Cheetham

Nick Filer
  • Interests:
    – Wireless networks of all types. Mainly:
      • Ad-hoc
      • Voice over IP
      • Sensors/Things (data collection), protocols...
      • Pocket networks (e.g. mobile phones, PDAs), ...
      • Information dissemination. Big data over networks.
    – Supported by:
      • Simulation, analysis, software generation tools.

Current Interests
  • Support for adaptable network stacks
  • Mobile pocket networks
  • Low power wireless sensor networks
  • Neighbour detection and handover in mobile wireless networks

DYVERSE Hybrid Dynamical Systems
Eva Navarro López
