“Single-chip Cloud Computer (SCC)”
An experimental many-core processor from Intel Labs
You Liang, Director of Technical Support, Intel China Center of Parallel Computing
Date, 2011
Intel Confidential

Agenda
• Tera-scale Research Processor
• SCC Architecture Overview
• sccKit -- Software framework for SCC Platform
• ICCPC supporting structure for MARC China Community

What is Tera-scale?
[Figure: performance (KIPS, MIPS, GIPS, TIPS) plotted against dataset size (kilobytes, megabytes, gigabytes, terabytes). Compute classes: single-core, multi-core, tera-scale. Workload labels: text, multimedia, 3D & video, RMS, personal media creation and management, entertainment, learning & travel, health. Source: Electronic Visualization Lab, University of Illinois. http://techresearch.intel.com/articles/Tera-Scale/1421.htm]

Performance Scaling Challenges Energy Efficiency Design Complexity Programming Models Emerging Applications

Teraflops Research Processor
Goals:
• Deliver Tera-scale performance
  – Single-precision TFLOP at desktop power
  – Frequency target 5 GHz
  – Bisection B/W on the order of Terabits/s
  – Link bandwidth in hundreds of GB/s
• Prototype two key technologies
  – On-die interconnect fabric
  – 3D stacked memory
• Develop a scalable design methodology
  – Tiled design approach
  – Power-aware capability
[Die photo labels: 21.72 mm, 12.64 mm; single tile 1.5 mm, 2.0 mm; PLL; I/O Area (two regions)]

Single-chip Cloud Computer Experimental Processor
• Technology: 45 nm Hi-K CMOS
• Interconnect: 9 Metal (Cu)
• Transistors: Die: 1.3 B, Tile: 48 M
• Tile Area: 18.7 mm²
• Die Area: 567.1 mm²
[Die photo labels: 26.5 mm, 21.4 mm; TILE 5.2 mm, 3.6 mm; Core 0, Core 1, L2$0, L2$1, MPB, Router; four DDR3 MC; PLL; VRC; JTAG; System Interface + I/O]
Howard, J., et al., “A 48-Core IA-32 Message-Passing Processor with DVFS in 45 nm CMOS”, in Proceedings of ISSCC 2010 (IEEE International Solid-State Circuits Conference), Feb. 2010
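The length labels on the die photo appear consistent with the quoted areas. The check below is a rough sketch: it assumes that 26.5 mm and 21.4 mm are the die edges and that 5.2 mm and 3.6 mm are the tile edges (a pairing inferred from the labels, not stated on the slide), and it takes the tile count from the 6 x 4 mesh shown on the die architecture slide.

```python
# Sanity check of the quoted areas. The pairing of edge lengths is an assumption.
DIE_W_MM, DIE_H_MM = 26.5, 21.4     # assumed die edges
TILE_W_MM, TILE_H_MM = 5.2, 3.6     # assumed tile edges
NUM_TILES = 6 * 4                   # 6 x 4 mesh of tiles

die_area = DIE_W_MM * DIE_H_MM
tile_area = TILE_W_MM * TILE_H_MM
tile_fraction = NUM_TILES * tile_area / die_area

print(f"die area  = {die_area:.1f} mm^2")
print(f"tile area = {tile_area:.1f} mm^2")
print(f"tiles cover about {tile_fraction:.0%} of the die")
```

Under these assumptions the products round to the 567.1 mm² and 18.7 mm² given on the slide, with the rest of the die left for memory controllers, system interface and I/O.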

SCC Feature set
• First Si with 48 IA cores on a single die
• Power envelope 125 W
• Core @ 1 GHz, Mesh @ 2 GHz
• Message passing architecture
• No coherent shared memory
• Proof of concept for scalable solution for many-core
• Next-generation 2D mesh interconnect
• Bisection B/W 1.5 Tb/s to 2 Tb/s, avg. power 6 W to 12 W
• Fine-grain dynamic power management
• Off-die VRs

Die Architecture
• 2-core clusters (tiles) in a 6 x 4 2-D mesh
• Tile: two IA-32 cores (Core 0, Core 1), each with a 256 KB L2 cache (L2$0, L2$1), a 16 KB MPB, and a router
• Also on the die: memory controllers MC0, MC1, MC2, MC3; VRC; System Interface
[Diagram label: 16 B]
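To make the 6 x 4 mesh concrete, the sketch below maps a core number to its tile position and counts router hops between two cores. The numbering scheme (tiles counted row by row, cores 2t and 2t+1 on tile t) and the shortest-path hop count are assumptions made for illustration; the slide does not specify either.

```python
# Hypothetical numbering: tiles counted row by row, cores 2t and 2t+1 sit on tile t.
MESH_COLS, MESH_ROWS = 6, 4
CORES_PER_TILE = 2
NUM_CORES = MESH_COLS * MESH_ROWS * CORES_PER_TILE  # 48


def tile_coords(core_id):
    """Return the (x, y) mesh position of the tile that holds core_id."""
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core id {core_id} out of range")
    tile = core_id // CORES_PER_TILE
    return tile % MESH_COLS, tile // MESH_COLS


def mesh_hops(src_core, dst_core):
    """Router-to-router hops between two cores, assuming shortest-path routing."""
    sx, sy = tile_coords(src_core)
    dx, dy = tile_coords(dst_core)
    return abs(sx - dx) + abs(sy - dy)


print(mesh_hops(0, 1))    # two cores on the same tile
print(mesh_hops(0, 47))   # opposite corners of the mesh
```

With this numbering, cores on the same tile are zero hops apart and the worst case is the corner-to-corner distance of (6 - 1) + (4 - 1) hops.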

Core Memory Management
• Core cache coherency is restricted to private memory space
  – Maintaining cache coherency for shared memory space is under software control
• Each core has an address Look Up Table (LUT) extension
• LUT must fit within the core and memory controller constraints
• LUT boundaries are dynamically programmed
[Diagram: CORE 0 LUT example. Entries numbered 0, 1, … 254, 255; regions labelled Boot, Shared, and Private (1 GB); entries map to MC0, to the VRC, or to the MPBs.]
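The lookup can be pictured as a small table indexed by the top bits of the core address. The sketch below is a simplified model, not the hardware format: it assumes a 32-bit core address split into an 8-bit index (256 entries, so 16 MB per entry) and a 24-bit offset, and the entry fields and example mappings are hypothetical.

```python
# Sketch of a per-core LUT lookup. Field names and example mappings are hypothetical.
from dataclasses import dataclass

LUT_ENTRIES = 256
PAGE_BITS = 24                      # 32-bit address / 256 entries -> 16 MB per entry
PAGE_SIZE = 1 << PAGE_BITS


@dataclass(frozen=True)
class LutEntry:
    target: str                     # e.g. "MC0", "VRC", "MPB" (labels from the slide)
    base: int                       # start offset inside the target (assumed field)


def translate(lut, core_addr):
    """Map a 32-bit core address to (target, address inside the target)."""
    if not 0 <= core_addr < 1 << 32:
        raise ValueError("address must fit in 32 bits")
    index = core_addr >> PAGE_BITS
    offset = core_addr & (PAGE_SIZE - 1)
    entry = lut.get(index)
    if entry is None:
        raise LookupError(f"LUT entry {index} is not programmed")
    return entry.target, entry.base + offset


# Made-up table for one core; only a few of the 256 entries are programmed.
lut = {
    0: LutEntry("MC0", 0 * PAGE_SIZE),
    1: LutEntry("MC0", 1 * PAGE_SIZE),
    192: LutEntry("MPB", 0),
    255: LutEntry("MC0", 40 * PAGE_SIZE),
}

print(translate(lut, 0x0100_0040))
lut[1] = LutEntry("MC0", 7 * PAGE_SIZE)   # reprogram an entry at run time
print(translate(lut, 0x0100_0040))        # same core address, different location
```

Reprogramming an entry changes where an unchanged core address lands, which is one way to read "LUT boundaries are dynamically programmed".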

Message Passing on SCC
• Message passing is done through shared memory space
• Two classes of shared memory:
  – Off-die, DRAM: uncacheable shared memory … results in high-latency message passing
  – On-die, message passing buffers (MPB) … low-latency message passing
• On-die dedicated message buffers placed in each tile to improve message passing performance
• Message bandwidth improved to 1 GB/s

Voltage and Frequency islands
• 27 Frequency Islands (FI)
• 8 Voltage Islands (VI)

Core & Router Frequency

Power breakdown

SCC system overview
[Diagram labels: SCC die, PLL, JTAG I/O, JTAG BUS, RPC, System FPGA, PCIe, Management Console PC]

SCC Platform Board Overview

SCC Linux Build
The sccKit comes with a custom Linux build that can be used to run your own applications:
• Kernel 2.6.16 with Busybox 1.15.1
• Booting without BIOS possible (kernel mods)
• Dropbear ssh
• On-die TCP/IP network drivers
• Off-die TCP/IP driver for connection to the management console, including NFS service
• Drivers for low-level access to SCC-specific hardware (e.g. MPB)
• SATA driver under development (latest Q1/2011)

SCC Linux Apps
• Cross-compilers for the Pentium-processor-compatible IA cores are available (C++, Fortran).
• Write your own low-level device drivers for deeper dives.
• Cross-compiled MPI2, including ITAC (Intel Trace Analyzer and Collector), is available.

Creating Management Console PC Apps
• Code of sccGui as well as the command-line tools is available as code example. These tools use and extend the low-level API.
• Low-level API (sccApi) with access to SCC and board management controller (BMC) via PCIe.
• Written in C++, making use of the Nokia Qt cross-platform application and UI framework.
Hardware abstraction (layers in the diagram):
• sccApi: C++ class for low-level access to hardware.
• sccExtApi: inherited C++ class with application-specific features.
• Application: actual app (e.g. sccGui) that doesn't need to care about low-level access.

sccGui
• Read and write system memory and registers.
• Boot OS or other workloads (e.g. bareMetalC).
• Open SSH connections to booted Linux cores.
• Performance meter.
• Initialize platform via Board Management Controller.

sccBoot & sccReset
• sccBoot: A command-line tool that boots Linux on selected cores and checks the status (“which cores are currently booted”). It can also boot generic workloads (e.g. bareMetalC applications).
• sccReset: A command-line tool that resets selected SCC cores.

sccKonsole
• Regular konsole, with automatic login to selected cores.
• Enables broadcasting amongst shells.
• No graphics support.

sccDisplay
48 (!) virtual displays with full graphics support...
• Graphical konsoles with virtual graphics cards.
• Forwards keyboard and mouse commands to cores.
• Allows preview over all virtual framebuffers (photo).

The RCCE library
• RCCE API provides the basic message passing functionality expected in a tiny communication library:
  – One- and two-sided interface (put/get + send/recv) with synchronization flags and MPB management exposed. This is the “gory” interface, for programmers who need the most detailed control over SCC.
  – Two-sided interface (send/recv) with most detail (flags and MPB management) hidden. This is the “basic” interface, for typical application programmers.
[Diagram labels: put(), get(), send(), recv()]
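The relationship between the two interfaces can be illustrated with a toy model in which a two-sided send/recv pair is built from one-sided copies plus flags. This is not the RCCE API: the class, method names, flag scheme, and buffer size below are hypothetical, and two threads stand in for two cores sharing one small buffer.

```python
# Toy model of flag-synchronised message passing through a small shared buffer.
# Not the RCCE API: all names here are hypothetical.
import threading


class MessageBuffer:
    def __init__(self, size=32):
        self.size = size
        self.data = bytearray(size)
        self.full = threading.Event()     # set by the sender: data is ready
        self.empty = threading.Event()    # set by the receiver: buffer is free
        self.empty.set()

    # "Gory"-style layer: one-sided copies, the caller manages the flags.
    def put(self, chunk):
        self.data[:len(chunk)] = chunk

    def get(self, nbytes):
        return bytes(self.data[:nbytes])

    # "Basic"-style layer: two-sided calls that hide flags and chunking.
    def send(self, message):
        for start in range(0, len(message), self.size):
            self.empty.wait()             # wait until the receiver drained the buffer
            self.empty.clear()
            self.put(message[start:start + self.size])
            self.full.set()

    def recv(self, nbytes):
        out = bytearray()
        while len(out) < nbytes:
            want = min(self.size, nbytes - len(out))
            self.full.wait()              # wait for the sender's signal
            self.full.clear()
            out += self.get(want)
            self.empty.set()
        return bytes(out)


def demo(size=32, length=100):
    """Send `length` bytes through a `size`-byte buffer and check they arrive intact."""
    buf = MessageBuffer(size)
    message = bytes(i % 256 for i in range(length))
    received = []
    receiver = threading.Thread(target=lambda: received.append(buf.recv(length)))
    receiver.start()
    buf.send(message)
    receiver.join()
    return received[0] == message


print(demo())
```

A program written against put/get must handle the flags and the buffer size itself, while send/recv callers only see whole messages, which mirrors the split between detailed control and convenience described above.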

RCCE Power Management API
• RCCE power management emphasizes safe control: V/GHz changed together within each 4-tile (8-core) power domain.
  – A master core sets V + GHz for all cores in the domain.
• RCCE_istep_power():
  – steps V + GHz up or down, where GHz is the max for the selected voltage.
• RCCE_wait_power():
  – returns when the power change is done.
• RCCE_step_frequency():
  – steps only GHz up or down.
• Power management latencies
  – V changes: very high latency, O(million) cycles.
  – GHz changes: low latency, O(few) cycles.
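The rules above can be sketched as a small state machine for one power domain. This is an illustrative model only: it does not call RCCE, the voltage levels and frequency limits are made-up numbers, the cycle costs are placeholders for the stated orders of magnitude, and the power step here completes immediately instead of being started and then waited on.

```python
# Toy model of one voltage/frequency domain. Levels, limits and names are
# hypothetical; they are not RCCE values.
TILES_PER_DOMAIN = 4
CORES_PER_TILE = 2

# (voltage in V, max frequency in GHz), lowest level first. Made-up numbers.
LEVELS = [(0.7, 0.4), (0.9, 0.7), (1.1, 1.0)]


class PowerDomain:
    def __init__(self, level=1):
        self.level = level
        self.freq_ghz = LEVELS[level][1]
        self.cycles_spent = 0             # rough latency bookkeeping

    @property
    def volts(self):
        return LEVELS[self.level][0]

    def step_power(self, delta):
        """Change V and GHz together; GHz becomes the max for the new voltage."""
        new_level = self.level + delta
        if not 0 <= new_level < len(LEVELS):
            raise ValueError("no such voltage level")
        self.level = new_level
        self.freq_ghz = LEVELS[new_level][1]
        self.cycles_spent += 1_000_000    # voltage change: O(million) cycles

    def step_frequency(self, new_freq_ghz):
        """Change only GHz; it must not exceed the max for the current voltage."""
        if not 0 < new_freq_ghz <= LEVELS[self.level][1]:
            raise ValueError("frequency not allowed at this voltage")
        self.freq_ghz = new_freq_ghz
        self.cycles_spent += 10           # frequency change: O(few) cycles


domain = PowerDomain(level=0)
domain.step_power(+1)                     # slow: touches the voltage
domain.step_frequency(0.5)                # fast: frequency only
print(domain.volts, domain.freq_ghz, domain.cycles_spent)
print("cores per domain:", TILES_PER_DOMAIN * CORES_PER_TILE)
```

The latency gap suggests a usage pattern: change voltage rarely, and use frequency-only steps for short-lived adjustments within the current voltage level.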

ICCPC introduction
• Intel China Center of Parallel Computing
  – Provides clients with technical support based on Intel multi/many-core technologies.
  – MARC China Community Program operation.
  – SCC China technical support.
  – Provides clients with technical advice and solutions for multi/many-core and parallel computing.
  – Provides clients with software development and services based on Intel multi/many-core technologies.
  – Provides technical training and certification based on Intel multi/many-core technologies to academia and industry.

Technical Supporting Team

SCC Data Center

2010.12 SCC Seminar

2011.7 MARC China Launch Event

MARC China Researchers

MARC China Website
http://communities.intel.com/community/marc/china
Welcome to join MARC China!
service@iccpcdev.com
leo.you@iccpcdev.com

Thank you!