Other Architectures & Examples
• Multithreaded architectures
• Dataflow architectures
• Multiprocessor examples
1st May, 2006
Anshul Kumar, CSE IITD
Context switching
• Delays and poor resource utilization due to
  – data/control hazards
  – cache misses
  – waiting for some event
• Solution
  – context switch to another thread
• Context switch mechanism
  – operating system: slow
  – hardware: fast
Multithreaded architecture
• Hardware context switching
• Models: control flow or hybrid (control flow, data flow)
• Granularity: fine grain or coarse grain
• Memory organization: shared? distributed? cache coherent?
• No. of threads: small, medium, large
ILP and Multithreading
[Figure, from Hennessy and Patterson; panels: ILP, Coarse MT, Fine MT, SMT]
Chip level multithreading (Wikipedia)
Executing instructions from multiple threads within one processor chip at the same time.
• Multithreading: interleaved issue of multiple instructions from different threads
• Simultaneous multithreading (SMT): issue multiple instructions from multiple threads in one cycle
• Chip-level multiprocessing (CMP or multicore): integrate two or more superscalar processors into one chip, each executing one thread independently
• Any combination of multithreading/SMT/CMP
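The difference between coarse-grain and fine-grain hardware multithreading can be illustrated with a toy single-issue model. This is only a sketch under simplifying assumptions that are not on the slides: each thread is a list of instruction latencies, a latency greater than 1 stands for a stall such as a cache miss, and a hardware context switch costs zero cycles. The function name `simulate` and the policy names are made up for illustration.

```python
def simulate(threads, policy):
    """Return the per-cycle issue trace (thread index, or None for an idle slot).

    threads: list of lists; each inner list holds instruction latencies.
             After issuing an instruction with latency k, the thread cannot
             issue again for k cycles (k > 1 models a stall).
    policy:  "fine"   -> move to the next thread after every issue
             "coarse" -> stay on the current thread until it stalls
    """
    n = len(threads)
    pc = [0] * n          # index of the next instruction in each thread
    ready_at = [0] * n    # first cycle in which each thread may issue again
    trace = []
    current = 0
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        # Look for a ready thread, starting with the preferred one.
        order = [(current + k) % n for k in range(n)]
        chosen = next(
            (t for t in order
             if pc[t] < len(threads[t]) and ready_at[t] <= cycle),
            None,
        )
        trace.append(chosen)
        if chosen is not None:
            ready_at[chosen] = cycle + threads[chosen][pc[chosen]]
            pc[chosen] += 1
            # Fine grain rotates every cycle; coarse grain keeps the thread.
            current = (chosen + 1) % n if policy == "fine" else chosen
        cycle += 1
    return trace


if __name__ == "__main__":
    work = [[1, 1, 3, 1], [1, 1, 1, 1]]
    print("single thread:", simulate(work[:1], "coarse"))
    print("coarse MT:    ", simulate(work, "coarse"))
    print("fine MT:      ", simulate(work, "fine"))
```

With one thread the stall leaves idle issue slots; with two threads either policy fills them, which is the resource-utilization argument for context switching in hardware.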
Historical Examples
Machine             Granularity  Year  Procs    Threads/proc   Memory
HEP from Denelcor   fine         1978  max 16   64, 8 active   centralized shared
Tera                fine         1990  max 256  max 128        distributed shared
Modern examples
• Pentium 4: Hyperthreading
• MIPS MT: fine grained multithreading
• IBM Power 5: dual core, 2 threads each
• UltraSPARC T1: 8 cores with 4 threads each
HEP
[Figure: HEP block diagram; labels: control loop, 8 stage pipeline, scheduler function unit, PSW queue, matching unit, program memory, increment control, operand fetch, registers, SFU, FU1, FU2, ..., FUn, to/from data memory]
Control Flow & Data Flow models
• Control Flow (von Neumann)
  – control flows through a sequence of instructions; branches can alter the flow
  – instructions get data from or put data in memory
  – explicit parallelism through control operators (fork/join)
• Data Flow
  – instructions are triggered by availability of data
  – data flows from instruction to instruction
  – implicit parallelism
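Dataflow Model
[Figure: dataflow graph for R=(A-B)*(B+1); inputs A and B feed a "-" node, B and the constant 1 feed a "+" node, and the results A-B and B+1 feed a "*" node]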
Dataflow Program
[Figure: dataflow program for R=(A-B)*(B+1)]
• L1: compute B; result goes to L2/2 and L3/1
• L2: - (A-B); result goes to L4/1
• L3: + 1 (B+1); result goes to L4/2
• L4: * ; result R=(A-B)*(B+1) goes to L6/1
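A minimal sketch of data-driven execution for R=(A-B)*(B+1) may help here. It assumes the reading of the figure in which destinations are written label/operand-slot (for example L4/1 is the first operand of instruction L4), B is delivered to L2/2 and L3/1, the constant 1 is stored in the second slot of L3, and L6 is simply where the result is collected. The function name `run_dataflow` and the dictionary layout are illustrative, not taken from the slides.

```python
import operator
from collections import deque


def run_dataflow(a, b):
    """Fire each instruction as soon as all of its operand slots are filled."""
    # Instruction templates: operator, operand slots (None = waiting for a
    # token), and the destinations (label, slot) that receive the result.
    program = {
        "L2": {"op": operator.sub, "slots": {1: None, 2: None}, "dests": [("L4", 1)]},
        "L3": {"op": operator.add, "slots": {1: None, 2: 1}, "dests": [("L4", 2)]},
        "L4": {"op": operator.mul, "slots": {1: None, 2: None}, "dests": [("L6", 1)]},
    }
    # Initial tokens: A goes to L2/1; B (the output of L1) goes to L2/2 and L3/1.
    tokens = deque([("L2", 1, a), ("L2", 2, b), ("L3", 1, b)])
    results = {}
    fired = []  # order in which instructions fired
    while tokens:
        label, slot, value = tokens.popleft()
        if label not in program:
            # No instruction at this label: treat it as an output sink.
            results[(label, slot)] = value
            continue
        instr = program[label]
        instr["slots"][slot] = value
        if all(v is not None for v in instr["slots"].values()):
            out = instr["op"](instr["slots"][1], instr["slots"][2])
            fired.append(label)
            for dest in instr["dests"]:
                tokens.append((dest[0], dest[1], out))
    return results[("L6", 1)], fired


if __name__ == "__main__":
    print(run_dataflow(7, 3))
```

There is no program counter: the firing order falls out of token arrival, so L2 and L3 could run in parallel on different function units.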
Static Dataflow Architecture
[Figure: block diagram; labels: fetch unit, instruction queue, FU1, FU2, ..., FUn, update unit, activity store, to/from other PEs]
Tagged-token dataflow architecture
[Figure: block diagram; labels: matching unit, matching store, fetch unit, token queue, instruction/data memory, FU1, FU2, ..., FUn, form token unit, to/from other PEs]
UMA Examples
• Earlier approach: large number of processors (e.g. Denelcor HEP, NYU Ultracomputer)
• Now realized: good only for small number of processors (e.g. Encore Multimax, 1980s; SGI Power Challenge, 1990s)
SGI Power Challenge
• 18 MIPS R8000
• 16 GB RAM, 8-way interleaved
• 4 Power channel-2, each 320 MB/s (I/O bus)
• Power path-2: split transaction shared bus (256 bit data, 40 bit address)
• Snoopy cache coherence protocol
NUMA Examples
• BBN TC2000
• IBM RP3
• Hector
• Cray T3D
Hector
• Hierarchical structure
[Figure: labels: global ring, local rings, stations, proc module (P+C+M), I/O module]
Hector station
[Figure: labels: global ring, local ring, station, station controller, station bus, proc module, proc module, I/O module]
Cray T3D
• Alpha 21064 processors
• Cray Y-MP host
• up to 128 GB memory
• 3D torus: 4 x 4 x 4 config, up to 8 x 8 x 8
• 2 PEs in each node
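The wrap-around links of a 3D torus and the PE counts implied by the figures above can be sketched as follows. The sketch assumes that the quoted dimensions count nodes (not PEs) and that each node links to its two neighbours along each axis; the function names are illustrative only.

```python
def torus_neighbors(node, dims):
    """Return the six neighbours of `node` in a 3D torus of size `dims`."""
    x, y, z = node
    nx, ny, nz = dims
    # The modulo provides the wrap-around link at each edge of the mesh.
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


def pe_count(dims, pes_per_node=2):
    """Total PEs, assuming `dims` counts nodes and each node holds 2 PEs."""
    nx, ny, nz = dims
    return nx * ny * nz * pes_per_node


if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0), (4, 4, 4)))
    print(pe_count((4, 4, 4)), pe_count((8, 8, 8)))
```

Under these assumptions a corner node such as (0, 0, 0) still has six distinct neighbours, which is what distinguishes a torus from a plain mesh.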
CC-NUMA examples
Machine              Nodes                            Mem             Cache               Net
Wisconsin Multicube  single proc                      per col bus     snoopy              bus grid
Aquarius Multimulti  single proc                      per node        snoopy + directory  bus grid
Stanford Dash        cluster (4 R3000 + FPU on bus)   per cluster     directory           pair of meshes
Stanford Flash       single proc (T5 + magic chip)    per node        directory           2D mesh
Convex Exemplar      hyper node (8 PA-RISC)           per hyper node  SCI                 X bar (hyper node), multi rings
Magic chip: memory + I/O + network controller
COMA examples
• DDM (Data Diffusion Machine)
  – single bus (split transaction)
  – can be made hierarchical
• KSR1
  – hierarchical rings
  – distributed directory is a matrix: rows for pages, columns for caches
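The "rows for pages, columns for caches" layout of the KSR1 directory can be pictured with a small matrix sketch. This is only an illustration: it assumes each entry is a single present/absent flag, whereas the slides do not say what an entry actually stores or how the matrix is distributed over the rings. The class and method names are hypothetical.

```python
class MatrixDirectory:
    """Toy directory: present[page][cache] is True if that cache holds the page."""

    def __init__(self, num_pages, num_caches):
        self.present = [[False] * num_caches for _ in range(num_pages)]

    def record_copy(self, page, cache):
        # Mark that `cache` now holds a copy of `page`.
        self.present[page][cache] = True

    def drop_copy(self, page, cache):
        # Mark that `cache` no longer holds `page`.
        self.present[page][cache] = False

    def holders(self, page):
        # Scan one row: which caches hold this page?
        return [c for c, held in enumerate(self.present[page]) if held]

    def pages_in(self, cache):
        # Scan one column: which pages does this cache hold?
        return [p for p, row in enumerate(self.present) if row[cache]]


if __name__ == "__main__":
    directory = MatrixDirectory(num_pages=4, num_caches=3)
    directory.record_copy(page=2, cache=0)
    directory.record_copy(page=2, cache=2)
    print(directory.holders(2), directory.pages_in(2))
```

Locating a page is a row scan and listing a cache's contents is a column scan, which is one reason a matrix is a natural shape when data has no fixed home node.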
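Distr Mem Arch Examples
[Table: columns Machine, Comp. proc, Comm. proc, Vec. proc, Switch, Topology]
• Parsytec: PPC 601 (comp. proc), T805 (comm. proc), C004 (switch), 3D mesh
• Transtech: i860 (comp. proc), T805 (comm. proc), C004 (switch), variable topology
• Other machines listed: nCUBE 2, iPSC 2, Intel Paragon, Genesis, Manna
• Cell values for these machines, in slide order (row alignment not recoverable): yes, custom, custom, i386, yes, hypercube, i860, custom, i870, hypercube, i860, 2D mesh, i870, 2 level X bar, i860, 16 x 16 X bar (hierarch.)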
References
• D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.