MPP; DSM: Distributed Shared Memory; SVM: Shared Virtual Memory












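Architecture classification by memory access model
Basic forms:
  Shared memory: SMP
  Distributed memory: MPP, clusters
  Distributed shared memory (DSM: Distributed Shared Memory)
  Shared virtual memory (SVM: Shared Virtual Memory)
NUMA: Non-Uniform Memory Access
ccNUMA: Cache-Coherent NUMA (a form of DSM)
  SGI: Origin 3000, Altix 3000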




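Multi-core chips: addressing the size and power-consumption problems
AMD dual-core chip; Intel quad-core chip
Diagram labels: CPU, Cache, HT (HT: HyperTransport)
A multi-core chip draws the same power as a single-CPU chip
Multithreading: many threads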


– What is GPU Computing?
– GPU computing is the use of a GPU (graphics processing unit) for general-purpose scientific and engineering computing. The model is heterogeneous: a CPU and a GPU are used together. The sequential part of the application runs on the CPU, and the computationally intensive part runs on the GPU. From the user's perspective, the application simply runs faster because it exploits the GPU's high performance.
Number of cores: > 100
1.0 TFLOPS per chip (single precision)

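Cell chip block diagram
SIMD; single precision > 256 GFLOPS, double precision > 26 GFLOPS
Coprocessor units
[Figure: Cell block diagram. Labels: SPE, SPU, SXU, LS, SMF, EIB (96 bytes/clock), L2, L1, PPU, PXU, MIC, Dual XDR, BIC, Flex I/O]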




SMP: Symmetric Multi-Processing
[Figure: several CPUs, each with its own cache, connected through a bus/crossbar/... to memory and, via an I/O bus, to I/O devices]

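SMP: Symmetric Multi-Processing
The cache coherence problem
[Figure: processors P1, P2, P3, each with its own cache, connected through a bus/crossbar/... to memory (u: 5) and, via an I/O bus, to I/O devices]
0. Initially, u = 5
1. P1 reads u
2. P3 reads u
3. P3 writes 7 to u
4. P1 reads u: what value does it get?
5. P2 reads u: what value does it get?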
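The read/write sequence on this slide can be replayed in a small simulation. The following is a minimal sketch, not the behavior of any real machine: it assumes private caches with no coherence protocol at all, and the class names `Memory` and `Cache` and the function `run_scenario` are hypothetical names introduced only for illustration. It runs the five steps under a write-through and a write-back policy.

```python
class Memory:
    """Shared main memory; holds the initial value u = 5 (step 0)."""
    def __init__(self):
        self.data = {"u": 5}


class Cache:
    """Private per-processor cache with NO coherence protocol (illustrative model)."""
    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch the value from memory
            self.lines[addr] = self.memory.data[addr]
        return self.lines[addr]               # hit: the cached copy may be stale

    def write(self, addr, value):
        self.lines[addr] = value              # update only the local copy...
        if self.write_through:                # ...and memory, if write-through
            self.memory.data[addr] = value


def run_scenario(write_through):
    """Replay steps 1-5 and return what P1 and P2 read in steps 4 and 5."""
    mem = Memory()
    p1, p2, p3 = (Cache(mem, write_through) for _ in range(3))
    p1.read("u")                              # step 1
    p3.read("u")                              # step 2
    p3.write("u", 7)                          # step 3
    return p1.read("u"), p2.read("u")         # steps 4 and 5


if __name__ == "__main__":
    print("write-through:", run_scenario(True))
    print("write-back:   ", run_scenario(False))
```

Under these assumptions P1 keeps reading its stale copy in both cases, and what P2 sees depends on whether P3's write has reached memory. A coherence protocol exists to rule out exactly these outcomes.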

























Dawning 1000 (曙光1000) communication network
6 x 6 wormhole mesh
WRC (how does it differ from a crossbar?)
[Figure label: node]
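One way to start on the crossbar question is to compare hardware cost and distance. The sketch below assumes a 2D mesh without wraparound links and, for comparison, a single full N x N crossbar connecting all nodes; the function names `mesh_stats` and `crossbar_crosspoints` are hypothetical, and the numbers are simple counts, not figures from the Dawning 1000 design.

```python
def mesh_stats(rows, cols):
    """Return (nodes, bidirectional links, diameter in hops) of a 2D mesh."""
    nodes = rows * cols
    # Each row has (cols - 1) horizontal links; each column has (rows - 1) vertical links.
    links = rows * (cols - 1) + cols * (rows - 1)
    # Longest shortest path: from one corner to the opposite corner.
    diameter = (rows - 1) + (cols - 1)
    return nodes, links, diameter


def crossbar_crosspoints(n):
    """A full n x n crossbar needs one crosspoint per (input, output) pair."""
    return n * n


if __name__ == "__main__":
    nodes, links, diameter = mesh_stats(6, 6)
    print("mesh nodes:", nodes, "links:", links, "diameter:", diameter)
    print("crossbar crosspoints:", crossbar_crosspoints(nodes))
```

In this simple model the mesh grows linearly in links but messages may need several hops, while the crossbar connects any pair in one step at a cost that grows with the square of the node count.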





Wormhole Routing Chip (WRC) design
Layered structure: an X-dimension layer and a Y-dimension layer
[Figure: WRC block diagram. Port labels: PXi, PXo, NXi, NXo, PYi, PYo, NYi, NYo, Hi, Ho]

Wormhole data path design (single layer: the X layer)
L: Latch; C: Counter
[Figure: X-layer data path. Labels: PXi, PXo, NXi, NXo, Hi, Ho, latches (L), counters (C), routing switch, X-1 or X+1]
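The separate X and Y layers suggest that a message is routed one dimension at a time. The sketch below shows dimension-order (XY) routing, a common scheme for 2D meshes; that the WRC routes exactly this way is an assumption, not something the slides state. The function name `xy_route`, the direction labels "X+", "X-", "Y+", "Y-", and the (x, y) coordinate convention are all hypothetical.

```python
def xy_route(src, dst, size=6):
    """Return the hops from src to dst on a size x size mesh, X first, then Y.

    Each hop is (direction, node reached). Coordinates are (x, y) tuples.
    """
    (x, y), (dx, dy) = src, dst
    for v in (x, y, dx, dy):
        if not 0 <= v < size:
            raise ValueError("coordinate outside the mesh")

    hops = []
    while x != dx:                      # travel along the X dimension first
        step = 1 if dx > x else -1      # move to X+1 or X-1
        x += step
        hops.append(("X+" if step > 0 else "X-", (x, y)))
    while y != dy:                      # then travel along the Y dimension
        step = 1 if dy > y else -1
        y += step
        hops.append(("Y+" if step > 0 else "Y-", (x, y)))
    return hops


if __name__ == "__main__":
    for hop in xy_route((0, 0), (2, 1)):
        print(hop)
```

Routing strictly X before Y means a message never turns from the Y dimension back into X, which is the usual argument for why this scheme avoids deadlock on a mesh.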









References
1. Parallel Computer Architecture, David Culler, et al.
2. High Performance Cluster Computing (Vol. 1), Rajkumar Buyya