PARALLEL COMPUTER ARCHITECTURE 1

What Is a Parallel Architecture?
• A parallel computer (a computer with a parallel architecture) is a collection of processing elements that cooperate to solve a large problem.
Why study parallel architecture?
• Application demands
• Technology trends
• Architecture trends
• Economics
Current trends:
– Most of today's microprocessors include support for multiprocessing.
– Servers and workstations use multiprocessor architectures: Sun, SGI, DEC, COMPAQ, ...
– Upcoming (and current) microprocessors are themselves multiprocessors.

Factors That Influence the Development of Computer Architecture
Computer architecture is shaped by:
• Technology
• Programming languages
• Applications
• Operating systems
• History

Parallel Processing
• On single-processor architectures
• On multiprocessor architectures

Classification of Computer Architectures (after Flynn, 1972)
SISD: Single Instruction-stream, Single Data-stream
[Diagram labels: processor, ALU]

SIMD: Single Instruction-stream, Multiple Data-stream
• The instructions (the program) are loaded by the host.
• The data are loaded by the host.
• The same instruction stream is applied to every data stream (DS).
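The SIMD idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name run_simd is hypothetical, each list element stands in for the local data of one processing element, and the broadcast is modeled by applying one instruction to all elements before moving to the next instruction.

```python
def run_simd(program, data):
    """Apply one instruction stream to many data streams.

    program: list of single-argument functions (the instruction stream,
             loaded once by the host).
    data:    list of values, one per processing element (loaded by the host).
    """
    pes = list(data)  # local data of each processing element
    for instruction in program:
        # Every processing element executes the same instruction
        # on its own data item.
        pes = [instruction(x) for x in pes]
    return pes


# Example: add 1, then double, on three data streams.
result = run_simd([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
```

On real SIMD hardware the per-element work happens simultaneously; the list comprehension here only models the result, not the timing.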

MIMD: Multiple Instruction-stream, Multiple Data-stream

MISD: Multiple Instruction-stream, Single Data-stream

Flynn's classes by number of instruction and data streams:

                              Data streams
                              1         Many
Instruction streams   Many    MISD      MIMD
                      1       SISD      SIMD
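The grid above maps directly onto a small lookup. The following is a minimal sketch; the function name flynn_class is hypothetical and simply encodes "1" versus "many" for each kind of stream.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"  # Single or Multiple instruction
    d = "S" if data_streams == 1 else "M"         # Single or Multiple data
    return f"{i}I{d}D"


# One instruction stream driving 64 data streams is SIMD.
label = flynn_class(1, 64)
```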

Computer Categories
Single processor
- SISD: von Neumann computers (scalar computers)
- SIMD: array computers, vector computers
Parallel processors
- MIMD:
  - Parallel computers
    (1) Shared-memory multiprocessors
    (2) Message-passing multiprocessors*
  - Special-purpose computers
  - Distributed computers

SCALAR (1 operation): r3 = r1 + r2
    add r3, r1, r2
VECTOR (N operations, where N is the vector length): v3 = v1 + v2
    vadd.vv v3, v1, v2
Applications: image processing (signal processing), multimedia

Basic Vector Instructions

Instr.    Operands      Operation                    Comment
VADD.VV   V1, V2, V3    V1 = V2 + V3                 vector + vector
VADD.SV   V1, R0, V2    V1 = R0 + V2                 scalar + vector
VMUL.VV   V1, V2, V3    V1 = V2 x V3                 vector x vector
VMUL.SV   V1, R0, V2    V1 = R0 x V2                 scalar x vector
VLD       V1, R1        V1 = M[R1..R1+63]            load, stride = 1
VLDS      V1, R1, R2    V1 = M[R1..R1+63*R2]         load, stride = R2
VLDX      V1, R1, V2    V1 = M[R1+V2i, i=0..63]      indexed ("gather")
VST       V1, R1        M[R1..R1+63] = V1            store, stride = 1
VSTS      V1, R1, R2    M[R1..R1+63*R2] = V1         store, stride = R2
VSTX      V1, R1, V2    M[R1+V2i, i=0..63] = V1      indexed ("scatter")

The load and store instructions (VLD through VSTX) handle operand addressing.
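The three addressing modes in the table (unit stride, stride, and indexed) can be sketched as follows. This is an illustration, not the instruction set's definition: memory is modeled as a Python list, addresses count elements rather than bytes, and the helper names vld, vlds, vldx, and vstx are hypothetical stand-ins for the instructions of the same name. The vector length defaults to the 64 elements used in the table.

```python
VL = 64  # vector length assumed in the table


def vld(mem, r1, vl=VL):
    """Unit-stride load: V1 = M[R1 .. R1+vl-1]."""
    return [mem[r1 + i] for i in range(vl)]


def vlds(mem, r1, r2, vl=VL):
    """Strided load: V1[i] = M[R1 + i*R2]."""
    return [mem[r1 + i * r2] for i in range(vl)]


def vldx(mem, r1, v2):
    """Indexed load ("gather"): V1[i] = M[R1 + V2[i]]."""
    return [mem[r1 + offset] for offset in v2]


def vstx(mem, v1, r1, v2):
    """Indexed store ("scatter"): M[R1 + V2[i]] = V1[i]."""
    for value, offset in zip(v1, v2):
        mem[r1 + offset] = value


memory = list(range(100))
strided = vlds(memory, 2, 3, vl=4)      # elements at addresses 2, 5, 8, 11
gathered = vldx(memory, 10, [0, 5, 1])  # elements at addresses 10, 15, 11
```

A strided load is useful for walking one column of a row-major matrix, and gather/scatter for sparse data, where the wanted elements are not evenly spaced.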

Example Program for a Vector Computation
Y[0:63] = Y[0:63] + a*X[0:63]

64-element SAXPY: scalar (on a non-vector computer)
        LD      R0, a           # load scalar a
        ADDI    R4, Rx, #512    # last address to load (64 elements x 8 bytes)
loop:   LD      R2, 0(Rx)       # load X(i)
        MULTD   R2, R0, R2      # a * X(i)
        LD      R6, 0(Ry)       # load Y(i)
        ADDD    R6, R2, R6      # a * X(i) + Y(i)
        SD      R6, 0(Ry)       # store Y(i)
        ADDI    Rx, Rx, #8      # advance the X pointer
        ADDI    Ry, Ry, #8      # advance the Y pointer
        SUB     R20, R4, Rx     # bytes remaining
        BNZ     R20, loop       # repeat until all elements are done

64-element SAXPY: vector (on a vector computer)
        LD      R0, a           # load scalar a
        VLD     V1, Rx          # load vector X
        VMUL.SV V2, R0, V1      # vector mult
        VLD     V3, Ry          # load vector Y
        VADD.VV V4, V2, V3      # vector add
        VST     V4, Ry          # store vector Y
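The contrast between the two listings can be mirrored in Python. This is a rough sketch: the function names saxpy_scalar and saxpy_vector are hypothetical, and the whole-list operations only model what a vector unit would do with a single instruction each.

```python
def saxpy_scalar(a, x, y):
    """One multiply and one add per loop iteration, as in the scalar listing."""
    for i in range(len(x)):
        y[i] = y[i] + a * x[i]
    return y


def saxpy_vector(a, x, y):
    """Whole-vector steps, one per vector instruction in the vector listing."""
    v2 = [a * xi for xi in x]            # VMUL.SV: scalar x vector
    v4 = [p + q for p, q in zip(v2, y)]  # VADD.VV: vector + vector
    y[:] = v4                            # VST: store the result into Y
    return y


x = [float(i) for i in range(64)]
y_scalar = saxpy_scalar(2.0, x, [1.0] * 64)
y_vector = saxpy_vector(2.0, x, [1.0] * 64)
```

The scalar version executes its loop body 64 times (about nine instructions per iteration), while the vector version corresponds to six instructions in total.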

Parallel vs. Distributed
• Parallel:
– Several processors work together simultaneously on a single problem, sharing memory and a clock.
• Distributed:
– The processors do not share memory or a system clock.
Some message-passing multiprocessors fall into the category of distributed computers.

[Figure: SIMD and MIMD]

Increasing Processing Capacity
[Diagram labels: I/O devices, memory modules (Mem), interconnect, processors, I/O controllers (I/O ctrl)]
• Memory capacity is increased by adding memory modules.
• I/O capacity is increased by adding controllers and I/O devices.
• Processing capacity is increased by adding processors!

Evolution of the Sequential Computer
• With cache memory
• With modular memory (memory interleaving)
• With a pipeline structure inside the processor (stages S1, S2, ..., Sj)
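Memory interleaving spreads consecutive addresses across memory modules so that several accesses can overlap. The slide does not say which scheme is meant; the sketch below assumes low-order interleaving, and the function name interleave is hypothetical.

```python
def interleave(address, num_modules):
    """Map an address to (module number, offset within that module).

    Assumes low-order interleaving: consecutive addresses fall in
    consecutive modules.
    """
    return address % num_modules, address // num_modules


# With 4 modules, addresses 0..7 visit modules 0, 1, 2, 3, 0, 1, 2, 3.
modules = [interleave(a, 4)[0] for a in range(8)]
```

Under this assumption a sequential sweep through memory touches every module in turn, so each module has time to recover before it is accessed again.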

[Figure: a processor with multiple functional units (FU) and a multiprocessor. Diagram labels: Memory, Reg. File, FU, ICache, DCT, HUF]

Superscalar: PowerPC 604 and Pentium Pro
Multiple functional units (FU)

TMS320C6701 DSP Block Diagram
• Program cache/program memory: 32-bit address, 256-bit data, 512 Kbits RAM
• Data memory: 32-bit address; 8-, 16-, 32-bit data; 512 Kbits RAM
• 'C67x floating-point CPU core: program fetch, instruction dispatch, instruction decode, control registers, control logic, test, emulation, interrupts
• Data path 1: A register file with units L1, S1, M1, D1
• Data path 2: B register file with units D2, M2, S2, L2
• Peripherals: power down, host port interface, 4-channel DMA, external memory interface, 2 timers, 2 multichannel buffered serial ports (T1/E1)

TMS320C67x CPU Core
• 'C67x floating-point CPU core: program fetch, instruction dispatch, instruction decode
• Data path 1 (A register file): units L1, S1, M1, D1
• Data path 2 (B register file): units D2, M2, S2, L2
• Unit types labeled in the figure: Arithmetic Logic Unit, Auxiliary Logic Unit, Multiplier Unit
• Control registers, control logic, test, emulation, interrupts
• Floating-point capabilities
• Multiple functional units (FU)

Intel IXP1200 Network Processor
• SA Core with ICache, DCache, and Mini DCache
• Six MicroEngines
• SDRAM controller and SRAM controller
• PCI interface and IX Bus interface
• Hash engine and scratch pad SRAM

IXP1200 MicroEngine
• 32 SRAM read transfer (XFER) registers (from SRAM)
• 32 SRAM write XFER registers (to SRAM)
• 32 SDRAM read XFER registers (from SDRAM)
• 32 SDRAM write XFER registers (to SDRAM)
• Two 64-register arrays (A-Bank and B-Bank) feeding the ALU
• 32-bit RISC instruction set

Intel Pentium Pro Quad
• P-Pro modules, each with a CPU, 256-KB L2 cache ($), interrupt controller, and bus interface
• P-Pro bus: 64-bit data, 36-bit address, 66 MHz
• Memory controller and MIU with 1-, 2-, or 4-way interleaved DRAM
• PCI bridges, each connecting a PCI bus with PCI I/O cards

Intel Paragon
• Intel Paragon node: i860 with L1 cache ($), memory bus (64-bit, 50 MHz), memory controller, DMA, driver, network interface (NI), 4-way interleaved DRAM
• 2D grid network with a processing node attached to every switch; links are 8 bits wide, 175 MHz, bidirectional
• Sandia's Intel Paragon XP/S-based supercomputer

IBM SP-2
• IBM SP-2 node: Power 2 CPU with cache, memory bus, memory controller, 4-way interleaved DRAM
• MicroChannel bus with I/O, DMA, and NIC (i860, NI, DRAM)
• General interconnection network formed from 8-port switches

Stanford: Hydra Design
• Single-chip multiprocessor
• Four processors
• Separate primary caches
• Write-through data caches to maintain coherence
• Shared 2nd-level cache
• Separate read and write buses

SUN Enterprise
• CPU/mem cards: processors (P) with caches ($) and second-level caches ($2), memory controller, bus interface/switch
• Gigaplane bus: 256-bit data, 41-bit address, 83 MHz
• I/O cards: bus interface, SBUS, 2 FiberChannel, 100bT, SCSI

MULTIPROCESSORS
Shared memory: several processors share one memory space.
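A shared memory space means every worker can read and write the same variables, so updates must be synchronized. The sketch below uses Python threads as stand-ins for processors; the function name parallel_sum and the choice of four workers are assumptions made for illustration.

```python
import threading


def parallel_sum(data, num_workers=4):
    """Sum a list using workers that all update one shared total."""
    total = [0]              # shared location visible to every worker
    lock = threading.Lock()  # protects the shared total

    def worker(chunk):
        partial = sum(chunk)  # private computation
        with lock:            # synchronized update of shared memory
            total[0] += partial

    chunks = [data[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


answer = parallel_sum(list(range(1, 101)))
```

The workers communicate only through the shared variable; no data is explicitly sent from one worker to another.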

The Problem of Cache Coherency (Shared-Memory Architectures)
a) Cache and memory coherent: A' = A, B' = B. The cache holds A' = 100 and B' = 200; memory holds A = 100 and B = 200.
b) Cache and memory incoherent: A' ≠ A. The CPU has written 550 to A' in the cache while memory still holds A = 100, so an I/O output of A gives 100.
c) Cache and memory incoherent: B' ≠ B. An I/O input writes 440 to B in memory while the cache still holds B' = 200.
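The three cases can be replayed with two dictionaries, one for the cache and one for memory. This is only a sketch: it assumes a write-back cache (a CPU write stays in the cache until later) and I/O that reads and writes memory directly, and the helper names is_coherent and initial_state are hypothetical.

```python
def is_coherent(cache, memory):
    """Report, per variable, whether the cached copy matches memory."""
    return {name: cache[name] == memory[name] for name in cache}


def initial_state():
    memory = {"A": 100, "B": 200}
    cache = dict(memory)  # cached copies A' and B'
    return cache, memory


# a) Cache and memory coherent.
cache, memory = initial_state()
case_a = is_coherent(cache, memory)

# b) The CPU writes 550 to A'; memory is not updated yet.
cache, memory = initial_state()
cache["A"] = 550
io_output = memory["A"]  # an I/O device reading memory gets the stale value
case_b = is_coherent(cache, memory)

# c) An I/O input writes 440 to B in memory; the cache is not updated.
cache, memory = initial_state()
memory["B"] = 440
cpu_read = cache["B"]    # the CPU reading its cache gets the stale value
case_c = is_coherent(cache, memory)
```

In case b) the I/O device sees old data, and in case c) the CPU does; a coherence mechanism has to prevent both.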

Message-Passing Multiprocessors
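In a message-passing design, each node works on private data and exchanges results only through explicit send and receive operations. The sketch below imitates this with Python threads and queues; real message-passing machines use separate memories and a network, so the threads, the queue-based inboxes, and the names node and message_passing_sum are all illustrative assumptions.

```python
import queue
import threading


def node(inbox, outbox):
    """A node: receive work messages, reply with results, stop on None."""
    while True:
        message = inbox.get()     # receive
        if message is None:
            break
        outbox.put(sum(message))  # send the partial result back


def message_passing_sum(data, num_nodes=4):
    """Sum a list by sending one chunk to each node and collecting replies."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    threads = [threading.Thread(target=node, args=(ib, outbox)) for ib in inboxes]
    for t in threads:
        t.start()
    for i, ib in enumerate(inboxes):
        ib.put(data[i::num_nodes])  # send one chunk to each node
        ib.put(None)                # then tell the node to stop
    total = sum(outbox.get() for _ in range(num_nodes))  # receive replies
    for t in threads:
        t.join()
    return total


answer_mp = message_passing_sum(list(range(1, 101)))
```

Unlike a shared-memory program, no variable here is updated by more than one node, so no lock is needed; coordination happens entirely through the messages.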

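Virtual address spaces for a collection of processes communicating via shared addresses
• Each process (P0, P1, P2, ..., Pn) has a shared portion and a private portion of its address space.
• The shared portions map to common physical addresses in the machine physical address space; processes communicate through loads and stores to these addresses.
• The private portions (P0 private, P1 private, P2 private, ..., Pn private) map to separate regions of the physical address space.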

Steps in Creating a Parallel Program

[Figure: diagram labels: FU, Memory, Reg. File, DCT, HUF, ICache]