COMP 4211 Advanced Computer Architecture: Vector Processor (25/10/2021)
- Slides: 25
COMP 4211: Advanced Computer Architecture - Vector Processor
Yian Sun, 25/10/2021
Overview
- Introduction: What and Why?
- Basic Vector Architecture
- Example: MIPS vs. VMIPS
- Parallelism using convoys
- Vector Memory Systems
- Real World Issues:
  - Vector Length
  - Stride
- Introduction to Cray-1
Introduction: What is a Vector Processor?
- Consider an operation D = A + C, where A, C, and D are vectors.
- A vector processor provides high-level operations that work on vectors.
- A typical instruction might add two 64-element FP vectors.
- Commercialized long before ILP machines.
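To make the "add two 64-element FP vectors" bullet concrete, here is a minimal C sketch of what one such vector instruction computes. The name vector_add and the constant MVL are illustrative choices, not part of any real instruction set; the point is only that the whole loop below corresponds to a single vector instruction.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64  /* elements per vector, matching the 64-element example */

/* Scalar-loop equivalent of one vector add: d[i] = a[i] + c[i].
   A vector processor issues this whole loop as one instruction. */
void vector_add(double d[MVL], const double a[MVL], const double c[MVL])
{
    assert(d != NULL && a != NULL && c != NULL);
    for (size_t i = 0; i < MVL; i++) {
        d[i] = a[i] + c[i];  /* no element depends on another element's result */
    }
}
```

Because no iteration reads a value written by another iteration, the hardware is free to run the element operations through a deep pipeline or across parallel lanes.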
Introduction cont.: Why Vector Processors?
- A single vector instruction is equivalent to executing an entire loop.
  - Reduces instruction fetch and decode bandwidth.
- Each instruction guarantees that every result is independent of the other results in the same vector.
  - No data hazard check is needed within an instruction.
  - Can be executed using an array of parallel functional units or a deep pipeline.
Introduction cont.
- Hardware need only check for data hazards between two vector instructions, once per vector operand.
  - More operations are covered per hazard check.
- Memory is accessed for an entire vector, not a single word.
  - Reduced latency: the memory latency is paid once per vector.
- Multiple vector instructions can be in progress at once.
  - Further parallelism.
Basic Vector Architecture
- Ordinary scalar pipeline unit + vector unit.
- Two types:
  - Vector-register: all operations except load and store operate on registers.
  - Memory-memory: all operations are memory to memory.
- These slides concentrate on vector-register machines, particularly the VMIPS architecture.
BVA: the components
- Vector registers
  - Fixed length; each holds a single vector.
  - In VMIPS:
    - 2 read ports and 1 write port.
    - 8 vector registers, 64 elements each.
- Vector functional units
  - Fully pipelined; can start a new operation every cycle.
  - May include a scalar functional unit.
- Control unit
  - Detects structural and data hazards.
BVA: the components cont.
- Vector load-store unit
  - Loads and stores vectors to and from memory.
- Special-purpose registers
  - Vector-length register
  - Vector-mask register
- Set of scalar registers
  - Provide data as input to the vector functional units.
  - Compute addresses to pass to the load-store unit.
  - In VMIPS: 32 general-purpose and 32 floating-point registers.
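The register state listed on these two slides can be pictured as a plain C struct. This is only a sketch: the type VmipsState, its field names, and the helper addv are hypothetical, and the mask is modelled as one bit per element under the assumption that a masked operation writes only the elements whose bit is set.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VREGS 8   /* vector registers, per the slide */
#define MVL       64  /* elements per vector register, per the slide */

/* Hypothetical model of the VMIPS register state. */
typedef struct {
    double   v[NUM_VREGS][MVL]; /* 8 vector registers, 64 elements each */
    int      vlr;               /* vector-length register, 0..MVL */
    uint64_t vm;                /* vector-mask register, bit i guards element i */
    int64_t  r[32];             /* 32 general-purpose registers */
    double   f[32];             /* 32 floating-point registers */
} VmipsState;

/* Vector add under length and mask control: only the first vlr elements
   are considered, and only those whose mask bit is 1 are written. */
void addv(VmipsState *s, int vd, int va, int vb)
{
    assert(s->vlr >= 0 && s->vlr <= MVL);
    for (int i = 0; i < s->vlr; i++) {
        if ((s->vm >> i) & 1u) {
            s->v[vd][i] = s->v[va][i] + s->v[vb][i];
        }
    }
}
```

With vlr = 4 and vm = 0x5, a call to addv updates only elements 0 and 2 of the destination register and leaves every other element untouched.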
Example: MIPS vs. VMIPS
- Greatly reduced instruction bandwidth
  - Six instructions instead of almost 600.
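The code for this comparison did not survive in the slide text. Assuming the example is the DAXPY loop (Y = a*X + Y) used later in the deck, the sketch below shows the scalar loop in C, with the six-instruction VMIPS sequence from the textbook's treatment given as a comment. The function name daxpy and its signature are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* VMIPS version for 64 elements (textbook form, shown for comparison):
 *
 *   L.D     F0, a        ; load scalar a
 *   LV      V1, Rx       ; load vector X
 *   MULVS.D V2, V1, F0   ; vector-scalar multiply
 *   LV      V3, Ry       ; load vector Y
 *   ADDV.D  V4, V2, V3   ; vector add
 *   SV      Ry, V4       ; store the result
 */

/* Scalar DAXPY: on a scalar machine every iteration re-executes the
   loads, multiply, add, store, and loop overhead instructions. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The scalar loop executes its body 64 times, which is where the instruction count of almost 600 comes from; the vector version fetches and decodes only the six instructions in the comment.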
Parallelism using convoys
- Convoy: a set of vector instructions that could begin execution together.
- Consider this sequence of code.
- Using convoys results in
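The code sequence on this slide is not recoverable from the text, so the following is a stand-in sketch rather than the slide's own example. It groups the vector instructions of a DAXPY body into convoys under two assumptions that should be read as simplifications: there is one load-store unit, one multiplier, and one adder, and there is no chaining, so an instruction that reads a register written earlier in the same convoy must start a new convoy. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VREGS 8

enum Unit { UNIT_LOAD_STORE, UNIT_MULTIPLY, UNIT_ADD, UNIT_COUNT };

typedef struct {
    const char *text;   /* assembly text, for display only */
    enum Unit   unit;   /* functional unit the instruction occupies */
    int         dst;    /* vector register written, or -1 for none */
    int         src[2]; /* vector registers read, -1 if unused */
} VInstr;

/* Vector part of the DAXPY body (assumed example, see lead-in). */
static const VInstr DAXPY_BODY[] = {
    { "LV      V1, Rx",     UNIT_LOAD_STORE,  1, { -1, -1 } },
    { "MULVS.D V2, V1, F0", UNIT_MULTIPLY,    2, {  1, -1 } },
    { "LV      V3, Ry",     UNIT_LOAD_STORE,  3, { -1, -1 } },
    { "ADDV.D  V4, V2, V3", UNIT_ADD,         4, {  2,  3 } },
    { "SV      Ry, V4",     UNIT_LOAD_STORE, -1, {  4, -1 } },
};

/* Greedy, in-order convoy assignment. convoy_of[i] receives the 1-based
   convoy number of instruction i; the return value is the convoy count. */
int assign_convoys(const VInstr *code, int n, int *convoy_of)
{
    int convoy = 0;
    int unit_busy[UNIT_COUNT] = { 0 };  /* units used by the current convoy */
    int written[NUM_VREGS] = { 0 };     /* registers written by the current convoy */

    assert(code != NULL && convoy_of != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        /* Structural hazard: the unit is already taken in this convoy. */
        int conflict = (convoy == 0) || unit_busy[code[i].unit];

        /* Data hazard: a source is produced inside this convoy (no chaining). */
        for (int k = 0; k < 2; k++) {
            int s = code[i].src[k];
            assert(s < NUM_VREGS);
            if (s >= 0 && written[s]) conflict = 1;
        }
        if (conflict) {  /* open a new convoy and clear its bookkeeping */
            convoy++;
            for (int u = 0; u < UNIT_COUNT; u++) unit_busy[u] = 0;
            for (int r = 0; r < NUM_VREGS; r++) written[r] = 0;
        }
        unit_busy[code[i].unit] = 1;
        assert(code[i].dst < NUM_VREGS);
        if (code[i].dst >= 0) written[code[i].dst] = 1;
        convoy_of[i] = convoy;
    }
    return convoy;
}
```

Under these assumptions the five instructions fall into four convoys: the first LV alone, then MULVS.D together with the second LV, then ADDV.D, then SV. If each convoy takes roughly one pass over the vector, a 64-element DAXPY needs about 4 x 64 = 256 cycles, ignoring start-up overhead.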
Vector Memory Systems
- Problem
  - The memory system needs to be able to produce and accept large amounts of data.
  - But how do we achieve this when access time is poor?
- Solution
  - Create multiple memory banks.
    - Useful for fragmented accesses.
    - Support multiple loads per clock cycle.
    - Allow for multi-processor sharing.
Vector Memory System Example
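The slide's own example is not present in this text. As a stand-in, here is a minimal sketch of low-order bank interleaving, assuming word addresses map to banks by address modulo the number of banks. The names bank_of and distinct_banks and the MAX_BANKS limit are hypothetical.

```c
#include <assert.h>

#define MAX_BANKS 1024  /* upper bound for this sketch only */

/* Low-order interleaving: consecutive word addresses go to consecutive banks. */
unsigned bank_of(unsigned long word_addr, unsigned num_banks)
{
    assert(num_banks > 0);
    return (unsigned)(word_addr % num_banks);
}

/* Number of different banks touched by n accesses starting at base and
   spaced stride words apart. More banks touched means more accesses can
   overlap while each bank is busy. */
unsigned distinct_banks(unsigned long base, unsigned n,
                        unsigned long stride, unsigned num_banks)
{
    unsigned char seen[MAX_BANKS] = { 0 };
    unsigned count = 0;

    assert(num_banks > 0 && num_banks <= MAX_BANKS);
    for (unsigned i = 0; i < n; i++) {
        unsigned b = bank_of(base + i * stride, num_banks);
        if (!seen[b]) {
            seen[b] = 1;
            count++;
        }
    }
    return count;
}
```

With 16 banks, 16 consecutive words land in 16 different banks, so their accesses can overlap. The same 16 accesses spaced 16 words apart all land in a single bank and must be served one after another.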
Real World Issues (1): Vector-Length Control
- Problem
  - How do we support operations where the length is unknown or is not the vector length?
- Solution
  - Provide a vector-length register (VLR); this solves the problem only if the real length is no greater than the maximum vector length (MVL).
  - Use a technique called strip mining.
Strip mining
- Generate code such that each vector operation is done for a size no greater than MVL.
- Create 2 loops:
  - One that handles any number of iterations that is a multiple of MVL.
  - Another that handles the remaining iterations.
- The code becomes vectorizable.
- Careful handling of VLR is needed.
Example: Strip Mining
- For the DAXPY loop, we can generate C code as below.
    low = 1;                                /* assume the start element is at 1 */
    VL = n % MVL;                           /* find the odd-size piece */
    for (j = 0; j <= n / MVL; j++) {        /* outer loop */
        for (i = low; i <= low + VL - 1; i++) {  /* inner loop: runs for length VL */
            y[i] = a * x[i] + y[i];         /* main operation */
        }
        low = low + VL;                     /* find start of next vector */
        VL = MVL;                           /* reset length to max */
    }
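A self-contained variant of the same idea, offered as a sketch: it uses 0-based indexing instead of the slide's 1-based convention, fixes MVL at 64, and returns the number of non-empty strips so the strip structure can be checked. The function name daxpy_strip_mined is illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64  /* maximum vector length assumed for this sketch */

/* Strip-mined DAXPY with 0-based indexing. The first strip handles the
   odd-size piece (n % MVL elements); every later strip handles MVL
   elements. Returns the number of non-empty strips executed. */
size_t daxpy_strip_mined(size_t n, double a, const double *x, double *y)
{
    size_t low = 0;
    size_t vl = n % MVL;  /* odd-size piece goes first */
    size_t strips = 0;

    assert(x != NULL && y != NULL);
    for (size_t j = 0; j <= n / MVL; j++) {
        /* On a vector machine this inner loop is one vector sequence
           executed with VLR set to vl. */
        for (size_t i = low; i < low + vl; i++) {
            y[i] = a * x[i] + y[i];
        }
        if (vl > 0) strips++;
        low += vl;   /* start of the next strip */
        vl = MVL;    /* all remaining strips are full length */
    }
    return strips;
}
```

For n = 200 the strips have lengths 8, 64, 64, and 64. When n is an exact multiple of MVL, the first strip is empty and only the full-length strips do any work.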
Real World Issues (2): Vector Stride
- Problem
  - Adjacent elements of a vector may not be sequential in memory. Set-up time could be enormous.
  - E.g. matrix multiplication.
- Solution
  - The distance separating elements is called the stride.
  - Store the stride in a register, so only a single vector load or store is required.
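A short sketch of what a strided vector load does, using the matrix case as the example. The function load_strided is a hypothetical C stand-in for a load-with-stride instruction (LVWS in VMIPS); the stride is expressed in elements here rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Gather vl elements into a vector register image, starting at base and
   stepping stride elements between consecutive loads. */
void load_strided(double *vreg, const double *base, size_t stride, size_t vl)
{
    assert(vreg != NULL && base != NULL);
    for (size_t i = 0; i < vl; i++) {
        vreg[i] = base[i * stride];
    }
}
```

For a row-major N x N matrix B, column j is obtained with load_strided(v, &B[0][j], N, N): adjacent column elements sit N words apart, so the stride is N. Rows, by contrast, have unit stride.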
Vector Stride: Access time
- Vector processors use interleaved memory banks. Non-unit strides can cause stalls.
- A stall will occur if No. of banks / GCD(Stride, No. of banks) < Bank busy time.
- No conflicts if the stride and the no. of banks are relatively prime.
- Conflicts can be reduced by increasing the no. of banks beyond the minimum.
- Most vector supercomputers have at least 64 banks, with some having up to 1024.
Example: Vector Stride
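The slide's worked example is missing from this text. The sketch below is a stand-in built on a deliberately simple model: one access is issued per cycle unless its bank is still busy, a bank stays busy for a fixed number of cycles, and the same bank is revisited every banks / gcd(stride, banks) accesses. The numbers used afterwards (8 banks, bank busy time of 6 cycles, 64 elements) are an assumed configuration, and all function names are hypothetical.

```c
#include <assert.h>

#define MAX_BANKS 1024  /* upper bound for this sketch only */

static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Stall criterion: a bank is revisited every banks / gcd(stride, banks)
   accesses, so a stall occurs when that interval is shorter than the
   bank busy time. */
int stride_causes_stall(unsigned long stride, unsigned long banks,
                        unsigned long busy)
{
    assert(stride > 0 && banks > 0);
    return banks / gcd_ul(stride, banks) < busy;
}

/* Cycles needed to issue n strided accesses, excluding the initial
   memory latency. Element i goes to bank (i * stride) % banks. */
unsigned long issue_cycles(unsigned long n, unsigned long stride,
                           unsigned long banks, unsigned long busy)
{
    unsigned long free_at[MAX_BANKS] = { 0 };  /* cycle at which each bank is free */
    unsigned long t = 0;

    assert(banks > 0 && banks <= MAX_BANKS);
    for (unsigned long i = 0; i < n; i++) {
        unsigned long b = (i * stride) % banks;
        if (free_at[b] > t) t = free_at[b];  /* wait for the bank: a stall */
        free_at[b] = t + busy;
        t++;                                 /* one access issued this cycle */
    }
    return t;
}
```

With 8 banks and a busy time of 6 cycles, stride 1 revisits a bank every 8 accesses, so nothing stalls and 64 elements issue in 64 cycles. Stride 32 sends every access to the same bank, so each one waits out the full busy time and the same 64 elements take 379 cycles to issue.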
Cray-1
- Best-known vector processor, released in 1976.
- Fastest supercomputer in the late 1970s.
- 32-bit instruction length.
- Architecture consists of 3 sections:
  - The main memory
  - The scalar subsystem
  - The vector subsystem
Cray-1: Main Memory
- 16 banks, each consisting of 72 64K, 64-bit words.
- Cycle time of 50 ns, which is equivalent to 4 clock cycles.
- Can transfer 1-4 words per clock period, depending on the register or buffer.
- 4 words per clock cycle for the instruction buffers, resulting in a bandwidth of 1280 MB/sec.
Cray-1: Scalar subsystem
- Consists of:
  - Instruction buffers
  - 2 files of scalar registers
  - 2 address functional registers
  - Scalar functional unit
  - Shared floating-point functional unit
Cray-1: Vector subsystem
- Consists of:
  - 8 vector registers
  - Set of 3 vector functional units
  - Shared set of 3 floating-point functional units
Cray-1: Instruction Format
- Binary arithmetic and logic instructions (a)
- Unary shift and mask instructions (b)
- Memory read and store instructions (c)
- Branch instructions use the lower 24 bits for the branch address.
References
- Computer Architecture: A Quantitative Approach, Hennessy and Patterson, Appendix G, sections 1-3.
- Computer Architecture: A Modern Synthesis, Subrata Dasgupta, Chapter 7, pp. 246-249.
- http://www.crhc.uiuc.edu/IMPACT/ece412/public_html/Notes/412_lec20/
- The Cray-1 Computer System, Richard M. Russell, Cray Research Inc.
- http://csep1.phy.ornl.gov/ca/node24.html