CSCI 4717/5717 Computer Architecture
Topic: Vector/Array Processors
Reading: Stallings, Section 18.7
CSCI 4717 – Computer Architecture Vector/Array Processors
Vector/Array Computing
• Optimized for calculation rather than for multitasking and I/O
• Design focus is to perform parallel mathematical operations on a vector or array of data elements
• A scalar processor, by contrast, must handle one element at a time
• Limited market: research, government agencies, meteorology
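The scalar-versus-vector contrast can be sketched by counting the add instructions each style issues for an elementwise addition. This is a toy model under stated assumptions: the vector length `VECTOR_LENGTH = 64` and the function names are hypothetical, and one "vector instruction" is simulated by processing a chunk of up to 64 elements in a single step.

```python
VECTOR_LENGTH = 64  # hypothetical number of elements one vector instruction can handle


def scalar_add(a, b):
    """Add element by element, issuing one scalar add per element."""
    result, issued = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1
    return result, issued


def vector_add(a, b, vl=VECTOR_LENGTH):
    """Add in chunks of up to `vl` elements, issuing one simulated vector add per chunk."""
    result, issued = [], 0
    for start in range(0, len(a), vl):
        chunk_a = a[start:start + vl]
        chunk_b = b[start:start + vl]
        # One "instruction" covers every element of the chunk.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issued += 1
    return result, issued


if __name__ == "__main__":
    n = 1000
    a = list(range(n))
    b = [2 * i for i in range(n)]
    s, s_count = scalar_add(a, b)
    v, v_count = vector_add(a, b)
    assert s == v
    print(s_count, v_count)  # 1000 16
```

With 1,000 elements the scalar version issues 1,000 adds and the vector version 16, because the final partial chunk still needs an instruction of its own.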
Vector/Array Computing (continued)
• Target applications:
  – Data-intensive scientific research, such as:
    • Aerodynamics, seismology, meteorology
    • Continuous field simulation
  – Specialized (high-performance) graphics applications
• Applicable because of the ever-increasing need for finer resolution and more capable models
Array Processor
• An alternative to a supercomputer
• Configured as a peripheral to a mainframe or minicomputer
• The array processor is responsible only for running the vector portion of the problem
• The Sony PlayStation 3 uses a processor consisting of one scalar processor and eight vector processors, developed jointly by IBM, Toshiba, and Sony. (Source: http://en.wikipedia.org/wiki/Vector_computer)
Vector/Array Operation
• The power of vector computing comes from special processing instructions: Single Instruction, Multiple Data (SIMD)
• Lock-step execution: a single instruction is issued to a large number of identical processors (or ALUs), each with a large register set and each working on a different data element
• A single master CPU keeps control of the entire process
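The lock-step model can be sketched as a master that broadcasts one operation at a time to a row of identical processing elements (PEs), each holding its own data in private registers. The `ProcessingElement` and `MasterController` classes and the register names are hypothetical; a real array processor does this in hardware, with all lanes acting in the same clock cycle rather than in a Python loop.

```python
import operator


class ProcessingElement:
    """One identical ALU with its own small register file."""

    def __init__(self):
        self.regs = {}

    def execute(self, op, dest, src1, src2):
        # Every PE runs the same operation, but on its own register contents.
        self.regs[dest] = op(self.regs[src1], self.regs[src2])


class MasterController:
    """Single master CPU: distributes data and broadcasts each instruction to all PEs."""

    def __init__(self, num_pes):
        self.pes = [ProcessingElement() for _ in range(num_pes)]

    def load(self, reg, values):
        # Give each PE exactly one data element.
        if len(values) != len(self.pes):
            raise ValueError("need one value per processing element")
        for pe, v in zip(self.pes, values):
            pe.regs[reg] = v

    def broadcast(self, op, dest, src1, src2):
        # Lock step: every PE executes this instruction before the next one is issued.
        for pe in self.pes:
            pe.execute(op, dest, src1, src2)

    def read(self, reg):
        return [pe.regs[reg] for pe in self.pes]


if __name__ == "__main__":
    m = MasterController(4)
    m.load("r1", [1, 2, 3, 4])
    m.load("r2", [10, 20, 30, 40])
    m.broadcast(operator.add, "r3", "r1", "r2")  # one instruction, four data elements
    m.broadcast(operator.mul, "r4", "r3", "r1")
    print(m.read("r4"))  # [11, 44, 99, 176]
```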
Speed-Up Not Linear
• As with any parallel processing architecture, the realized speed-up of a vector processor is not linear in the number of processors because of:
  – Overhead for managing parallel computations
  – Communication and storage bottlenecks
  – An application load that does not always match the number of available processors
• These problems have a growing effect as the number of processors increases
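A rough model shows why the realized speed-up falls short of linear. The sketch assumes, hypothetically, that the work is `n_items` equal units taking one time unit each, and that every processor beyond the first adds a fixed `overhead_per_proc` of management and communication time. Load mismatch enters through the ceiling: when the work does not divide evenly, the busiest processor sets the finishing time. The overhead value is illustrative, not measured.

```python
import math


def parallel_time(n_items, procs, overhead_per_proc):
    """Finishing time: the busiest processor's share plus coordination overhead."""
    busiest_share = math.ceil(n_items / procs)     # load mismatch: uneven division
    overhead = overhead_per_proc * (procs - 1)     # management cost grows with processor count
    return busiest_share + overhead


def speedup(n_items, procs, overhead_per_proc=0.5):
    serial_time = n_items  # one processor, no coordination overhead
    return serial_time / parallel_time(n_items, procs, overhead_per_proc)


if __name__ == "__main__":
    for p in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"{p:4d} processors -> speed-up {speedup(1000, p):6.2f}")
```

Under these assumed numbers the speed-up grows more slowly than the processor count, levels off between 32 and 64 processors, and falls again at 128, because the overhead term keeps growing while each processor's share of the work keeps shrinking.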
Data Pipelining
• The sequential nature of instructions allows for an instruction pipeline
• Vector computing also tends to work on well-organized data
• This allows the data to be pipelined as well
• Only a single decode is needed for the instruction
• Separate stages fetch the data, process it, and store the result in a register
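A back-of-the-envelope timing model for the three data stages listed above (fetch, process, store) illustrates the benefit. It assumes, hypothetically, that every stage takes one clock cycle, that the instruction is decoded once at the start, and that a new element can enter the pipeline every cycle.

```python
STAGES = ("fetch", "process", "store")
DECODE_CYCLES = 1  # the vector instruction is decoded once (assumed: 1 cycle)


def unpipelined_cycles(n):
    # Each element passes through all stages before the next element starts.
    return DECODE_CYCLES + len(STAGES) * n


def pipelined_cycles(n):
    # The first element needs len(STAGES) cycles; each later one finishes one cycle after its predecessor.
    if n == 0:
        return DECODE_CYCLES
    return DECODE_CYCLES + len(STAGES) + (n - 1)


def schedule(n):
    """For each cycle after decode, list which element occupies each stage (None = empty)."""
    table = []
    for cycle in range(pipelined_cycles(n) - DECODE_CYCLES):
        row = []
        for s in range(len(STAGES)):
            elem = cycle - s  # element e enters stage s at cycle e + s
            row.append(elem if 0 <= elem < n else None)
        table.append(row)
    return table


if __name__ == "__main__":
    print(unpipelined_cycles(100), pipelined_cycles(100))  # 301 103
    for cycle, row in enumerate(schedule(4)):
        print(cycle, dict(zip(STAGES, row)))
```

Once the pipeline is full, one result emerges per cycle, so for long vectors the cost approaches one cycle per element instead of three.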
Data Pipelining (continued)
• Example: to add an array of numbers, the processor must have the following information:
  – A single "add" instruction
  – The start address of the data
  – The end address of the data
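The example can be sketched as an instruction that carries exactly those three pieces of information and is then executed over a simulated memory. The `VectorInstruction` record, the `execute` function, and the choice to treat the end address as inclusive are all assumptions made for illustration, not a real instruction format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorInstruction:
    """Everything the processor needs: one operation plus the data's address range."""
    op: str     # e.g. "add"
    start: int  # address of the first element
    end: int    # address of the last element (inclusive, by assumption)


def execute(instr, memory):
    """Decode once, then stream every element from start to end through the adder."""
    if instr.op != "add":
        raise ValueError(f"unsupported operation: {instr.op}")
    total = 0
    for addr in range(instr.start, instr.end + 1):
        total += memory[addr]  # fetch and add; `total` stands in for the result register
    return total


if __name__ == "__main__":
    memory = [0] * 16
    memory[4:10] = [3, 1, 4, 1, 5, 9]  # the array to add occupies addresses 4..9
    print(execute(VectorInstruction("add", 4, 9), memory))  # 23
```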
Vector/Array Programming
• The programming goal is to divide a large dataset into independent subsets that can be operated on in parallel
• Requires a deep understanding of the algorithm being applied to the data
• Distribute the data to different processors
• Initiate parallel processing
• Bring everything back together when the parallel processing is complete
Vector/Array Programming (continued)
• Example: count the number of times a specific value appears in a large array
  – Begin by breaking the array into smaller arrays, one for each array processor
  – Each array processor, in parallel, counts the occurrences of the value in its own subarray
  – The final count is then computed by adding the results from all of the processors
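The three steps of the counting example map directly onto a split / parallel map / sum pattern. In this sketch, threads from Python's `concurrent.futures.ThreadPoolExecutor` stand in for the array processors, and the helper names are hypothetical. Because of CPython's global interpreter lock, the threads show the structure of the computation rather than a real speed-up; `ProcessPoolExecutor` could be swapped in for true parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Break `data` into `parts` nearly equal contiguous slices, one per processor."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def count_value(chunk, value):
    """Work done independently by one processor: count matches in its own slice."""
    return sum(1 for x in chunk if x == value)


def parallel_count(data, value, processors=4):
    chunks = split(data, processors)  # 1. break up the array
    with ThreadPoolExecutor(max_workers=processors) as pool:
        # 2. every worker counts its own chunk in parallel
        partials = list(pool.map(count_value, chunks, [value] * processors))
    return sum(partials)  # 3. combine the partial counts


if __name__ == "__main__":
    data = [7, 3, 7, 1, 7, 9, 2, 7, 7, 0]
    print(parallel_count(data, 7, processors=3))  # 5
```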
Vector/Array Applications
Which of the following applications would be better served by a vector or array computer than by an SMP, a cluster, or a scalar processor? Which component of each problem is parallel?
  – Web search indexing
  – Generating the Fibonacci sequence: f(i) = f(i-1) + f(i-2)
  – Weather prediction
  – Image processing for a game
  – Web site server
  – Photoshop-type image processing
Scalar Programming
• This and the following examples are based on the multiplication of two 100 × 100 matrices, A and B, giving C = A × B (so N = 100 in the code):

    DO 100 I = 1, N
      DO 100 J = 1, N
        C(I, J) = 0.0
        DO 100 K = 1, N
          C(I, J) = C(I, J) + A(I, K)*B(K, J)
    100 CONTINUE
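For comparison with a modern language, here is a direct Python transcription of the scalar loop nest. Indices are 0-based rather than Fortran's 1-based, matrices are plain lists of lists, and the 2 × 2 matrices in the usage example are chosen only to keep the output readable.

```python
def matmul_scalar(A, B):
    """C = A x B for N x N matrices, one multiply-add at a time (scalar style)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            C[i][j] = 0.0
            for k in range(n):
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_scalar(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Every one of the N³ multiply-adds is a separate step, which is what the vector and fork/join versions set out to avoid.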
Vector Programming
• The notation (J = 1, N) indicates that the operations on all indices J are to be carried out on N processors as a single operation:

    DO 100 I = 1, N
      C(I, J) = 0.0 (J = 1, N)
      DO 100 K = 1, N
        C(I, J) = C(I, J) + A(I, K)*B(K, J) (J = 1, N)
    100 CONTINUE
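Python has no (J = 1, N) notation, so in this sketch a hypothetical helper `all_j` plays that role: it applies one operation to every J index of row I in a single call. On a vector machine each such call would be one instruction spread across N processors; here it is simply a list comprehension, so the sketch shows the structure of the vector version, not its speed.

```python
def all_j(f, n):
    """Stand-in for the (J = 1, N) notation: apply f to every index j as one 'operation'."""
    return [f(j) for j in range(n)]


def matmul_vector(A, B):
    """C = A x B with the J loop replaced by whole-row vector operations."""
    n = len(A)
    C = []
    for i in range(n):
        row = all_j(lambda j: 0.0, n)  # C(I, J) = 0.0  (J = 1, N)
        for k in range(n):
            prev = row
            # C(I, J) = C(I, J) + A(I, K)*B(K, J)  (J = 1, N): one operation over the whole row
            row = all_j(lambda j: prev[j] + A[i][k] * B[k][j], n)
        C.append(row)
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_vector(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Only the I and K loops remain explicit; the J loop has become a single operation per step, matching the structure of the vector code.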
Fork/Join Parallel Programming
• One method of parallel programming is fork/join
• A program starts as a single process, known as the master thread
• The "fork" operation marks the beginning of a section of the program that is to be executed in parallel
• The "join" operation terminates the parallel threads created by "fork", bringing the program back to a single master thread
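Fork/Join Method (continued)
• Each FORK 100 starts a new process at label 100 that computes column J of C; the master sets J = N and computes the last column itself, and JOIN N waits until all N column computations have finished:

    DO 50 J = 1, N - 1
      FORK 100
    50 CONTINUE
    J = N
    100 DO 200 I = 1, N
      C(I, J) = 0.0
      DO 200 K = 1, N
        C(I, J) = C(I, J) + A(I, K)*B(K, J)
    200 CONTINUE
    JOIN N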
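The same fork/join structure can be sketched with Python threads: the master forks one thread per column J = 1, ..., N - 1, computes column N itself, and then waits for all of the forked threads. Here `threading.Thread.start()` plays the role of "fork" and `Thread.join()` the role of "join". This illustrates the control flow only; because of CPython's global interpreter lock it should not be expected to run faster than the scalar version.

```python
import threading


def compute_column(A, B, C, j):
    """Body at label 100: compute every entry of column j of C."""
    n = len(A)
    for i in range(n):
        C[i][j] = 0.0
        for k in range(n):
            C[i][j] = C[i][j] + A[i][k] * B[k][j]


def matmul_fork_join(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    workers = []
    for j in range(n - 1):  # DO 50 J = 1, N - 1: FORK 100
        t = threading.Thread(target=compute_column, args=(A, B, C, j))
        t.start()
        workers.append(t)
    compute_column(A, B, C, n - 1)  # J = N: the master computes the last column itself
    for t in workers:  # join: wait until every forked column is finished
        t.join()
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_fork_join(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each thread writes only its own column of C, so the threads never update the same entry and need no locking.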
Neural Networks
What?! A Blank Slide?! It must be over!!!