Advanced Computer Architecture
Dataflow Processing
A. R. Hurson
128 EECH Building, Missouri S&T
hurson@mst.edu
Fall 2012

Slides: 83

Control Flow Computation
• Operands are accessed by their addresses.
• Shared memory cells are the means by which data is passed between instructions.
• Flow of control is implicitly sequential, but special control instructions can be introduced to explicitly identify concurrency.
• Program counter(s) sequence the execution of instructions in a centralized environment.

• A dataflow program is a program with a partial ordering defined by its data interdependencies.
• In a dataflow program, the activation (execution) of an instruction is triggered (fired) by the availability of its input data.
[Figure: a + actor with tokens a and b on its input arcs fires and places the token a+b on its output arc.]

Data Dependencies
    input (a, b, c)
    a := 2*a
    b := -b/a
    c := b^2 - 2*a*c
    c := sqrt(c)
    c := c/a
    b := b-c
    a := b+c
    output (a, b)
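The dependencies in this statement sequence can be derived mechanically. The Python sketch below is illustrative only: it assumes the third statement reads c := b^2 - 2*a*c, it tracks only read-after-write dependencies, and the names `STATEMENTS`, `flow_dependencies`, and `levels` are hypothetical.

```python
# Each statement is (target variable, set of variables it reads).
# Assumption: statement 3 is read as c := b^2 - 2*a*c.
STATEMENTS = [
    ("a", {"a"}),            # 1: a := 2*a
    ("b", {"a", "b"}),       # 2: b := -b/a
    ("c", {"a", "b", "c"}),  # 3: c := b^2 - 2*a*c
    ("c", {"c"}),            # 4: c := sqrt(c)
    ("c", {"a", "c"}),       # 5: c := c/a
    ("b", {"b", "c"}),       # 6: b := b-c
    ("a", {"b", "c"}),       # 7: a := b+c
]


def flow_dependencies(statements):
    """Return, per statement, the 1-based numbers of the statements it reads from."""
    last_writer = {}  # variable -> number of the statement that last wrote it
    deps = []
    for number, (target, reads) in enumerate(statements, start=1):
        deps.append({last_writer[v] for v in reads if v in last_writer})
        last_writer[target] = number
    return deps


def levels(deps):
    """Earliest step at which each statement can fire (inputs arrive at step 0)."""
    result = []
    for d in deps:
        result.append(1 + max((result[p - 1] for p in d), default=0))
    return result


if __name__ == "__main__":
    deps = flow_dependencies(STATEMENTS)
    for number, d in enumerate(deps, start=1):
        print(number, sorted(d))
    print(levels(deps))
```

Read sequentially, each statement depends on its predecessor, so this particular sequence forms a single chain of seven steps.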

Dataflow Principles
• The dataflow model of computation deviates from the conventional control-flow method in two basic principles: asynchrony and functionality.
• Asynchrony: an instruction is fired (executed) only when all the required operands are available.
• Functionality: any two enabled instructions can be executed in either order or concurrently, i.e., there are no side effects.
• Within the scope of dataflow processing, implicit parallelism is achieved by allowing side-effect-free expressions and functions to be evaluated in parallel.
• In a dataflow environment, conventional concepts such as variables and memory updating are non-existent.
• Objects (operand values) are consumed by an actor (instruction), yielding a result object that is passed to the next actor(s).
• Within the scope of a concurrent environment, dataflow computation addresses the programmability, memory latency, and synchronization issues.

Questions
• Define programmability, memory latency, and synchronization.
• How have these issues been addressed in conventional multiprocessor systems?
• Why does the dataflow model of computation offer good solutions to these problems?

Classification
• The dataflow model of computation has traditionally been classified as either static or dynamic.
• In the static organization, a dataflow actor can be executed only when all of the tokens are available on its input arcs and no token exists on any of its output arcs.
• In the dynamic organization, a dataflow actor can be enabled only when all of the tokens of the same tag (color) are available on its input arcs.
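The two firing rules can be contrasted in a few lines of Python. This is a minimal sketch, not a description of any particular machine: the class names `StaticActor` and `DynamicActor`, the use of `None` for an empty arc, and the port numbering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StaticActor:
    """Actor with at most one token per arc (None means the arc is empty)."""
    inputs: list
    outputs: list

    def can_fire(self):
        # Static rule: every input arc holds a token and every output arc is empty.
        return (all(t is not None for t in self.inputs)
                and all(t is None for t in self.outputs))


@dataclass
class DynamicActor:
    """Actor whose input arcs may hold several tokens, each carrying a tag."""
    arity: int
    waiting: dict = field(default_factory=dict)  # tag -> {port: value}

    def receive(self, tag, port, value):
        """Store a tagged token; return the operands once the tag's set is complete."""
        operands = self.waiting.setdefault(tag, {})
        operands[port] = value
        if len(operands) == self.arity:
            # Dynamic rule: all tokens with the same tag (color) are present.
            matched = self.waiting.pop(tag)
            return [matched[p] for p in range(self.arity)]
        return None
```

In the dynamic sketch, tokens with different tags can wait at the same actor at the same time, which is what the static rule forbids.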

Dataflow Graph
• A dataflow program can be represented as a directed graph, G = G(N, A), where the nodes (actors) in N represent instructions and the arcs in A represent data dependencies among the nodes.
• The operands are conveyed from one node to another via the arcs in data packets called tokens.

Dataflow Graph
[Figure: dataflow graph of (a+b) - (a*b). Inputs a and b feed both a + actor and a * actor, whose results feed a - actor.]
[Figure sequence, steps 1 to 4: with the input tokens 4 and 2 the + and * actors are ready to fire; they produce 6 and 8, which makes the - actor ready to fire; the - actor produces -2.]
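The firing sequence for (a+b) - (a*b) can be reproduced with a small data-driven interpreter. The sketch below is an illustration under stated assumptions: the graph encoding `GRAPH`, the arc names, and the function `run` are hypothetical, and values are left on their arcs instead of being consumed so that a and b can feed two actors without explicit copy actors.

```python
import operator

# Hypothetical encoding: node name -> (operation, input arc names, output arc name).
GRAPH = {
    "add": (operator.add, ("a", "b"), "sum"),
    "mul": (operator.mul, ("a", "b"), "product"),
    "sub": (operator.sub, ("sum", "product"), "result"),
}


def run(graph, inputs):
    """Fire every enabled node in rounds until no further node can fire."""
    tokens = dict(inputs)   # arc name -> value of the token on that arc
    fired = []              # one list of node names per round
    pending = set(graph)
    while True:
        # A node is enabled when all of its input arcs carry a token.
        ready = sorted(n for n in pending
                       if all(arc in tokens for arc in graph[n][1]))
        if not ready:
            return tokens, fired
        for name in ready:
            op, ins, out = graph[name]
            tokens[out] = op(*(tokens[arc] for arc in ins))
        pending -= set(ready)
        fired.append(ready)


if __name__ == "__main__":
    tokens, fired = run(GRAPH, {"a": 4, "b": 2})
    print(fired, tokens["result"])
```

With a = 4 and b = 2, the + and * nodes fire in the same round because neither depends on the other, and the - node fires in the next round.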

[Figure: dataflow graph of the data-dependency example, with inputs a, b, c and the constant 2 feeding *, -, neg, sqrt, /, +, and - actors.]

Dataflow Computation
• Data are stored in the instructions, i.e., there is no shared memory.
• Data are passed among instructions as tokens.
• An instruction that is independent of other instructions can begin its execution as soon as it is ready to be fired, as determined by the firing rules for static and dynamic environments.

An Example
[Figure: instruction-level representation of a = (b+1) * (b+c).]

The Basic Primitives
• In a dataflow graph two types of links are distinguished: the data link and the Boolean link.
• A data link is used to pass data tokens (real numbers, integers, ...) among the arcs.
• A Boolean link is used to pass control tokens among the arcs.
• Operators: a data value is produced by an operator as the result of some function f: g = f(J1, ..., Jn).
• Decider: a true or false control value is generated by a decider depending on its input tokens: b = P(J1, ..., Jn). The control token produced at a decider can be combined with other control tokens by means of a Boolean operator.
[Figures: data link, Boolean link, and firing of an operator f and a decider P with input tokens J1, ..., Jn.]

An Example: NOR Operator

  Input 1   Input 2   Output
  T         T         F
  T         F         F
  F         T         F
  F         F         T

The Basic Primitives
• Control tokens direct the flow of data tokens by means of T-gates, F-gates, and merge actors.
• A T-gate passes the data token on its input arc to its output arc when it receives a control token conveying the value true.
• An F-gate passes its input data token to its output arc only when the token on its control input conveys the value false.
• A merge actor has a true input, a false input, and a control input. It passes to its output arc a data token from the input arc corresponding to the value of the control token received. Any token on the other input is not affected.
• A switch actor is a combination of a T-gate and an F-gate. It directs an input data token to one of its output arcs depending on the control input.
• A copy actor is an identity operator that duplicates the input token.
[Figures: firing of the T-gate, F-gate, merge, switch, and copy actors.]
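The gate, merge, switch, and copy actors can be modelled as small pure functions. This is a sketch under assumptions: `None` stands for "no token on the arc", a gate whose control value does not match is assumed to discard its data token, and all function names are hypothetical.

```python
def t_gate(control, token):
    """Pass the data token only when the control token is True."""
    return token if control else None


def f_gate(control, token):
    """Pass the data token only when the control token is False."""
    return None if control else token


def merge(control, true_input, false_input):
    """Forward the token from the input arc selected by the control token.

    The token on the other input is simply not read here.
    """
    return true_input if control else false_input


def switch(control, token):
    """Route the token to one of two arcs: (true_output, false_output)."""
    return (t_gate(control, token), f_gate(control, token))


def copy_actor(token):
    """Duplicate the input token onto two output arcs."""
    return (token, token)


if __name__ == "__main__":
    print(switch(True, 7), switch(False, 7), merge(False, 1, 2))
```

Building `switch` from `t_gate` and `f_gate` mirrors its description as a combination of the two gates.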

An Example
Using the basic primitives, draw the dataflow graph of the following program:
    Input (a, b);
    y := (a+b)/x;
    x := (a*(a+b))+b;
    Output (x, y);
[Figure: dataflow graph with inputs a and b, the actors +, *, +, and /, and outputs x and y.]

Conditional Construct
• One can build more complex constructions using the basic primitive structures.
[Figure: conditional construct. Labels: control token generated by a predicate actor, input data, F-gate, then part, else part, merge (T, F).]

While Loop
[Figure: while loop. Labels: input data, merge (F, T) with control initially False, condition, loop body, F-gate, T-gate.]

An Example
Show the dataflow graph of:
    input (w, x);
    y := x;
    t := 0;
    while t w do
    begin
        if y > 1 then y := y ÷ 2
        else y := y * 3;
        t := t+1;
    end
    output (y);
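Before drawing the graph it can help to trace the program sequentially. The Python sketch below makes two assumptions that the slide does not confirm: the loop comparison, whose operator is missing in the text, is taken to be t < w, and ÷ is taken to be integer division. The function name `run_example` is hypothetical, and the comments only suggest which actor each line would correspond to.

```python
def run_example(w, x):
    """Sequential trace of the example; t < w and integer ÷ are assumptions."""
    y = x                 # initial token enters the loop
    t = 0
    while t < w:          # decider producing the loop control token (assumed test)
        if y > 1:         # decider of the conditional
            y = y // 2    # then part (integer division assumed for ÷)
        else:
            y = y * 3     # else part
        t = t + 1         # loop counter circulates back to the decider
    return y              # token released when the control token is False


if __name__ == "__main__":
    print(run_example(3, 8), run_example(4, 8))
```

The values of y and t are the two tokens that circulate around the loop; the conditional inside the body is a nested instance of the conditional construct.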

Dataflow Architecture
• Dataflow computers have a data-driven organization. The data-driven concept means asynchrony.
• Since shared memory cells are not used, dataflow programs are free from side effects.
• Dataflow computations have no far-reaching effects (locality of effect).
• As a result, a high degree of implicit parallelism is expected in a dataflow computer.
• Depending on the way data tokens are handled, dataflow computers are divided into the static model and the dynamic model.
• In a static dataflow machine, only one token is allowed to exist on any arc at any given time.
• In a dynamic dataflow machine, more than one token can exist on an arc.

A Static Dataflow Machine
The system is composed of five modules:
• The memory section consists of instruction cells, each holding a dataflow instruction.
• The processing section consists of processing units that perform the basic dataflow operations on data values.
• The arbitration network transfers operation packets from the memory section to the processing section.
• The distribution network transfers the generated data packets from the processing section to the memory section.
• The control network transfers control packets from the processing section to the memory section.
[Figure: block diagram. The memory section (instruction cell blocks) sends operation packets through the arbitration network to the processing section (processing units); data tokens return through the distribution network and control tokens through the control network.]

A Static Dataflow Machine: Memory Section
• The memory section holds a representation of the program to be executed and the data values. It is organized into instruction cells.
• Each instruction cell corresponds to an actor of the dataflow program.
• Each instruction cell is composed of three words.
• The first word holds the operation code and the addresses of the instruction cells to which the result of the operation is to be directed.
• The next two words hold the operands. Each operand word may be set to behave as a constant or a variable.
• There are six different instruction formats.

Instruction Format
• Operators can be of two forms: unary or binary.
• Deciders can be of unary or binary type.
• Boolean operators can be unary or binary.

Instruction Format
Each operand value, i.e., gi, has the following format:

  Field        State   Meaning
  Gate flag    Off     No gate value control packet is received
               True    True gate value control packet is received
               False   False gate value control packet is received
  Value flag   Off     No data value is received
               On      Data value is received
  Data value

• n: number of acknowledge signals expected
• m: number of acknowledge signals received
• gi: gate code
• ti: result tag, which defines whether the control packet is of gate type or data type.
[Figure: instruction format, annotated "operand word became active".]
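An instruction cell with its two operand words can be sketched as a small data structure. The code below is illustrative only: the class and attribute names are hypothetical, and the enabling rule (all operands present and the received acknowledge count m equal to the expected count n) is an assumption about how the flags and counters are combined, not something stated on the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OperandWord:
    """One operand word: gate flag, value flag, and data value."""
    gate_flag: str = "off"           # "off", "true", or "false"
    value_flag: bool = False         # True once a data value has been received
    value: Optional[float] = None
    constant: bool = False           # a constant operand is always available

    def store(self, value):
        """Record an arriving data value and switch the value flag on."""
        self.value = value
        self.value_flag = True

    def present(self):
        return self.constant or self.value_flag


def _two_empty_words():
    return [OperandWord(), OperandWord()]


@dataclass
class InstructionCell:
    """Three-word cell: opcode plus destinations, then two operand words."""
    opcode: str
    destinations: List[int]
    operands: List[OperandWord] = field(default_factory=_two_empty_words)
    acks_expected: int = 0           # n
    acks_received: int = 0           # m

    def enabled(self):
        # Assumed rule: all operands present and all acknowledgements received.
        return (all(w.present() for w in self.operands)
                and self.acks_received == self.acks_expected)
```

A cell such as `InstructionCell("add", [3])` becomes enabled once both of its operand words have stored a value.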

A Static Dataflow Machine: An Example
• The following expression is assumed: Y(t) = A * X(t) + B * Y(t-1) + C * Y(t-2)
• Show its dataflow graph and its "simple" representation in the memory section.
[Figure: dataflow graph of the expression with instruction cells numbered 1 to 8. Labels: in x(0), * A, y(-1), * B, y(-2), * C, two + actors, I, out.]
[Figure: initialization of the memory cells.]
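As a reference for checking a hand-drawn graph, the recurrence Y(t) = A * X(t) + B * Y(t-1) + C * Y(t-2) can be evaluated directly. This Python sketch is not the memory-section representation asked for above; the function name `filter_output` and the parameters `y1` and `y2` for the initial values Y(-1) and Y(-2) are hypothetical.

```python
def filter_output(x, A, B, C, y1=0.0, y2=0.0):
    """Return [Y(0), Y(1), ...] given X(t), Y(-1) = y1, and Y(-2) = y2."""
    ys = []
    for xt in x:
        yt = A * xt + B * y1 + C * y2   # two additions, three multiplications
        ys.append(yt)
        y1, y2 = yt, y1                 # shift the delayed values for the next step
    return ys


if __name__ == "__main__":
    print(filter_output([1, 0, 0, 0], A=1, B=1, C=1))
```

The three multiplications in each step are independent of one another, which is the parallelism the dataflow graph exposes; the two delayed values are the tokens that feed back from the output.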

A Static Dataflow Machine: Processing Section
The processing section is a collection of five pipelined processing units:
• a multiplier unit for complex operands,
• an adder and subtractor unit for complex operands,
• a distributor unit to replicate and distribute complex values,
• an integer processor unit for integer and test operations, and
• a control processor unit to replicate and distribute the integer and Boolean values.
Each functional unit is organized as three independent pipelines. One performs the operation and the other two carry destination addresses.
[Figure: an instruction packet (op. code, x, y, d1, d2) enters the unit; the computation pipeline produces Z while identity pipelines 1 and 2 carry d1 and d2; the result packets are (d1, Z) and (d2, Z).]

A Static Dataflow Machine: Arbitration Network
• The arbitration network is designed to establish a smooth flow of instruction packets from the memory section to the processing section.
• The network is composed of five basic building blocks:
  • arb: arbitration unit
  • sw: switch unit
  • buffer unit
  • s/p: serial-to-parallel transfer
  • p/s: parallel-to-serial transfer

A Static Dataflow Machine: Distribution Network
• It is designed to transfer the result packets from the processing section to the memory section.
• It uses the same basic building blocks as the arbitration network.

A Static Dataflow Machine: Control Network
• It is used to transfer Boolean values and acknowledge signals from the processing section to the memory section. Because of the nature of the data values transferred via the control network, it is composed of switch and arbitration units only.
• The control network transfers two types of tokens: gate type and data type.
• Gate type: gate, True or False, address
• Data type: value, True

A Dynamic Dataflow Machine
It is a backend system composed of five units connected as a pipeline ring around which the tokens flow:
• The processing unit allows concurrent execution of data graph nodes.
• The token queue temporarily stores tokens lying on the data graph arcs.
• The matching unit gathers pairs of tokens with the same destination node address and label.
• The node store holds the information that represents the dataflow graph.
• The switch unit establishes communication between the frontend and backend processors, and reroutes the resultant tokens back to the pipeline ring.
Each unit in the pipeline is internally synchronous, but communication with other units is based on a standard asynchronous protocol.
[Figure: pipeline ring. The switch unit (to host, from host) passes tokens to the token queue; the token queue passes tokens to the matching unit; the matching unit passes token pairs to the node store; the node store passes instruction packets to the processing unit; the processing unit returns tokens to the switch unit.]

A Dynamic Dataflow Machine: Token Queue
• It is a static RAM FIFO buffer of size 16K × 96 bits that allows a read and a write to be performed in a single 200 ns pipeline period. A multiple-bank structure could be considered; however, it was not adopted in the prototype because of its complexity.
• Each token comprises 96 bits and has the following format:

  Field                          Width
  Label                          36 bits
  Destination address            18 bits
  Token value                    32 bits
  Type information and control   10 bits
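Packing a token into a single 96-bit word can be sketched as follows. The field widths assumed here are a 36-bit label, an 18-bit destination address, a 32-bit value, and 10 bits of type information and control, which sum to 96 and agree with the 54-bit label-plus-destination key used by the matching unit. The order of the fields within the word and the names `FIELDS`, `pack_token`, and `unpack_token` are assumptions made for illustration.

```python
# (field name, width in bits); the most significant field is listed first.
# The ordering of fields within the word is an assumption.
FIELDS = [("label", 36), ("destination", 18), ("value", 32), ("control", 10)]


def pack_token(**fields):
    """Pack the named fields into one 96-bit integer."""
    word = 0
    for name, width in FIELDS:
        v = fields[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack_token(word):
    """Split a 96-bit integer back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


if __name__ == "__main__":
    t = pack_token(label=5, destination=7, value=42, control=1)
    print(hex(t), unpack_token(t))
```

Python integers are unbounded, so the range check in `pack_token` is what keeps each field within its width.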

A Dynamic Dataflow Machine: Matching Unit
• It is associative in nature and a critical part of the machine operation.
• It should provide storage for a large number of tokens awaiting their matching pairs.
• It is composed of 8 banks of 2K × 96 bits.
• An 11-bit hash address is generated from the 54-bit label and destination address parts of each incoming token. This address references 8 memory words, one in each of the eight parallel banks.
• Tokens (if any) at the addressed locations are compared with the incoming token. If a match is found, a token pair is generated and passed to the node store; otherwise, the incoming token is written to any free parallel bank to await its pair.
• An unsuccessful match takes 320 ns and a successful match requires 240 ns.
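The hashed, banked lookup can be sketched in Python. Several details here are assumptions rather than facts from the slides: the XOR-folding hash, the handling of the case where all eight banks are occupied at an address, and the omission of the left/right operand position from the key. The class name `MatchingUnit` and its methods are hypothetical.

```python
class MatchingUnit:
    BANKS = 8
    WORDS = 2048  # 2K words per bank, addressed by an 11-bit hash

    def __init__(self):
        # banks[b][h] holds a waiting (key, value) entry or None.
        self.banks = [[None] * self.WORDS for _ in range(self.BANKS)]

    @staticmethod
    def hash11(label, destination):
        """Hypothetical hash: fold the 54-bit key into 11 bits with XOR."""
        key = (label << 18) | destination
        h = 0
        while key:
            h ^= key & 0x7FF
            key >>= 11
        return h

    def accept(self, label, destination, value):
        """Return (waiting value, incoming value) on a match, else store the token."""
        key = (label, destination)
        h = self.hash11(label, destination)
        # Compare the incoming token with the word at address h in every bank.
        for bank in self.banks:
            slot = bank[h]
            if slot is not None and slot[0] == key:
                bank[h] = None
                return (slot[1], value)
        # No partner found: write the token to any free bank at this address.
        for bank in self.banks:
            if bank[h] is None:
                bank[h] = (key, value)
                return None
        raise OverflowError("all eight banks are occupied at this address")
```

In hardware the eight comparisons would proceed in parallel; the loop over banks is only a sequential stand-in for that.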

A Dynamic Dataflow Machine: Node Store
• It is a 16K × 72-bit memory with a 200 ns access time, augmented by a segment table. It is used to store dataflow graphs. Each instruction has the following format:

  Field                              Width
  op. code                           12 bits
  Destination address 1              18 bits
  Destination address 2 or literal   32 bits
  Type information and control       10 bits

• Upon receiving a token pair, an instruction packet is generated and passed to the processing unit:

  Field                              Width
  Label                              36 bits
  op. code                           12 bits
  Operand 1                          32 bits
  Operand 2                          32 bits
  Destination address 1              18 bits
  Destination address 2 or literal   32 bits
  Type information and control       10 bits

A Dynamic Dataflow Machine: Processing Unit
• It is a writeable micro-programmed processor consisting of two pipeline stages.
• The first stage handles simple label operations and gathers some performance statistics.
• The second stage is a parallel array of 15 processing elements. Each element is capable of performing 24-bit integer or 32-bit floating-point arithmetic operations.
• The microinstruction cycle time of each processor is 200 ns.
• An instruction requires about 5 to 50 microinstructions (giving an average running time of 4.5 µs).

A Dynamic Dataflow Machine: An Example
• The following expression is assumed; show its representation in the node store and the initial tokens: A = (W * X) + (Y * Z)
[Figure: dataflow graph. W and X feed one * actor, Y and Z feed a second * actor, and both products feed a + actor that produces A.]
[Figure: node store contents for the example.]

Initial Tokens

  Control   Label   Inst. Address   Value
  no        ?       1 L.H.          W
  no        ?       1 R.H.          X
  no        ?       2 L.H.          Y
  no        ?       2 R.H.          Z
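The flow of tokens around the ring for A = (W * X) + (Y * Z) can be simulated in a few lines. This is a sketch under assumptions: instructions 1 and 2 are taken to be the two multiplications feeding the left-hand and right-hand inputs of an addition at address 3, the label left open as "?" in the table is set to 0, and `NODE_STORE` and `run_ring` are hypothetical names.

```python
import operator

# Hypothetical node store: address -> (operation, destination address, destination port).
NODE_STORE = {
    1: (operator.mul, 3, "L"),
    2: (operator.mul, 3, "R"),
    3: (operator.add, None, None),   # None: the result leaves the ring as A
}


def run_ring(node_store, initial_tokens):
    """Circulate (label, address, port, value) tokens until the queue is empty."""
    queue = list(initial_tokens)     # token queue (FIFO)
    waiting = {}                     # matching unit: (label, address) -> (port, value)
    outputs = []
    while queue:
        label, address, port, value = queue.pop(0)
        key = (label, address)
        if key not in waiting:
            waiting[key] = (port, value)            # wait for the partner token
            continue
        _, other_value = waiting.pop(key)
        left, right = (value, other_value) if port == "L" else (other_value, value)
        op, dest, dest_port = node_store[address]   # node store lookup
        result = op(left, right)                    # processing unit
        if dest is None:
            outputs.append((label, result))         # switch unit sends the result out
        else:
            queue.append((label, dest, dest_port, result))
    return outputs


if __name__ == "__main__":
    W, X, Y, Z = 2, 3, 4, 5
    tokens = [(0, 1, "L", W), (0, 1, "R", X), (0, 2, "L", Y), (0, 2, "R", Z)]
    print(run_ring(NODE_STORE, tokens))
```

Because tokens are matched on both label and address, two sets of initial tokens with different labels can circulate through the same graph at once without interfering.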