GCD: A simple example to introduce Bluespec

Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
February 15, 2008
http://csg.csail.mit.edu/6.375
L05

Bluespec: State and Rules organized into modules

[Figure: a module and its interface]

All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.
Behavior is expressed in terms of atomic actions on the state:
    Rule: guard → action
Rules can manipulate state in other modules only via their interfaces.

Programming with rules: A simple example

Euclid's algorithm for computing the Greatest Common Divisor (GCD):

    15   6
     9   6   subtract
     3   6   subtract
     6   3   swap
     3   3   subtract
     0   3   subtract

    answer: 3

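The trace above can be reproduced with a small software model. This is a sketch in Python rather than BSV; the function name euclid_trace is hypothetical, and it assumes both inputs are positive integers.

```python
def euclid_trace(a, b):
    """Subtractive Euclid on the pair (x, y), recording every step.

    Assumes a > 0 and b > 0. Returns (steps, answer), where each step is
    (operation, x, y) after the operation has been applied.
    """
    x, y = a, b
    steps = [("start", x, y)]
    while x != 0:
        if x >= y:
            x = x - y              # subtract the smaller from the larger
            steps.append(("subtract", x, y))
        else:
            x, y = y, x            # keep the larger value in x
            steps.append(("swap", x, y))
    return steps, y


steps, answer = euclid_trace(15, 6)
for op, x, y in steps:
    print(f"{x:3d} {y:3d}  {op}")
print("answer:", answer)
```
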
GCD in BSV

    module mkGCD (I_GCD);
      // State
      Reg#(int) x <- mkRegU;
      Reg#(int) y <- mkReg(0);

      // Internal behavior
      rule swap ((x > y) && (y != 0));
        x <= y; y <= x;
      endrule
      rule subtract ((x <= y) && (y != 0));
        y <= y - x;
      endrule

      // External interface
      method Action start(int a, int b) if (y==0);
        x <= a; y <= b;
      endmethod
      method int result() if (y==0);
        return x;
      endmethod
    endmodule

Notes:
  - int is a typedef for Int#(32).
  - start assumes a /= 0. To handle a == 0, load y with "if (a==0) then 0 else b" instead of b.

[Figure: registers x and y feeding the swap and sub logic]

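To see how the guards and actions interact, the module can be modelled in software. The following is a Python sketch, not BSV; the class name GCDModel and the step method are hypothetical, and it assumes at most one rule fires per step. Register x starts at 0 here only because Python needs a value; mkRegU leaves it undefined.

```python
class GCDModel:
    """Software model of mkGCD: two registers, two guarded rules, two methods."""

    def __init__(self):
        self.x = 0  # arbitrary; mkRegU gives no defined reset value
        self.y = 0  # mkReg(0)

    def start(self, a, b):
        assert self.y == 0, "implicit condition of start: y == 0"
        self.x, self.y = a, b

    def result(self):
        assert self.y == 0, "implicit condition of result: y == 0"
        return self.x

    def step(self):
        """Fire the enabled rule, if any, and return its name."""
        x, y = self.x, self.y          # all reads see the current state
        if x > y and y != 0:           # rule swap
            self.x, self.y = y, x
            return "swap"
        if x <= y and y != 0:          # rule subtract
            self.y = y - x
            return "subtract"
        return None                    # no rule enabled: y == 0


def run_gcd(a, b):
    """Return (gcd, number of rule firings). Assumes a > 0 and b >= 0."""
    m = GCDModel()
    m.start(a, b)
    fired = 0
    while m.step() is not None:
        fired += 1
    return m.result(), fired


print(run_gcd(15, 6))
```

With a == 0 and b != 0 the subtract rule would fire forever without changing y, which is why the slide assumes a /= 0.
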
GCD Hardware Module

[Figure: the GCD module as a box. The start method has two int inputs, an enab input and a rdy output; the result method has an int output and a rdy output. The implicit condition y == 0 drives both rdy signals.]

    interface I_GCD;
      method Action start (int a, int b);
      method int result();
    endinterface

The module can easily be made polymorphic: parameterize it with #(type t) and replace int by t in the interface. In a GCD call t could be Int#(32), UInt#(16), Int#(13), ...

Many different implementations can provide the same interface:
    module mkGCD (I_GCD)

GCD: Another implementation

Combine swap and subtract:

    module mkGCD (I_GCD);
      Reg#(int) x <- mkRegU;
      Reg#(int) y <- mkReg(0);

      rule swapANDsub ((x > y) && (y != 0));
        x <= y; y <= x - y;
      endrule
      rule subtract ((x <= y) && (y != 0));
        y <= y - x;
      endrule

      method Action start(int a, int b) if (y==0);
        x <= a; y <= b;
      endmethod
      method int result() if (y==0);
        return x;
      endmethod
    endmodule

Does it compute faster?

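One way to approach the question is to count rule firings for both versions in software. This Python sketch is only a model: the function names are hypothetical, it assumes one rule firing per clock cycle, and it says nothing about how the longer combinational path of the combined rule affects the clock period.

```python
def gcd_original(a, b):
    """Separate swap and subtract rules. Returns (gcd, rule firings)."""
    x, y, fired = a, b, 0
    while y != 0:
        if x > y:
            x, y = y, x          # rule swap
        else:
            y = y - x            # rule subtract
        fired += 1
    return x, fired


def gcd_combined(a, b):
    """swapANDsub does the swap and the following subtract in one firing."""
    x, y, fired = a, b, 0
    while y != 0:
        if x > y:
            x, y = y, x - y      # rule swapANDsub
        else:
            y = y - x            # rule subtract
        fired += 1
    return x, fired


for a, b in [(15, 6), (423, 142)]:
    print((a, b), gcd_original(a, b), gcd_combined(a, b))
```

Every firing of swapANDsub replaces one swap and the subtract that would follow it, so in this model the combined version never needs more firings than the original.
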
Bluespec Tool flow

[Figure: tool flow diagram. Elements shown: Bluespec SystemVerilog source, Bluespec Compiler, Blueview, Verilog 95 RTL, C, Bluesim (cycle accurate), Verilog sim, VCD output, Debussy visualization, RTL synthesis, gates. Legend: files, Bluespec tools, 3rd party tools.]

Generated Verilog RTL: GCD

    module mkGCD(CLK, RST_N, start_a, start_b, EN_start, RDY_start,
                 result, RDY_result);
      input CLK;
      input RST_N;
      // action method start
      input [31 : 0] start_a;
      input [31 : 0] start_b;
      input EN_start;
      output RDY_start;
      // value method result
      output [31 : 0] result;
      output RDY_result;
      // register x and y
      reg [31 : 0] x;
      wire [31 : 0] x$D_IN;
      wire x$EN;
      reg [31 : 0] y;
      wire [31 : 0] y$D_IN;
      wire y$EN;
      ...
      // rule RL_subtract
      assign WILL_FIRE_RL_subtract = x_SLE_y___d3 && !y_EQ_0___d10 ;
      // rule RL_swap
      assign WILL_FIRE_RL_swap = !x_SLE_y___d3 && !y_EQ_0___d10 ;
      ...

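Generated Hardware

[Figure: datapath with registers x and y, a comparator (>), a test !(=0), a subtractor (sub), and the logic selecting the next state values. The predicates swap? and subtract? control the register enables; the start (en, rdy) and result (rdy) ports are drawn but not yet connected.]

    x_en = swap?
    y_en = swap? OR subtract?
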
Generated Hardware Module

[Figure: the same datapath with the start and result methods connected: start_en joins the register enables, and rdy is derived from y.]

    x_en = swap? OR start_en
    y_en = swap? OR subtract? OR start_en
    rdy  = (y==0)

GCD: A Simple Test Bench

    module mkTest ();
      Reg#(int) state <- mkReg(0);
      I_GCD gcd <- mkGCD();

      rule go (state == 0);
        gcd.start (423, 142);
        state <= 1;
      endrule

      rule finish (state == 1);
        $display ("GCD of 423 & 142 =%d", gcd.result());
        state <= 2;
      endrule
    endmodule

Why do we need the state variable?

GCD: Test Bench

Feeds all pairs (c1, c2), 1 <= c1 <= 7, 1 <= c2 <= 63, to GCD.

    module mkTest ();
      Reg#(int) state <- mkReg(0);
      Reg#(Int#(4)) c1 <- mkReg(1);
      Reg#(Int#(7)) c2 <- mkReg(1);
      I_GCD gcd <- mkGCD();

      rule req (state==0);
        gcd.start(signExtend(c1), signExtend(c2));
        state <= 1;
      endrule

      rule resp (state==1);
        $display ("GCD of %d & %d =%d", c1, c2, gcd.result());
        if (c1==7) begin c1 <= 1; c2 <= c2+1; end
        else c1 <= c1+1;
        if (c1==7 && c2==63) state <= 2;
        else state <= 0;
      endrule
    endmodule

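The same sweep can be run against a software model to obtain reference answers. This is a Python sketch under the assumption that the pairs are exactly 1..7 for c1 and 1..63 for c2; gcd_by_rules and run_testbench are hypothetical names, and math.gcd from the Python standard library serves as the reference.

```python
import math


def gcd_by_rules(a, b):
    """Apply the swap and subtract rules until y == 0. Assumes a > 0."""
    x, y = a, b
    while y != 0:
        if x > y:
            x, y = y, x      # rule swap
        else:
            y = y - x        # rule subtract
    return x


def run_testbench():
    """Visit the pairs in the order the test bench does: c1 varies fastest."""
    results = []
    for c2 in range(1, 64):
        for c1 in range(1, 8):
            results.append((c1, c2, gcd_by_rules(c1, c2)))
    return results


results = run_testbench()
mismatches = [(c1, c2) for c1, c2, g in results if g != math.gcd(c1, c2)]
print(len(results), "pairs,", len(mismatches), "mismatches")
```
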
GCD: Synthesis results

Original (16 bits)
  - Clock Period: 1.6 ns
  - Area: 4240 µm²

Unrolled (16 bits)
  - Clock Period: 1.65 ns
  - Area: 5944 µm²

Unrolled takes 31% fewer cycles on the testbench.

Rule scheduling and the synthesis of a scheduler

GAA Execution model

Repeatedly:
  - Select a rule to execute (highly nondeterministic)
  - Compute the state updates
  - Make the state updates

User annotations can help in rule selection.

Implementation concern: schedule multiple rules concurrently without violating one-rule-at-a-time semantics.

Rule: As a State Transformer

A rule may be decomposed into two parts p(s) and d(s) such that

    $s_{next} = \text{if } p(s) \text{ then } d(s) \text{ else } s$

p(s) is the condition (predicate) of the rule, a.k.a. the "CAN_FIRE" signal of the rule (the conjunction of explicit and implicit conditions).

d(s) is the "state transformation" function, i.e., it computes the next-state value in terms of the current state values.

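The decomposition can be written down directly. The sketch below is Python, not BSV; apply_rule is a hypothetical helper, and the example predicate and transformer are the subtract rule of the GCD module with the state represented as the pair (x, y).

```python
def apply_rule(p, d, s):
    """One rule as a state transformer: s_next = d(s) if p(s) else s."""
    return d(s) if p(s) else s


def p_subtract(s):
    """CAN_FIRE of the GCD subtract rule."""
    x, y = s
    return x <= y and y != 0


def d_subtract(s):
    """Next state computed purely from the current state."""
    x, y = s
    return (x, y - x)


print(apply_rule(p_subtract, d_subtract, (6, 15)))  # rule enabled
print(apply_rule(p_subtract, d_subtract, (6, 0)))   # rule disabled, state kept
```
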
Compiling a Rule

    rule r (f.first() > 0);
      x <= x + 1;
      f.deq();
    endrule

[Figure: the current state (f, x) feeds p and d through rdy signals and read methods; p produces the enable, and d produces the enable signals and action parameters that determine the next state values of f and x.]

    p = enabling condition
    d = action signals & values

Combining State Updates: strawman

[Figure: the p's from the rules that update R (p_1 ... p_n) are ORed to form the latch enable of R; the d's from the rules that update R (d_{1,R} ... d_{n,R}) are ORed to form the next state value of R.]

What if more than one rule is enabled?

Combining State Updates

[Figure: the p's from all the rules (p_1 ... p_n) enter the Scheduler, a priority encoder, which produces f_1 ... f_n; these are ORed to form the latch enable of R. The d's from the rules that update R (d_{1,R} ... d_{n,R}) are ORed to form the next state value of R.]

The scheduler ensures that at most one f_i is true.

One-rule-at-a-time Scheduler

[Figure: p_1, p_2, ..., p_n enter the Scheduler (a priority encoder), which produces f_1, f_2, ..., f_n.]

  1. $f_i \Rightarrow p_i$
  2. $p_1 \vee p_2 \vee \dots \vee p_n \Rightarrow f_1 \vee f_2 \vee \dots \vee f_n$
  3. One rewrite at a time, i.e., at most one $f_i$ is true

Very conservative way of guaranteeing correctness.

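A priority encoder satisfying the three properties can be sketched as follows. This is Python, not generated hardware; priority_encode and satisfies_scheduler_properties are hypothetical names, and giving priority to the lowest index is an assumption made for the example.

```python
from itertools import product


def priority_encode(can_fire):
    """Map CAN_FIRE bits p_1..p_n to WILL_FIRE bits f_1..f_n.

    The lowest-indexed enabled rule wins; every other rule is suppressed.
    """
    will_fire = []
    earlier_enabled = False
    for p in can_fire:
        will_fire.append(p and not earlier_enabled)
        earlier_enabled = earlier_enabled or p
    return will_fire


def satisfies_scheduler_properties(n):
    """Exhaustively check the three scheduler properties for n rules."""
    for p in product([False, True], repeat=n):
        f = priority_encode(list(p))
        if any(fi and not pi for fi, pi in zip(f, p)):
            return False          # property 1: f_i implies p_i
        if any(p) and not any(f):
            return False          # property 2: some p implies some f
        if sum(f) > 1:
            return False          # property 3: at most one f_i
    return True


print(priority_encode([False, True, True]))
print(satisfies_scheduler_properties(4))
```
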
Executing Multiple Rules Per Cycle: Conflict-free rules

    rule ra (z > 10);
      x <= x + 1;
    endrule
    rule rb (z > 20);
      y <= y + 2;
    endrule

Parallel execution behaves like ra < rb = rb < ra.

Rule_a and Rule_b are conflict-free if

    $\forall s.\ p_a(s) \wedge p_b(s) \Rightarrow$
      1. $p_a(d_b(s)) \wedge p_b(d_a(s))$
      2. $d_a(d_b(s)) = d_b(d_a(s))$

Parallel execution can also be understood in terms of a composite rule:

    rule ra_rb((z>10)&&(z>20));
      x <= x+1; y <= y+2;
    endrule

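The definition can be checked by brute force over a small set of states. The sketch below is Python and only illustrates the definition; it is not how a compiler performs the analysis. The names and the chosen state ranges are assumptions, and the state is the tuple (x, y, z). The second pair of transformers, which both write x with different constants, is included as a counterexample.

```python
from itertools import product


def conflict_free(pa, da, pb, db, states):
    """Check both conflict-free conditions on every state where both fire."""
    for s in states:
        if pa(s) and pb(s):
            if not (pa(db(s)) and pb(da(s))):
                return False      # condition 1: neither disables the other
            if da(db(s)) != db(da(s)):
                return False      # condition 2: both orders agree
    return True


def pa(s): return s[2] > 10                   # guard of ra
def pb(s): return s[2] > 20                   # guard of rb
def da(s): return (s[0] + 1, s[1], s[2])      # ra: x <= x + 1
def db(s): return (s[0], s[1] + 2, s[2])      # rb: y <= y + 2

def da_const(s): return (1, s[1], s[2])       # counterexample: x <= 1
def db_const(s): return (2, s[1], s[2])       # counterexample: x <= 2

# z ranges over values below, between and above the two guard thresholds.
STATES = list(product(range(3), range(3), range(9, 24)))

print(conflict_free(pa, da, pb, db, STATES))
print(conflict_free(pa, da_const, pb, db_const, STATES))
```
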
Executing Multiple Rules Per Cycle: Sequentially Composable rules

    rule ra (z > 10);
      x <= y + 1;
    endrule
    rule rb (z > 20);
      y <= y + 2;
    endrule

Parallel execution behaves like ra < rb.

Rule_a and Rule_b are sequentially composable if

    $\forall s.\ p_a(s) \wedge p_b(s) \Rightarrow p_b(d_a(s))$

Parallel execution can also be understood in terms of a composite rule:

    rule ra_rb((z>10)&&(z>20));
      x <= y+1; y <= y+2;
    endrule

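A single state is enough to see why the order matters for this pair. The sketch below is Python; the dictionary representation of the state and the function names are assumptions. parallel models the composite rule, in which both actions read the old state.

```python
def pa(s): return s["z"] > 10                    # guard of ra
def pb(s): return s["z"] > 20                    # guard of rb
def da(s): return {**s, "x": s["y"] + 1}         # ra: x <= y + 1
def db(s): return {**s, "y": s["y"] + 2}         # rb: y <= y + 2


def parallel(s):
    """Composite rule ra_rb: both right-hand sides read the old state."""
    return {**s, "x": s["y"] + 1, "y": s["y"] + 2}


s0 = {"x": 0, "y": 5, "z": 25}
print("rb still enabled after ra:", pb(da(s0)))
print("parallel    :", parallel(s0))
print("ra then rb  :", db(da(s0)))
print("rb then ra  :", da(db(s0)))
```

For this state the parallel result equals ra followed by rb, while rb followed by ra gives a different x because ra would then read the updated y.
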
Multiple-Rules-per-Cycle Scheduler

[Figure: p_1, p_2, ..., p_n are divided among several independent Scheduler blocks, which produce f_1, f_2, ..., f_n.]

Divide the rules into smallest conflicting groups; provide a scheduler for each group.

  1. $f_i \Rightarrow p_i$
  2. $p_1 \vee p_2 \vee \dots \vee p_n \Rightarrow f_1 \vee f_2 \vee \dots \vee f_n$
  3. Multiple operations such that $f_i \wedge f_j \Rightarrow R_i$ and $R_j$ are conflict-free or sequentially composable

Muxing structure

Muxing logic requires determining, for each register (action method), the rules that update it and under what conditions.

Conflict Free (Mutually Exclusive):
    (d1 and p1) or (d2 and p2)

Sequentially Composable:
    (d1 and p1 and ~p2) or (d2 and p2)

CF rules either do not update the same element or are ME.

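The two AND-OR structures can be modelled on integers. This is a Python sketch with hypothetical helper names and an assumed 32-bit register width; the sequentially composable case assumes the order rule 1 before rule 2, so the value of rule 2 wins when both fire.

```python
WIDTH = 32


def mask(bit):
    """Replicate a one-bit condition across the register width."""
    return (1 << WIDTH) - 1 if bit else 0


def mux_cf(d1, p1, d2, p2):
    """Conflict-free or mutually exclusive writers: plain AND-OR mux.

    Only meaningful when p1 and p2 are never both true for this register.
    """
    return (d1 & mask(p1)) | (d2 & mask(p2))


def mux_sc(d1, p1, d2, p2):
    """Sequentially composable writers: rule 1 is masked when rule 2 fires."""
    return (d1 & mask(p1 and not p2)) | (d2 & mask(p2))


def latch_enable(p1, p2):
    """The register is written whenever either rule fires."""
    return p1 or p2


print(mux_cf(5, True, 9, False))
print(mux_sc(1, True, 2, True))
```

When neither rule fires the mux output is 0, but the latch enable is false, so the register keeps its value.
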
Scheduling and control logic

[Figure: Modules (current state) feed the Rules, each with a cond p_i and an action d_i. The "CAN_FIRE" signals p_1 ... p_n enter the Scheduler, which produces the "WILL_FIRE" signals f_1 ... f_n. The WILL_FIRE signals and d_1 ... d_n enter the Muxing logic, which drives the Modules (next state).]

Extras

Sequentially Composable rules ...

    rule ra (z > 10);
      x <= 1;
    endrule
    rule rb (z > 20);
      x <= 2;
    endrule

Parallel execution can behave either like ra < rb or rb < ra, but the two behaviors are not the same.

Composite rules:

    Behavior ra < rb
    rule ra_rb(z>10 && z>20);
      x <= 2;
    endrule

    Behavior rb < ra
    rule rb_ra(z>10 && z>20);
      x <= 1;
    endrule

Mutually Exclusive Rules

Rule_a and Rule_b are mutually exclusive if they can never be enabled simultaneously:

    $\forall s.\ p_a(s) \Rightarrow \neg p_b(s)$

Mutually exclusive rules are conflict-free even if they write the same state.

Mutual-exclusion analysis brings down the cost of conflict-free analysis.

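The swap and subtract rules of the GCD module are an example: their guards can never hold together. The sketch below checks this by enumeration in Python; the helper name, the guard function names and the small state ranges are assumptions, and a compiler would reason about the guard expressions rather than enumerate states.

```python
def mutually_exclusive(pa, pb, states):
    """True if no listed state enables both guards."""
    return not any(pa(*s) and pb(*s) for s in states)


def p_swap(x, y): return x > y and y != 0          # guard of swap
def p_subtract(x, y): return x <= y and y != 0     # guard of subtract

def p_ra(z): return z > 10                         # guards of ra and rb,
def p_rb(z): return z > 20                         # which do overlap

XY_STATES = [(x, y) for x in range(-8, 8) for y in range(-8, 8)]
Z_STATES = [(z,) for z in range(30)]

print(mutually_exclusive(p_swap, p_subtract, XY_STATES))
print(mutually_exclusive(p_ra, p_rb, Z_STATES))
```
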
Compiler determines if two rules can be executed in parallel

Rule_a and Rule_b are conflict-free if

    $\forall s.\ p_a(s) \wedge p_b(s) \Rightarrow$
      1. $p_a(d_b(s)) \wedge p_b(d_a(s))$
      2. $d_a(d_b(s)) = d_b(d_a(s))$

which holds if

    $D(R_a) \cap R(R_b) = \emptyset$
    $D(R_b) \cap R(R_a) = \emptyset$
    $R(R_a) \cap R(R_b) = \emptyset$

Rule_a and Rule_b are sequentially composable if

    $\forall s.\ p_a(s) \wedge p_b(s) \Rightarrow p_b(d_a(s))$

which holds if

    $D(p_b) \cap R(R_a) = \emptyset$

These properties can be determined by examining the domains and ranges of the rules in a pairwise manner. These conditions are sufficient but not necessary.

Parallel execution of CF and SC rules does not increase the critical path delay.

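The domain and range test for conflict-freedom reduces to three set intersections. The sketch below is Python; the dictionary layout with keys "D" and "R" is an assumption, and the domains and ranges are written by hand here, whereas a compiler would derive them from the rule text. Only the conflict-free test is modelled.

```python
def conflict_free_by_sets(ra, rb):
    """Sufficient (not necessary) test using domains D and ranges R."""
    return (ra["D"].isdisjoint(rb["R"])
            and rb["D"].isdisjoint(ra["R"])
            and ra["R"].isdisjoint(rb["R"]))


# rule ra (z > 10); x <= x + 1;   rule rb (z > 20); y <= y + 2;
ra_cf = {"D": {"z", "x"}, "R": {"x"}}
rb_cf = {"D": {"z", "y"}, "R": {"y"}}

# rule ra (z > 10); x <= y + 1;   rule rb (z > 20); y <= y + 2;
ra_sc = {"D": {"z", "y"}, "R": {"x"}}
rb_sc = {"D": {"z", "y"}, "R": {"y"}}

print(conflict_free_by_sets(ra_cf, rb_cf))
print(conflict_free_by_sets(ra_sc, rb_sc))
```

The second pair fails the test because ra reads y, which rb writes.
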
Homework problem: Binary Multiplication

Exercise: Binary Multiplier

Simple binary multiplication:

       1001     // d = 4'd9
     x 0101     // r = 4'd5
     ------
       1001     // d << 0 (since r[0] == 1)
      0000      // 0 << 1 (since r[1] == 0)
     1001       // d << 2 (since r[2] == 1)
    0000        // 0 << 3 (since r[3] == 0)
    -------
    0101101     // product (sum of above) = 45

What does it look like in Bluespec?

[Figure: registers d, r and product; one step of multiplication]

Multiplier in Bluespec

    module mkMult (I_mult);
      Reg#(Int#(32)) product <- mkReg(0);
      Reg#(Int#(32)) d <- mkReg(0);
      Reg#(Int#(16)) r <- mkReg(0);

      rule cycle (r != 0);
        if (r[0] == 1) product <= product + d;
        d <= d << 1;
        r <= r >> 1;
      endrule

      method Action start (Int#(16) x, Int#(16) y) if (r == 0);
        d <= signExtend(x); r <= y;
      endmethod
      method Int#(32) result () if (r == 0);
        return product;
      endmethod
    endmodule

What is the interface I_mult?

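The behavior of rule cycle can be traced in software. This Python sketch is a model only: the function name multiply is hypothetical, it assumes non-negative operands, it starts from product == 0 as after reset, and it ignores the fixed register widths of the BSV module.

```python
def multiply(x, y):
    """Shift-and-add multiplication. Returns (product, firings of cycle)."""
    product, d, r = 0, x, y
    fired = 0
    while r != 0:                 # guard of rule cycle
        if r & 1:                 # r[0] == 1
            product += d
        d <<= 1                   # d <= d << 1
        r >>= 1                   # r <= r >> 1
        fired += 1
    return product, fired


print(multiply(9, 5))
```

The rule stops firing as soon as the remaining bits of r are all zero, so the number of firings depends on the position of the highest set bit of the second operand.
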