ESE 534 Computer Organization Day 23 April 21

ESE 534: Computer Organization
Day 23: April 21, 2014
Control
Penn ESE 534 Spring 2014 -- DeHon

Previously
• Looked broadly at instruction effects
• Explored structural components of computation
  – Interconnect, compute, retiming
• Explored operator sharing/time-multiplexing
• Explored branching for code compactness

Today
• Instantaneous compute requirement vs. total compute requirement
• Control
  – data-dependent operations
• Different forms
  – local
  – instruction selection
• Control granularity
  – architecture space parameter

Control
• Control: the point where data affects the instruction stream (operation selection)
  – Typical manifestations:
    • data-dependent branching
      – if (a!=0) OpA else OpB
      – bne
    • data-dependent state transitions
      – new => goto S0
      – else => stay
    • data-dependent operation selection
      – e.g., +/-, addp

Control
• Viewpoint: an instruction stream can be sequenced without control
  – i.e., a static, data-independent progression through a sequence of instructions is control-free
    • C0 C1 C2 C0 …
  – Similarly, an FSM with no data inputs
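
Here is a minimal Python sketch of the distinction. It assumes a three-context program labeled C0, C1, C2, as on the slide. The function names and the `reset` input are hypothetical. The point is only that the first sequencer never looks at data when choosing the next context, while the second one does.

```python
# Hypothetical three-context program; the labels follow the slide.
CONTEXTS = ["C0", "C1", "C2"]

def data_independent_sequence(steps):
    """Control-free: the next context depends only on the current one."""
    pc, trace = 0, []
    for _ in range(steps):
        trace.append(CONTEXTS[pc])
        pc = (pc + 1) % len(CONTEXTS)  # fixed progression C0 C1 C2 C0 ...
    return trace

def data_dependent_sequence(inputs):
    """Control: a data input (hypothetical 'reset') changes the progression."""
    pc, trace = 0, []
    for reset in inputs:
        trace.append(CONTEXTS[pc])
        # data-dependent state transition: reset => goto C0, else advance
        pc = 0 if reset else (pc + 1) % len(CONTEXTS)
    return trace
```

For example, `data_dependent_sequence([False, True, False, False])` visits C0, C1, C0, C1. The first sequencer would produce C0, C1, C2, C0 for the same length.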

Day 9: Programmable Architecture

Terminology (reminder)
• Primitive Instruction (pinst)
  – Collection of bits that tells a bit-processing element what to do
  – Includes:
    • compute operation select
    • input sources in space (interconnect)
    • input sources in time (retiming)
• Configuration Context
  – Collection of all bits (pinsts) that describe the machine's behavior on one cycle
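
One way to picture this terminology is as data structures. This is a sketch only: the class names, field names and fixed-width bit encoding are illustrative assumptions, not a format defined in the course. A pinst bundles an operation select with per-input spatial and temporal sources. A context is the set of pinsts for one cycle.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Pinst:
    """Primitive instruction for one bit-processing element (hypothetical fields)."""
    op: int                         # compute-operation select, e.g. a 4-LUT truth table
    spatial_srcs: Tuple[int, ...]   # interconnect: which PE drives each input
    temporal_srcs: Tuple[int, ...]  # retiming: how many cycles back each input was produced

@dataclass(frozen=True)
class Context:
    """Configuration context: every pinst in the machine for one cycle."""
    pinsts: Tuple[Pinst, ...]

def context_bits(ctx: Context, op_bits: int, src_bits: int, delay_bits: int) -> int:
    """Bits to store one context under an assumed fixed-width encoding.

    Assumes each pinst has one temporal source per spatial source.
    """
    total = 0
    for p in ctx.pinsts:
        total += op_bits + len(p.spatial_srcs) * (src_bits + delay_bits)
    return total
```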

Back to “Any” Computation
• Design must handle all potential inputs (computing scenarios)
• Requires sufficient generality
• However, the computation for any given input may be much smaller than the general case.
Instantaneous compute << potential compute

Preclass 1
  if ((dx*dx+dy*dy)>threshold)
    z=cx*dx+cy*dy
  else
    z=dx*dy+c3
• How many operations performed?
• Cycles?
• Compute blocks needed?
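
To make the operation count concrete, here is the Preclass 1 computation written as ordinary branching code with an operation counter. It assumes each multiply, add and compare counts as one operation, and it ignores the cost of the branch itself. The function name `preclass1` is hypothetical.

```python
def preclass1(dx, dy, cx, cy, c3, threshold):
    """Evaluate Preclass 1 with a branch; return (z, operations executed)."""
    ops = 0
    t = dx * dx + dy * dy
    ops += 3                 # two multiplies, one add
    taken = t > threshold
    ops += 1                 # compare
    if taken:
        z = cx * dx + cy * dy
        ops += 3             # two multiplies, one add
    else:
        z = dx * dy + c3
        ops += 2             # one multiply, one add
    return z, ops
```

Only one arm executes, so under these assumptions the count depends on the data. It is 7 operations when the threshold test succeeds and 6 when it fails.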

Preclass 1
• Delay?
• Operations?

Preclass
• Operations?
• Cycles?
• How did it do that? (reduce delay)

If-Conversion

If-Conversion
• Trade-off:
  – Latency
  – Work
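
Below is a rough cost model for this trade-off, applied to the Preclass 1 expression. It assumes:
• unit-delay operations
• enough parallel compute blocks to run independent operations together
• no branch penalty
• a one-operation select (mux) at the end of the if-converted version

Each piece of code is summarized as a hypothetical (depth, operation count) pair.

```python
# (depth, operation count) for each piece of Preclass 1, assuming unit-delay ops.
COND = (3, 4)   # dx*dx and dy*dy in parallel, then add, then compare
THEN = (2, 3)   # cx*dx and cy*dy in parallel, then add
ELSE = (2, 2)   # dx*dy, then add c3

def branching_cost(cond, arm):
    """Resolve the condition first, then run only the selected arm."""
    latency = cond[0] + arm[0]
    work = cond[1] + arm[1]
    return latency, work

def if_converted_cost(cond, then, els):
    """Run the condition and both arms in parallel, then select (mux) the result."""
    latency = max(cond[0], then[0], els[0]) + 1  # +1 for the select
    work = cond[1] + then[1] + els[1] + 1        # every operation always runs
    return latency, work
```

Under these assumptions, if-conversion cuts latency from 5 to 4 operation delays. It raises the work from 6 or 7 operations to 10, because both arms and the select are always evaluated.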

Day 3 Idea
• Compute both possible values and select the correct result when we know the answer

If-Conversion ~= Predicated Operations
  P1=P()
  P2=!P1
  P1: G()
  P2: H()

  P1=t1>t2
  P2=!P1
  …
  P1: z=t4
  P2: z=t5
• Why important?
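
This small interpreter sketches predicated execution. Each instruction carries an optional guard predicate, and its result is written only when the guard is true. The tuple format and register names are assumptions that mirror the slide's P1/P2 example. The instruction sequence is the same for every input, so no data-dependent change of PC is needed. In hardware, the guarded instructions would still occupy their issue slots.

```python
def run_predicated(program, regs):
    """Execute straight-line code in which each instruction may be predicate-guarded.

    program: list of (guard, dest, fn); guard is None or the name of a predicate
    register, and fn computes the new value of dest from the register file.
    """
    for guard, dest, fn in program:
        if guard is None or regs[guard]:
            regs[dest] = fn(regs)  # write happens only when the guard holds
    return regs

# Mirrors: P1=t1>t2; P2=!P1; P1: z=t4; P2: z=t5
PROG = [
    (None, "P1", lambda r: r["t1"] > r["t2"]),
    (None, "P2", lambda r: not r["P1"]),
    ("P1", "z", lambda r: r["t4"]),
    ("P2", "z", lambda r: r["t5"]),
]
```

For example, `run_predicated(PROG, {"t1": 5, "t2": 3, "t4": 10, "t5": 20})["z"]` yields 10, and swapping t1 to 1 yields 20. In both cases the same four instructions are stepped through.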

Pipelining Processor
• What happens if we pipeline between
  – instruction memory and
  – datapath?

Pipelined Branching Processor
• What happens here if we pipeline between instruction memory and datapath?

Instruction Control Latency
• For time-multiplexed (data-independent) sequencing, can pipeline:
  – instruction distribution
  – instruction memory read
• With data-dependent branching: the decision → PC distribution → instruction-memory read latency becomes part of the critical path
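
A toy timing model of this effect follows, as a sketch under stated assumptions:
• one instruction issues per cycle
• the one-time pipeline fill is ignored
• there is no branch prediction or speculation
• every data-dependent branch stalls issue for a hypothetical `control_latency` cycles, until the new PC has been distributed and the instruction memory read

```python
def issue_cycles(trace, control_latency):
    """Cycles to issue a dynamic instruction trace.

    trace: sequence of booleans, True where the instruction is a data-dependent branch.
    control_latency: extra cycles from resolving the branch to having the next
    instruction available (decision -> PC distribution -> instruction-memory read).
    """
    cycles = 0
    for is_branch in trace:
        cycles += 1                    # one instruction issued per cycle
        if is_branch:
            cycles += control_latency  # wait for the chosen instruction to arrive
    return cycles
```

A data-independent trace costs one cycle per instruction regardless of how deeply the instruction path is pipelined. With one branch in every ten instructions and a 3-cycle control latency, the same 100 instructions take 130 cycles.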

Clock Cycle Radius
• Radius of logic we can reach in one cycle (45 nm)
  – Radius 10
    • A few hundred PEs
  – Chip side 600–700 PEs
    • 400–500 thousand PEs
  – 100s of cycles to cross
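
The slide's figures can be roughly reproduced with a little arithmetic. This assumes PEs on a 2D mesh, with reach measured as Manhattan distance; both are assumptions, since the slide does not say how radius is measured. The results land in the same range as the slide's rounded estimates rather than matching them exactly.

```python
import math

def pes_within_radius(r):
    """PEs within Manhattan distance r of a PE on a 2D mesh (a diamond)."""
    return 2 * r * r + 2 * r + 1

def chip_pes(side):
    """PEs on a square chip with the given side length (in PEs)."""
    return side * side

def cycles_to_cross(side, r):
    """Corner-to-corner Manhattan distance divided by the per-cycle reach."""
    return math.ceil(2 * (side - 1) / r)
```

With radius 10, this gives 221 PEs reachable per cycle. A 600–700 PE side holds 360,000–490,000 PEs, and a corner-to-corner crossing takes 120–140 cycles.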

Two Control Options
1. Local control
  – unify choices
    • build all options into the spatial compute structure and select the operation (mux-conversion)
2. Instruction selection
  – provide a different instruction (instruction sequence) for each option
  – selection occurs when choosing which instruction(s) to issue

Two Control Options
1. Local control
2. Instruction selection
May use both within an application
  – Local control in the critical path and inner loops, where latency rather than parallelism is the limit
  – Instruction selection for coarse-grain selection
    • at a coarse level
    • or where there is plenty of task parallelism, so latency does not limit the computation

Video Decoder
• E.g., video decoder [frame time = 33 ms]
  – if (packet==FRAME)
    • if (type==I-FRAME)
      – I-FRAME computation
    • else if (type==B-FRAME)
      – B-FRAME computation
• Millions of cycles per frame
  – Instruction control between frames
  – Local control within frames

Packet Processing
• If IPv6 packet
  – ….
• If IPv4 packet
  – …
• If VoIP packet
  – …
• If modem packet
  – ….
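
The packet example can be sketched both ways in Python. The handlers are hypothetical placeholders for the per-protocol work elided on the slide. With instruction selection, only the chosen handler runs. With local control, every option is evaluated and a final selection (the mux) keeps one result, which is why the second function reports evaluating all four.

```python
def _make_handler(name):
    """Build a placeholder handler; the real per-protocol work is elided."""
    def handle(pkt):
        return f"{name} processed {len(pkt['payload'])} bytes"
    return handle

HANDLERS = {kind: _make_handler(kind) for kind in ("IPv6", "IPv4", "VoIP", "modem")}

def dispatch_instruction_selection(pkt):
    """Issue only the instructions for the selected packet type."""
    handler = HANDLERS[pkt["kind"]]
    return handler(pkt), 1  # (result, handlers evaluated)

def dispatch_local_control(pkt):
    """Evaluate every option spatially, then select one result (mux-conversion)."""
    results = {kind: h(pkt) for kind, h in HANDLERS.items()}
    return results[pkt["kind"]], len(results)
```

Both functions return the same result. The difference is the work spent: instruction selection issues only the chosen handler, while local control pays for all four but keeps the selection step trivial.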

Inclass 4(a)
• Local or instruction issue control?
• Optimize average runtime on 2-issue VLIW
  if (odd(a))
    return(factor(a));
  else
    return(2);

Inclass 4(b)
• Local or instruction issue control?
• Optimize completion on 4-issue VLIW
  while (abs(f(xm)-y)>delta)
    if (((f(xh)>y) && (f(xm)<y)) || ((f(xh)<y) && (f(xm)>y)))
      xl=xm;
      xm=(xh+xm)/2;
    else
      xh=xm;
      xm=(xl+xm)/2;
  return(xm);

Control Granularity
Architectural Parameter(s) for Instruction Selection

Inclass 5
  WAIT:
    if (in.type==header)
      cnt=in.header_payload_size;
      checksum=0;
      goto RECEIVE;
    else
      goto WAIT;
  RECEIVE:
    checksum=checksum xor in;
    data[packet][cnt]=in;
    cnt--;
    if (cnt==0)
      goto CHECK;
    else
      goto RECEIVE;
  CHECK:
    if (in==checksum)
      packet++;
    goto WAIT
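
Below is a direct transcription of the Inclass 5 FSM into Python, as a sketch for experimenting with it. It assumes:
• the input arrives one word per step
• header words are represented by a hypothetical `Header` tuple carrying the payload size, which must be at least 1
• data and checksum words are plain integers

```python
from collections import namedtuple

Header = namedtuple("Header", "payload_size")  # hypothetical header-word format

def receive(stream):
    """Run the WAIT/RECEIVE/CHECK FSM over a word stream.

    Returns (packets accepted, data), where data[packet][cnt] mirrors the slide.
    """
    state = "WAIT"
    cnt = checksum = packet = 0
    data = {}
    for word in stream:
        if state == "WAIT":
            if isinstance(word, Header):         # in.type == header
                cnt = word.payload_size
                checksum = 0
                state = "RECEIVE"
        elif state == "RECEIVE":
            checksum ^= word                     # checksum = checksum xor in
            data.setdefault(packet, {})[cnt] = word
            cnt -= 1
            if cnt == 0:
                state = "CHECK"
        elif state == "CHECK":
            if word == checksum:                 # packet passes: keep it
                packet += 1
            state = "WAIT"
    return packet, data
```

For example, the stream `[Header(2), 5, 3, 6, Header(1), 7, 0]` accepts the first packet, since 5 xor 3 = 6. It rejects the second packet, whose checksum word 0 does not match 7.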

Inclass 5
• Preferred architecture?

Inclass 5
• How to support
  – two ports
  – on two 2-issue VLIWs
  – with separate controllers?
(Same WAIT/RECEIVE/CHECK code as in the Inclass 5 slide above.)

Inclass 5
• How to support
  – two ports
  – on one 3-issue VLIW
  – with a single controller?
  – Instructions?
  – PC bits?
(Same WAIT/RECEIVE/CHECK code as in the Inclass 5 slide above.)

Instruction Control
• If FSMs (ports) advance orthogonally
  – (really independent control)
  – context depth => product of states
• Product of PCs
  – i.e., with a single controller (PC)
    • must create a product FSM
    • which may lead to state explosion
  – N FSMs, each with S states => S^N product states
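
The state explosion is easy to see by building the product machine explicitly. In this sketch, each FSM is a dictionary mapping a state to its input-to-next-state table. `PORT` is a hypothetical one-bit abstraction of the Inclass 5 port controller. A single controller for N such ports needs one state per combination of port states.

```python
from itertools import product

def product_fsm(fsms):
    """Build the single-controller product of several FSMs.

    Each FSM maps state -> {input: next_state}; product inputs are tuples
    holding one input per component FSM.
    """
    trans = {}
    for combo in product(*(list(f) for f in fsms)):
        per_fsm_inputs = [list(f[s]) for f, s in zip(fsms, combo)]
        trans[combo] = {
            ins: tuple(f[s][i] for f, s, i in zip(fsms, combo, ins))
            for ins in product(*per_fsm_inputs)
        }
    return trans

# Hypothetical abstraction: input 1 = "header seen" / "count reached zero".
PORT = {
    "WAIT": {0: "WAIT", 1: "RECEIVE"},
    "RECEIVE": {0: "RECEIVE", 1: "CHECK"},
    "CHECK": {0: "WAIT", 1: "WAIT"},
}
```

Two ports give 3^2 = 9 product states and three ports give 3^3 = 27, matching S^N with S = 3.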

Day 10: Architectural Differences
• What differentiates a VLIW from a multicore?

Architectural Questions
• How many pinsts/controller?

Day 11: Architecture Taxonomy
PCs | Pinsts/PC   | Depth | Width | Architecture
0   | N           | 1     | 1     | FPGA
1   | N (48, 640) | 8     | 1     | Tabula ABAX (A1EC04)
1   | 1           | 1024  | 32    | Scalar Processor (RISC)
1   | N           | D     | W     | VLIW (superscalar)
1   | 1           | Small | W*N   | SIMD, GPU, Vector
N   | 1           | D     | W     | MIMD
16  | 1           | 2048  | 64    | 16-core (4?)

Architectural Questions
• How many pinsts/controller?
• Fixed or configurable assignment of controllers to pinsts?
  – …at what level of granularity?

Architectural Questions
• Effects of:
  – Too many controllers?
  – Too few controllers?
  – Fixed controller assignment?
  – Configurable controller assignment?

Architectural Questions
• Too many:
  – wasted space on extra controllers
  – synchronization?
• Too few:
  – product state space and/or underused logic
• Fixed:
  – underused logic when a region is too big
• Configurable:
  – interconnect cost, slower distribution

FSM Control Factoring Case Study

FSM Example (local control)
4 4-LUTs
2 LUT delays

FSM Example
3 4-LUTs
1 LUT delay

Local Control
• LUTs used vs. LUT evaluations produced
• Counting LUTs does not tell us the cycle-by-cycle LUT needs

FSM Example (Instruction)
3 4-LUTs
1 LUT delay

FSM Example
• FSM: the canonical “control” structure
  – captures many of these properties
  – can implement with deep multicontext
    • instruction selection
  – can implement as multilevel logic
    • unify, use local control
• Serves to build intuition

Partitioning versus Contexts (Area)
CSE benchmark

Partitioning versus Contexts (Heuristic)
• Start with dense mustang state encodings
• Greedily pick the state bit that produces the
  – least greatest area split
  – least greatest delay split
• Repeat until we have the desired number of contexts
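
Here is a sketch of the greedy splitting loop. The real heuristic re-derives the logic for each partition and evaluates its area and delay. As a simplifying assumption, this sketch instead gives each state code a hypothetical area cost, takes a context's area to be the sum of its states' costs, and models only the area criterion. Each chosen state bit doubles the number of contexts.

```python
def greedy_partition(costs, n_bits, n_contexts):
    """Greedily pick state bits until there are at least n_contexts partitions.

    costs: dict mapping state code (int) -> assumed area cost of that state.
    Returns (chosen bits, list of state groups, one group per context).
    """
    chosen = []
    groups = [list(costs)]
    while len(groups) < n_contexts:
        best = None
        for b in range(n_bits):
            if b in chosen:
                continue
            # Split every current group on state bit b.
            split = []
            for g in groups:
                split.append([s for s in g if not (s >> b) & 1])
                split.append([s for s in g if (s >> b) & 1])
            worst = max(sum(costs[s] for s in g) for g in split)
            if best is None or worst < best[0]:  # least greatest area split
                best = (worst, b, split)
        if best is None:
            break  # no unused state bits remain
        _, b, groups = best
        chosen.append(b)
    return chosen, groups
```

Consider four states with costs {0: 1, 1: 1, 2: 10, 3: 10}. Splitting on bit 1 would put both expensive states (2 and 3) in one context. The greedy step therefore prefers bit 0, which balances the area at 11 per context.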

Partition to Fixed Number of Contexts

Extend Comparison to Memory
• Fully local => compute with LUTs
• Fully partitioned => look up logic (context) in memory and compute logic
• How does this compare to a fully memory-based implementation?
  – Simply look up the result in a table?
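
The fully memory-based alternative amounts to storing the FSM's entire next-state/output function in a table addressed by the concatenated state and input bits. This sketch builds such a table for a hypothetical 2-bit counter with an enable input and a wrap-around output, then counts the bits the table needs: 2^(state+input bits) words, each holding the next state plus the outputs. The exponential term dominates as the number of state+input bits grows.

```python
def build_table(step, state_bits, input_bits):
    """Precompute step(state, input) for every (state, input) address."""
    table = []
    for addr in range(1 << (state_bits + input_bits)):
        state, inp = addr >> input_bits, addr & ((1 << input_bits) - 1)
        table.append(step(state, inp))
    return table

def run(table, input_bits, inputs, state=0):
    """Run the FSM purely by table lookup; return (final state, outputs)."""
    outs = []
    for inp in inputs:
        state, out = table[(state << input_bits) | inp]
        outs.append(out)
    return state, outs

def memory_bits(state_bits, input_bits, output_bits):
    """Table size: one (next state + outputs) word per address."""
    return (1 << (state_bits + input_bits)) * (state_bits + output_bits)

def counter_step(state, enable):
    """Hypothetical example FSM: 2-bit counter, output 1 on wrap-around."""
    if not enable:
        return state, 0
    nxt = (state + 1) % 4
    return nxt, int(nxt == 0)
```

For the counter, the table has 8 entries and needs 24 bits. The same style of machine with 8 state bits and 3 input bits would already need 2048 words.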

Memory FSM Compare (small)

Memory FSM Compare (large)

Memory FSM Compare (notes)
• Memory selected was “optimally” sized to the problem
  – in practice, we do not get to pick the memory allocation/organization for each FSM
  – no interconnect charged
• Memory operates in a single cycle
  – but the cycle slows as inputs are added
• Memory is smaller for <11 state+input bits
• Memory size is not affected by CAD quality (FPGA/DPGA size is)

Big Ideas [MSB Ideas]
• Control: where data affects instructions (operations)
• Two forms:
  – local control
    • all ops resident, fast selection
  – instruction selection
    • may allow us to reduce instantaneous work requirements
    • introduces issues
      – depth, granularity, instruction load and select time

Big Ideas [MSB-1 Ideas]
• If-Conversion
  – Latency vs. work tradeoff
• Intuition explored with the canonical FSM case
  – a few contexts can reduce LUT requirements considerably (factor dissimilar logic)
  – similar logic is more efficient in local control
  – overall, a moderate number of contexts (e.g., 8)
    • exploits both properties … better than either extreme

Admin
• Grading: HW7 done
• FM1 due Wednesday
• Office Hours on Tuesday 3:30–4:30 pm
  – Shifting up, won't be there past 4:30 pm
• Reading for Wednesday on Canvas