Retrospective on the VIRAM-1 Design Decisions
Christoforos E. Kozyrakis, kozyraki@cs.berkeley.edu
IRAM Retreat, January 9, 2001

What We Probably Got Right
VIRAM-1 Design Retrospective, C. E. Kozyrakis, 1/2001
• Low power design approach
• Use of a commercial MIPS core
• Permutation instructions
• Fixed-point arithmetic model
• Single load-store unit
• Dropping the network interface
• Testing infrastructure

Low Power Design Approach
• Two design alternatives for VIRAM-1:
  – 200 MHz, 2 W, 4 vector lanes
  – 500 MHz, 10 W (?), 4-8 vector lanes (?)
• Low power was the right choice because:
  – Low power is important for embedded and multimedia applications
  – It is easier to design a low-power processor than a high-frequency one
  – High power consumption would severely interfere with DRAM operation
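
As a rough check on the two design points, the sketch below computes two figures from the slide's numbers: average energy per clock cycle (P / f) and peak lane-cycles per joule. It assumes power is spent uniformly across cycles and lanes. It also takes the uncertain 10 W and 4-8 lane figures at face value. These are back-of-the-envelope numbers, not measurements.

```python
# Design points from the slide. The 500 MHz power (10 W) and lane count (4-8)
# were marked as uncertain, so both ends of the lane range are included.
DESIGNS = [
    ("200 MHz, 2 W, 4 lanes", 200e6, 2.0, 4),
    ("500 MHz, 10 W, 4 lanes", 500e6, 10.0, 4),
    ("500 MHz, 10 W, 8 lanes", 500e6, 10.0, 8),
]


def energy_per_cycle_nj(freq_hz: float, power_w: float) -> float:
    """Average energy per clock cycle in nanojoules, assuming constant power."""
    return power_w / freq_hz * 1e9


def lane_cycles_per_joule(freq_hz: float, power_w: float, lanes: int) -> float:
    """Peak lane-cycles delivered per joule (lanes * f / P)."""
    return lanes * freq_hz / power_w


if __name__ == "__main__":
    for name, f, p, lanes in DESIGNS:
        print(f"{name}: {energy_per_cycle_nj(f, p):.1f} nJ/cycle, "
              f"{lane_cycles_per_joule(f, p, lanes):.2e} lane-cycles/J")
```

Under these assumptions, the 500 MHz design at best matches the 200 MHz design in lane-cycles per joule (with 8 lanes). It does so while drawing five times the absolute power.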

Use of Commercial MIPS Core
• Scalar core alternatives:
  – Custom design optimized for a vector unit
  – Commercial core with a generic coprocessor interface
• The MIPS m5Kc core was a great choice because:
  – It is a flexible, synthesizable design with extensive documentation and support
  – It comes with an RTL simulation environment, which we reused for VIRAM-1
  – It allowed us to work on a demo system based on a MIPS daughter-card and demo board

Other Issues We Got Right
• Simple instructions for intra-register permutations
  – Allow the vectorization of reductions and FFT
  – Simple implementation compared to a general permutation
• Single load-store unit
  – Insufficient memory bandwidth to feed two units
  – Address calculation and translation resources are expensive
  – A second unit is not obviously useful for most media applications
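
Simple intra-register permutations are enough to vectorize a reduction, as the sketch below illustrates. It sums a vector in log2(n) rounds. Each round moves the upper half of the active elements down to the lower half and adds element-wise. Some assumptions:
  – `half_shift` is a hypothetical stand-in for such a permutation instruction, not VIRAM-1's actual mnemonic or semantics.
  – Python lists model vector registers.

```python
def half_shift(vec, n):
    """Hypothetical intra-register permutation: copy elements n..2n-1 of vec
    into positions 0..n-1; all other positions of the result are zero."""
    out = [0] * len(vec)
    out[:n] = vec[n:2 * n]
    return out


def vector_sum(vec):
    """Sum a power-of-two-length vector with log2(len) rounds of
    permutation + element-wise add, as a vector unit might."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    acc = list(vec)
    while n > 1:
        n //= 2
        shifted = half_shift(acc, n)
        # Element-wise vector add over the active prefix of length n;
        # elements beyond the prefix are no longer needed.
        acc = [a + b for a, b in zip(acc[:n], shifted[:n])] + acc[n:]
    return acc[0]


if __name__ == "__main__":
    print(vector_sum(list(range(1, 9))))
```

Each round is a fixed, data-independent data movement. This is what keeps such permutations far simpler to implement than a general permutation.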

Other Issues We Got Right
• Dropping the network interface
  – Not necessary for embedded/multimedia systems
  – Would introduce significant design complexity
• Testing infrastructure
  – Highly automated and easy to use for developing tests and verifying the complete VIRAM-1 design

What We Probably Got Wrong
• Insufficient benchmarking at early project stages
• Support for 64-bit data types
• Lack of sub-banks in DRAM macros
• Dropping the decoupled pipeline
• Use of a crossbar for memory transfers
• Too much support for arithmetic exceptions
• Too much support for conditional execution

Insufficient Benchmarking
• Limited benchmarking was performed early enough to affect major design decisions
  – Previous experience and intuition were used instead in several cases
• Reasons for limited benchmarking:
  – Lack of a compiler
  – Lack of a flexible performance model
  – Lack of manpower and time
• Some of the following issues could probably have been avoided with more benchmarking

Support for 64-bit Data Types
• VIRAM-1 supports 64-bit integer operations
• Excluding encryption, few multimedia applications require 64-bit operations
• Benefits from not supporting 64-bit operations:
  – Large area savings in datapaths and pipeline registers
  – Large wiring savings from narrower data busses
  – Fewer modes to support and verify

Lack of DRAM Sub-banks
• The DRAM macro used has a single bank
  – Accesses to different rows cannot be overlapped
• Significant performance bottleneck for applications with strided or random accesses
  – 4 addresses per cycle for 8 banks with a 5-cycle random access cycle
  – Bank conflicts reduce random bandwidth even further
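
The sketch below makes the bank bottleneck concrete. It reads the slide's numbers as 4 addresses issued per cycle against 8 single-bank macros, each busy for 5 cycles per random access; this interpretation is an assumption. On that reading, the banks sustain at most 8 / 5 = 1.6 random accesses per cycle even with no conflicts. The `simulate` function is a simplified, hypothetical model with in-order issue and uniformly random bank selection. It is not VIRAM-1's actual memory controller. It only shows how bank conflicts push the sustained rate below that bound.

```python
import random

NUM_BANKS = 8          # from the slide
BANK_BUSY_CYCLES = 5   # random access cycle of a single-bank macro (from the slide)
ADDRS_PER_CYCLE = 4    # addresses issued per cycle (assumed reading of the slide)


def peak_random_rate(num_banks=NUM_BANKS, busy=BANK_BUSY_CYCLES):
    """Upper bound on random accesses per cycle when every bank is always busy."""
    return num_banks / busy


def simulate(num_accesses, seed=0, num_banks=NUM_BANKS,
             busy=BANK_BUSY_CYCLES, issue_width=ADDRS_PER_CYCLE):
    """Hypothetical in-order issue model: up to issue_width accesses start per
    cycle, but an access whose bank is still busy stalls itself and every
    access behind it. Returns sustained accesses per cycle."""
    rng = random.Random(seed)
    banks = [rng.randrange(num_banks) for _ in range(num_accesses)]
    free_at = [0] * num_banks  # first cycle at which each bank is free again
    cycle, i = 0, 0
    while i < num_accesses:
        issued = 0
        while (i < num_accesses and issued < issue_width
               and free_at[banks[i]] <= cycle):
            free_at[banks[i]] = cycle + busy
            i += 1
            issued += 1
        cycle += 1
    return num_accesses / cycle


if __name__ == "__main__":
    print("bound:", peak_random_rate())
    print("simulated:", simulate(10_000))
```

Comparing `simulate(10_000)` across a few seeds with `peak_random_rate()` gives a feel for what conflicts cost under this model. The exact figure depends entirely on the model's assumptions.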

Other Issues We Got Wrong
• Dropping the decoupled pipeline
  – The “delayed pipeline” was preferred to a decoupled one for its complexity and power advantages, despite its performance issues
  – Given the length of the pipeline and the lack of sub-banks, it is not obvious that this was a wise decision
• Use of a crossbar for memory transfers
  – The memory crossbar is the weakest design component in terms of scalability and flexibility
  – Alternative approaches (e.g., a ring) probably deserved a closer examination before being rejected
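
One first-order way to see the scalability concern is to count switching resources, as in the sketch below. Its cost model is an assumption: a full crossbar needs one crosspoint per (lane, bank) pair, and a unidirectional ring needs one stop and one link per lane and per bank. It uses the 4 lanes and 8 banks mentioned elsewhere in this retrospective; VIRAM-1's actual crossbar organization may differ.

```python
def crossbar_crosspoints(sources: int, sinks: int) -> int:
    """A full crossbar needs one switch point per (source, sink) pair."""
    return sources * sinks


def ring_links(nodes: int) -> int:
    """A unidirectional ring needs one link per node."""
    return nodes


def ring_avg_hops(nodes: int) -> float:
    """Average hop count between distinct nodes on a unidirectional ring."""
    if nodes < 2:
        raise ValueError("a ring needs at least two nodes")
    return sum(range(1, nodes)) / (nodes - 1)


if __name__ == "__main__":
    for lanes, banks in [(4, 8), (8, 16)]:
        n = lanes + banks
        print(f"{lanes} lanes x {banks} banks: "
              f"crossbar {crossbar_crosspoints(lanes, banks)} crosspoints, "
              f"ring {ring_links(n)} links, {ring_avg_hops(n):.1f} avg hops")
```

Doubling both lanes and banks quadruples the crossbar's crosspoints but only doubles the ring's links. The price is a longer average path, which is the kind of trade-off a closer examination would have had to weigh.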

Other Issues We Got Wrong
• Too much support for arithmetic exceptions
  – VIRAM-1 includes extensive support for software speculation, user-level handlers, and a precise (slower) execution mode for arithmetic exceptions
  – Many of these features will never be used by the compiler, multimedia applications, or system software
• Too much support for conditional execution
  – VIRAM-1 implements all possible alternatives for vector conditional execution (masked instructions, masked merge, scatter-gather, compress-expand)
  – Some of them are quite complex to implement and not obviously needed for multimedia code
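
For reference, the sketch below gives the element-wise semantics of the register-to-register conditional-execution styles listed on the slide. Python lists stand in for vector registers, and 0/1 lists stand in for masks. The function names are hypothetical illustrations, not VIRAM-1 instruction names. Scatter-gather is omitted because it is an indexed memory operation.

The comparison suggests one plausible reason compress and expand are costlier in hardware. In those two operations, an element's output position depends on how many mask bits precede it. Masked instructions and merges, by contrast, work lane by lane.

```python
def masked_add(dst, a, b, mask):
    """Masked instruction: lanes with mask=1 get a+b; other lanes keep dst."""
    return [x + y if m else d for d, x, y, m in zip(dst, a, b, mask)]


def merge(a, b, mask):
    """Masked merge: select from a where mask=1, otherwise from b."""
    return [x if m else y for x, y, m in zip(a, b, mask)]


def compress(vec, mask):
    """Pack the elements with mask=1 into a dense prefix (order preserved)."""
    return [x for x, m in zip(vec, mask) if m]


def expand(packed, mask, fill=0):
    """Inverse of compress: place packed elements at positions where mask=1."""
    it = iter(packed)
    return [next(it) if m else fill for m in mask]


if __name__ == "__main__":
    mask = [1, 0, 1, 0]
    print(masked_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40], mask))
    print(compress([1, 2, 3, 4], mask), expand([1, 3], mask))
```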

What May Be Too Early To Call
• Full-custom design of integer datapaths
  – Optimal area and power consumption, but requires significant design time
  – Maybe we could have used an ASIC approach based on tiling specialized macro-cells or library components
• Use of two multipliers per vector lane
  – Most applications do not have such a high ratio of multiply or multiply-add operations
  – Consumes a significant amount of area