CPE 626: Advanced VLSI Design
L01
Department of Electrical and Computer Engineering
University of Alabama in Huntsville

Outline
Ø Computer Engineering: Motivation, Present, Future
Ø Computer Engineering Methodology
Ø Power as a Design Constraint
Ø Stored-program Computer: MU0 Example
Ø Digital System Modeling: Motivation

Why Computer Engineering?
CHANGE! It is exciting. It has never been more exciting! It impacts every aspect of human life.
[Pictures: ENIAC, 1946 (first general-purpose electronic computer); PC, 2002; PDA, 2002; Bionic, 2002]

Why Such Change?
Ø Continuous growth in performance due to advances in technology (CMOS VLSI) and innovations in computer design (RISC, RAID, ILP)
Ø Lower cost due to simpler development and higher volumes
Ø Together, these have significantly enhanced the capability available to the computer user
  § Example: today’s PC costing less than $1000 has more performance, main memory, and disk storage than a $1 million computer of the 1970s

Computer Engineering Methodology
[Figure: design cycle]
Ø Stages of the cycle
  § Evaluate Existing Systems for Bottlenecks
  § Simulate New Designs and Organizations
  § Implement Next Generation System
Ø Labels feeding the cycle: Market, Applications, Benchmarks, Workloads, Technology Trends, Implementation Complexity

Technology Trends

          Capacity           Speed/Latency
  Logic   4x in 3 years      1.54x per year
  DRAM    4x in 3-4 years    2x in 10 years
  Disk    4x in 3-4 years    2x in 10 years

State of the art: Intel Pentium 4, 2.2 GHz, 0.13 microns, 42 million transistors.

Reuters, Monday 11 June 2001: Intel engineers have designed and manufactured the world’s smallest and fastest transistor, 0.02 microns in size. This will open the way for microprocessors of 1 billion transistors, running at 20 GHz by 2007.
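The growth rates on this slide mix two units ("4x in 3 years" versus "1.54x per year"). A minimal sketch, assuming simple compound growth, converts a multi-year factor into an equivalent per-year factor so the rows can be compared. The function name `annual_factor` is illustrative and not from the lecture.

```python
def annual_factor(total_factor, years):
    """Per-year growth factor equivalent to `total_factor` over `years`,
    assuming compound (geometric) growth."""
    return total_factor ** (1.0 / years)


# Capacity growth quoted on the slide, expressed per year.
print(round(annual_factor(4, 3), 3))   # 4x in 3 years  -> 1.587
print(round(annual_factor(4, 4), 3))   # 4x in 4 years  -> 1.414
# DRAM/disk latency improvement quoted on the slide.
print(round(annual_factor(2, 10), 3))  # 2x in 10 years -> 1.072
```

Read this way, capacity grows by roughly 40 to 60 percent per year while DRAM and disk latency improve by only about 7 percent per year.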

Pentium III Die Photo
Ø EBL/BBL - Bus logic, Front, Back
Ø MOB - Memory Order Buffer
Ø Packed FPU - MMX Fl.Pt. (SSE)
Ø IEU - Integer Execution Unit
Ø FAU - Fl.Pt. Arithmetic Unit
Ø MIU - Memory Interface Unit
Ø DCU - Data Cache Unit
Ø PMH - Page Miss Handler
Ø DTLB - Data TLB
Ø BAC - Branch Address Calculator
Ø RAT - Register Alias Table
Ø SIMD - Packed Fl.Pt.
Ø RS - Reservation Station
Ø BTB - Branch Target Buffer
Ø IFU - Instruction Fetch Unit (+I$)
Ø ID - Instruction Decode
Ø ROB - Reorder Buffer
Ø MS - Micro-instruction Sequencer

1st Pentium III, Katmai: 9.5 M transistors, 12.3 mm x 10.4 mm in a 0.25-micron process with 5 layers of aluminum

Pentium 4 Die Photo
Ø 42 M transistors
  § PIII: 26 M
Ø 217 mm²
  § PIII: 106 mm²
Ø L1 Execution Cache
  § Buffers 12,000 micro-ops
Ø 8 KB data cache
Ø 256 KB L2 cache

Future Applications
Ø Desktop: 90% of cycles will be spent on media applications
  § video encode/decode, polygon & image-based graphics
  § audio processing, compression, music, speech recognition/synthesis
  § modulation/demodulation at audio and video rates
Ø Scientific desktops: high-performance FP and graphics
Ø Commercial servers: support for databases and transaction processing, enhancements for reliability, support for scalability
Ø Embedded computing: special support for graphics or video, power limitations

Future Directions
Ø Conditions
  § new workloads are characterized by more exploitable parallelism
  § dominant wire delays on a billion-transistor chip will force hardware to be more distributed
Ø Novel architectural techniques
  § Exploit parallelism
    o multiprocessor on chip
    o simultaneous multithreading
  § CPU-memory integration
    o memory tolerating techniques
    o flexible hierarchy to adapt to application
  § Reconfigurable computing

Goal: develop architectural techniques that exploit semiconductor technology and workload characteristics in order to maximize performance at low cost.

Power as a Design Constraint
Power becomes a critical issue
Ø Portable and mobile platforms
  § battery-operated devices
Ø Desktops, server farms
  § Reliability?
  § Power consumption: IT accounts for 10% of power consumption in the US
  § Power density: 30 W/cm² in the Alpha 21364 (3x that of a typical hot plate)

Power as a Design Constraint (cont’d)
Ø Dynamic power consumption
  § A (activity of gates) => turn off unused parts or use design techniques to minimize the number of transitions
Ø Power due to short-circuit current during transition
Ø Power due to leakage current
Ø Reduce the supply voltage, V
Ø Reduce threshold Vt
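The slide's own power equation did not survive in the text, so the sketch below falls back on the commonly used CMOS model for the dynamic term, P = A * C * V^2 * f. That form, the switched capacitance C, the clock frequency f, and all numeric values are assumptions for illustration only. It shows why the slide singles out the activity A and the supply voltage V as the levers.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Dynamic (switching) power in watts under the standard CMOS model
    P = A * C * V^2 * f. All parameter names are illustrative."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point: 10% activity, 1 nF switched capacitance, 1 GHz.
base = dynamic_power(0.10, 1e-9, 1.5, 1e9)

# Lowering V from 1.5 V to 1.2 V scales power quadratically.
low_v = dynamic_power(0.10, 1e-9, 1.2, 1e9)

# Halving activity (for example, gating off unused parts) scales it linearly.
gated = dynamic_power(0.05, 1e-9, 1.5, 1e9)

print(low_v / base)   # about 0.64
print(gated / base)   # 0.5
```

Under this model a 20 percent drop in supply voltage saves about 36 percent of dynamic power, while halving the switching activity saves half.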

Recap: Computer Architecture
Ø Computer Architecture describes the user’s view of the computer: visible registers, data types, instruction set, instruction formats, memory management table structures, exception handling
Ø Computer Organization describes the implementation of the architecture, which is invisible to the user: pipeline structure, caches, TLB, ...

Stored-program computer

Typical Hierarchy
Ø Transistors
Ø Logic gates, memory cells, special circuits
Ø Single-bit adders, MUXs, flip-flops, decoders, coders
Ø Word-wide adders, MUXs, registers, decoders, buses
Ø ALUs, shifters, register files, memory blocks
Ø Processor, peripheral cells, cache memories, MMUs
Ø Integrated system chips
Ø PCBs
Ø Mobile phones, laptops, PCs, engine controllers

[Figure: transistor-level logic gate between Vdd and Vss with inputs A and B and output labeled A.B]

MU0 – A Simple Processor
Ø Instruction format
Ø Instruction set

MU0 Datapath Example
Ø Program Counter - PC
Ø Accumulator - ACC
Ø Arithmetic-Logic Unit - ALU
Ø Instruction Register
Ø Instruction Decode and Control Logic

Follow the principle that memory will be the limiting factor in the design: each instruction takes exactly the number of clock cycles defined by the number of memory accesses it must make.

MU0 Datapath Design
Ø Assume that each instruction starts when it has arrived in the IR
Ø Step 1: EX (execute)
  § LDA S: ACC <- Mem[S]
  § STO S: Mem[S] <- ACC
  § ADD S: ACC <- ACC + Mem[S]
  § SUB S: ACC <- ACC - Mem[S]
  § JMP S: PC <- S
  § JGE S: if (ACC >= 0) PC <- S
  § JNE S: if (ACC != 0) PC <- S
Ø Step 2: IF (fetch the next instruction)
  § Either PC or the address in the IR is issued to fetch the next instruction
  § The address is incremented in the ALU and the value is saved into the PC
Ø Initialization
  § Reset input to start executing instructions from a known address; here it is 000 hex
    o provide zero at the ALU output and then load it into the PC register
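The register transfers listed above can be exercised with a small behavioral model. This is a sketch under stated assumptions, not the lecture's design: the 16-bit word with a 4-bit opcode and 12-bit address S, the opcode numbering, and the STP halt instruction are taken from the commonly published MU0 description and do not appear in the slide text. The model is not cycle-accurate. It fetches and then executes in a loop, which yields the same instruction sequence as the execute-then-fetch ordering on the slide.

```python
# Assumed opcode numbering; STP (halt) is an assumption added to end the demo.
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)

WORD_MASK = 0xFFFF   # assumed 16-bit data word
ADDR_MASK = 0x0FFF   # assumed 12-bit address field S


def encode(opcode, address=0):
    """Pack an opcode and an address S into one instruction word."""
    return ((opcode & 0xF) << 12) | (address & ADDR_MASK)


def run(memory, max_steps=1000):
    """Run from address 0x000 until a halt; return (ACC, final memory)."""
    mem = list(memory)
    pc, acc = 0x000, 0                 # reset: zero is loaded into the PC
    for _ in range(max_steps):
        ir = mem[pc]                   # IF: fetch the instruction into the IR
        pc = (pc + 1) & ADDR_MASK      # incremented address is saved in the PC
        opcode, s = ir >> 12, ir & ADDR_MASK
        if opcode == LDA:
            acc = mem[s]
        elif opcode == STO:
            mem[s] = acc
        elif opcode == ADD:
            acc = (acc + mem[s]) & WORD_MASK
        elif opcode == SUB:
            acc = (acc - mem[s]) & WORD_MASK
        elif opcode == JMP:
            pc = s
        elif opcode == JGE:
            if acc < 0x8000:           # sign bit clear means ACC >= 0
                pc = s
        elif opcode == JNE:
            if acc != 0:
                pc = s
        else:                          # STP or an undefined opcode halts
            return acc, mem
    raise RuntimeError("program did not halt within max_steps")


# Demo: ACC = 20 + 7 - 2, stored at address 8.
DEMO_PROGRAM = [
    encode(LDA, 5), encode(ADD, 6), encode(SUB, 7), encode(STO, 8),
    encode(STP), 20, 7, 2, 0,
]
acc, mem = run(DEMO_PROGRAM)
print(acc, mem[8])   # 25 25
```

Keeping code and data in the same `mem` list mirrors the stored-program idea: the words at addresses 5 to 8 are data only because the program never jumps to them.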

MU0 RTL Organization
Ø Control Logic
  § Asel
  § Bsel
  § ACCce (ACC change enable)
  § PCce (PC change enable)
  § IRce (IR change enable)
  § ACCoe (ACC output enable)
  § ALUfs (ALU function select)
  § MEMrq (memory request)
  § RnW (read/write)
  § Ex/ft (execute/fetch)

MU0 control logic

MU0 ALU Design
Ø ALU functions: A+B, A-B, B, B+1, 0 (used only when reset is active) => 4 functions
Ø Aen (enable operand A)
Ø Binv (invert operand B)
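One way to see how Aen and Binv yield the four non-reset functions is to model the ALU as a single adder whose A input can be gated to zero and whose B input can be inverted. The carry-in signal, the 16-bit width, and the names below are assumptions for illustration, since the slide text names only Aen and Binv. The zero output used during reset is left out.

```python
WIDTH = 16                     # assumed word width
MASK = (1 << WIDTH) - 1


def alu(a, b, aen, binv, cin):
    """Adder-based ALU: A is gated by Aen, B is optionally inverted by Binv,
    and cin (an assumed carry-in) is added to the sum."""
    a_in = (a & MASK) if aen else 0
    b_in = (~b & MASK) if binv else (b & MASK)
    return (a_in + b_in + cin) & MASK


# (Aen, Binv, carry-in) per function; the carry-in column is an assumption.
FUNCTIONS = {
    "A+B": (1, 0, 0),
    "A-B": (1, 1, 1),   # two's complement subtraction: A + ~B + 1
    "B":   (0, 0, 0),
    "B+1": (0, 0, 1),   # for example, incrementing the fetch address
}

for name, controls in FUNCTIONS.items():
    print(name, alu(5, 3, *controls))
```

With A = 5 and B = 3 the four settings give 8, 2, 3, and 4, so one adder plus two control bits and a carry-in covers every function the datapath needs outside reset.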

Digital System Modeling: Motivation
Ø Requirements specification
Ø Functional specification
Ø Testing and verification of the design
Ø Formal verification of the correctness of the design
Ø Automatic synthesis

Gajski and Kuhn’s Y Chart

  Level              Behavioral            Structural          Physical/Geometry
  Architectural      Systems               Processor           Physical Partitions
  Algorithmic        Algorithms            Hardware Modules    Clusters
  Functional Block   Register Transfer     ALUs, Registers     Floor Plans
  Logic              Logic                 Gates, FFs          Cell, Module Plans
  Circuit            Transfer Functions    Transistors         Rectangles

Domains
Ø Functional – operations performed by the system
Ø Structural – how the system is composed
Ø Geometry – how the system is laid out in physical space