
SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable Computing • Bryan S. Goda • Rensselaer Polytechnic Institute, Troy, New York

Agenda • Introduction • BiCMOS FPGA History • SiGe HBT BiCMOS Process • Current Mode Logic • Xilinx 6200 FPGA Design • Configuration Memory • Performance Results • Conclusions and Future Work

Current Role of SiGe • “More Zip per Chip” • Wireless Phones -> Watch Sized Phone • Direct Broadcast Satellite • Fiber-Optic Lines, Switches, and Routers

Programmable Bipolar Logic 1983: Fairchild ECL Field Programmable Logic Array • Fuse Based • 4 ns Cycle Rate • High Power • Scaling Problems 1990: Algotronix 1.2 µm 256-Cell Configurable Logic Array • fT 6 GHz, 200 ps Gate Delay • 4-Transistor Static RAM Memory Cells • ASIC Emulation and Signal Processing • Forerunner of XC6200

US Patent: CMOS Switchable 2-Input Multiplexer [Figure: schematic between supplies V+ and V- with inputs Y1 and Y2, differential select a, reference Vref, and enables EN1 and EN2]

SiGe Heterojunction Bipolar Transistor • Selectively introduce Ge into the base of a Si BJT • Smaller base bandgap increases electron injection, giving higher Beta (100) • Higher Beta allows a more heavily doped base, lowering RB (125 Ohm) • Graded bandgap decreases base transit time, raising fT


SiGe HBT • 50 GHz Process, 100 GHz process within a year (30 µA at 50 GHz) • 5 layers of metal • Used in RPI VLSI Class • Co-integrated with CMOS process – can have HBT logic with CMOS memory – low power and high speed

fT Curves for Various Emitter Lengths

SiGe HBT Layout [Figure labels: Emitter, Base, Collector, Sub-Collector]

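Band Diagram [Figure: band edges EC and EV across the n+ Si emitter, p-SiGe base (Ge profile over p-Si), and n- Si collector, with electron (e-) and hole (h+) flow, the drift field in the base, and Eg,Ge(x=0) marked at the emitter edge of the base] • $E_{g,\mathrm{Ge}}(\mathrm{grade}) = E_{g,\mathrm{Ge}}(x = W_b) - E_{g,\mathrm{Ge}}(x = 0) = 0.031\ \mathrm{eV}$ • Dielectric Constant: Si = 11.7, Ge = 16.2, SiGe (7.5% Ge) = 12.03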

CML Branch Current vs. Differential DC Voltage

IBM SiGe and CMOS Load Gate Delays on M1, M2, LM

Current Steering Logic • Vcc: 0 V • Level 1: 0 to -250 mV – Fastest Logic Level – Limited Drive Capability • Level 2: -950 mV to -1.2 V – Inter-block Signal Level – Good Fan-Out (10) • Level 3: -1.90 V to -2.15 V – Clock Signal – Slowest • Level 4 Possible • Vee: -4.5 V

Current Steering Logic in SiGe • 13 ps Transistor Switching Time (75 GHz) – 6 ps Process Next Year • Small Voltage Swings (250 mV) vs 3.3 or 5 V – Less Power – Smaller Swing = Faster • “Steer” Currents, Use Differential Logic – Less Switch Noise • Fewer Transistors Needed, Complement Signal Present • Flip-Flops and Multiplexers Easy to Implement
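
Why a 250 mV swing is enough to steer the current can be illustrated with the textbook relation for an ideal bipolar differential pair, I_on / I_tail = 1 / (1 + exp(-Vd / VT)). The sketch below is only an illustration under stated assumptions: matched transistors, no emitter degeneration or parasitic resistance, and a thermal voltage kT/q. The function name `branch_fraction` is hypothetical and not taken from the slides.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant / electron charge, in V/K

def branch_fraction(v_diff, temp_k=300.0):
    """Fraction of the tail current carried by the 'on' branch of an
    ideal BJT differential pair for a differential input v_diff (volts)."""
    v_t = K_B_OVER_Q * temp_k  # thermal voltage, about 25.85 mV at 300 K
    return 1.0 / (1.0 + math.exp(-v_diff / v_t))

if __name__ == "__main__":
    for swing_mv in (0, 50, 100, 200, 250):
        frac = branch_fraction(swing_mv / 1000.0)
        print(f"{swing_mv:4d} mV -> {100 * frac:.3f}% of tail current")
```

Under these assumptions the pair is already almost fully switched at 200 mV. Because VT grows with temperature, the same swing steers slightly less current when the die is hot.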

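CML XOR Logic Schematic [Figure: current tree between Vcc = 0 V and Vee = -4.5 V; differential input A at Level 1 (0 / -0.25 V), differential input B at Level 2 (-0.95 / -1.2 V), Vref at the current source; outputs A XOR B and its complement] • Truth Table (A, B -> A XOR B): 0, 0 -> 0 • 0, 1 -> 1 • 1, 0 -> 1 • 1, 1 -> 0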

General FPGA Structure [Figure labels: Logic Cell, I/O Cell, Routing Network, Configuration Memory]

High Speed FPGA Applications • Real Time Image Processing - Radar - Pattern Recognition • Digital Networks - Mobile Subscriber Equipment - Command Information Systems - High Speed Switching Nodes • Control Systems - Guidance Systems - Reprogrammable Survivability • Satellite Systems

Image Correlation [Figure labels: Search Image, Desired Image] 1. Desired image is programmed into the chip (1 pixel = 1 CLB) 2. Load a section of the search image 3. If enough pixels match, then turn the found bit on 4. Load another section, or reprogram with a new desired image
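
The four steps above can be sketched in software as a sliding-window match count. This is a minimal illustration, not the hardware implementation: on the chip each pixel comparison would sit in its own CLB and all comparisons would run in parallel, whereas the loop below is sequential. The function names and the `threshold` parameter are hypothetical.

```python
def count_matches(desired, section):
    """Number of pixel positions where the section equals the desired image."""
    return sum(
        d == s
        for d_row, s_row in zip(desired, section)
        for d, s in zip(d_row, s_row)
    )

def find_image(desired, search, threshold):
    """Return (row, col) offsets where at least `threshold` pixels match,
    i.e. where the 'found bit' would be turned on."""
    h, w = len(desired), len(desired[0])
    hits = []
    for r in range(len(search) - h + 1):          # load each section in turn
        for c in range(len(search[0]) - w + 1):
            section = [row[c:c + w] for row in search[r:r + h]]
            if count_matches(desired, section) >= threshold:
                hits.append((r, c))
    return hits

if __name__ == "__main__":
    desired = [[1, 0],
               [0, 1]]
    search = [[0, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0]]
    print(find_image(desired, search, threshold=4))
```

Lowering `threshold` below the pixel count models the "enough pixels match" condition, which tolerates noisy images at the cost of more false hits.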

Samples From XC6200 CAD Tools [Figure labels: IO Blocks, CLBs, Pins]

FPGA Drawbacks • Slowdown – 200 MHz Internal Speed down to 30-60 MHz External – Pass Transistor = Low Pass Filter • Limited Bandwidth • Relatively Long Configuration Times (Seconds) • Vendor Guarded Information • More Expensive than Comparable ASIC

Pass Transistor Interconnect Modeling [Figure: pass transistors M1 to M4 between nodes 1, 2, and 3, with M4 turned on by a memory bit; interconnect pass-transistor equivalent circuit from node 3 to node 2]

Field Programmable Gate Arrays (FPGA) • Hierarchy Level Organization (Sea of Gates) – Simple Cells (Configurable Logic Blocks) – 4 x 4, 16 x 16, 64 x 64 groupings – Hierarchy of routing resources at each level – I/O Blocks (external interface)

Design Parameters • Logic Swings Levels - Based on Differential Pair Switching - Current Levels • Redesign of the Configurable Logic Block - Take Advantage of Differential Wiring - What Parts Can be Turned off if not Used? • Supply Levels - How Many Levels of Logic? • Routing Resources • CMOS Voltage Levels - Integrate CMOS into Bipolar Current Tree

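Current Tree with CMOS Routing [Figure: current tree from VCC = 0 V to Vee = -3.4 V with differential output OUT; differential inputs a, b, c, d; Level 1: 0 / -0.25 V, Level 2: -0.95 / -1.2 V, Level 3: -1.9 / -2.15 V; CMOS switches S1 and S2 marked "Replace with"; Vref at the current source]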

Bipolar vs Bipolar/CMOS Current Trees [Figure: CMOS and bipolar waveforms at pulse widths of 50 ps, 60 ps, 70 ps, and 100 ps]

4:1 Multiplexer [Figure labels: Level 1 Inputs, Level 1 Output, Level 2 Input, Level 3 Input, CMOS Version W/L 5:1]

Sample Logic Using Multiplexers • A AND B: X1 := a (select), X2 := b (drives Y2, the "1" input), X3 := a (drives Y3, the "0" input) – If a = 1 then select Y2, output = b – If a = 0 then select Y3, output = 0 • A OR B: X1 := a (select), X2 := a (drives Y2), X3 := b (drives Y3) – If a = 1 then select Y2, output = 1 – If a = 0 then select Y3, output = b
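
The two multiplexer assignments on this slide can be checked exhaustively with a few lines of Python. This is a behavioural sketch only; `mux2`, `and_gate`, and `or_gate` are hypothetical names, and the select convention (X1 = 1 picks Y2, X1 = 0 picks Y3) follows the slide.

```python
def mux2(x1, y2, y3):
    """2:1 multiplexer: select Y2 when X1 is 1, otherwise Y3."""
    return y2 if x1 else y3

def and_gate(a, b):
    # X1 := a, X2 := b (Y2), X3 := a (Y3)
    return mux2(a, b, a)

def or_gate(a, b):
    # X1 := a, X2 := a (Y2), X3 := b (Y3)
    return mux2(a, a, b)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
```

Feeding the select signal back into one data input is what lets a single 2:1 multiplexer act as either gate without any constant inputs.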

Redesign of XC6200 Logic [Figure: both designs use X1 := a as select, X2 := b on Y2, X3 := a on Y3] • Original XC6200 Design – Inverted Output – Have to Track Inversions • Revised Design – Non-Inverted Output – Use Differential Pair Logic – Eliminate XC6200 Fast Logic – No Inversion Tracking

Original XC6200 Architecture vs. Redesigned Architecture (Bipolar with CMOS Routing, Switchable) [Figure: in both versions, X1 selects between Y2 (from X2) and Y3 (from X3); CS and RP multiplexers and a D flip-flop with Clk and Clr produce the output F]

10 GHz Three CLB Simulation

CLB Layout [Figure labels: 4:1 Mux (off switchable), CMOS Control, 4:1 Mux, High Speed Logic, Master/Slave Latch (off switchable), 2:1 Mux, CMOS Control (off switchable), Buffer]

Sample CLB Test Circuit [Figure labels: Vref, 8:1 Mux, CLB, Vref, Buffer, 8/1 Divide, Pad Drivers]

Actual Fabricated Test Circuit • Pads (110 µm x 110 µm)

Incoming and Outgoing CLB Routing [Figure: CLB inputs X1, X2, and X3 each select from N, S, E, W, N4, S4, E4, W4; the CLB drives output F onto the outgoing routing]

4 x 4 Block Boundary Routing [Figure labels: N Switches, S Switches, E Switches, W Switches, Local Routing, Magic Routing] • Length 4 FastLane (4 x 4) • Length 16 FastLane (16 x 16) • Chip Length FastLane (64 x 64)

Local CLB Routing • Nearest Neighbor Routing • Output (F) or Local Through • Output Multiplexer Sources – Nout: N, E, W, F – Sout: S, E, W, F – Eout: N, S, E, F – Wout: N, S, W, F • CLB Inputs X1, X2, X3 Each Select from N, S, E, W, N4, S4, E4, W4 • Example: Route East Signal Through to Next CLB • Note: Can’t Route Signal Back to Origin at this Level
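
The per-output source lists on this slide (Nout: N E W F, Sout: S E W F, Eout: N S E F, Wout: N S W F) follow from the "can't route back to origin" note. The sketch below derives them from that rule, assuming that each signal is named for its direction of travel, so a signal named S arrives from the north neighbour and cannot leave on Nout. That naming assumption and the helper `output_sources` are illustrative and not stated on the slide.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

def output_sources(direction):
    """Signals that the output multiplexer for `direction` may select:
    every neighbour signal except the one travelling the opposite way,
    plus the cell's own function output F."""
    neighbours = [d for d in "NSEW" if d != OPPOSITE[direction]]
    return neighbours + ["F"]

if __name__ == "__main__":
    for direction in "NSEW":
        print(f"{direction}out:", " ".join(output_sources(direction)))
```

Routing the east-travelling signal straight through a cell corresponds to selecting E on Eout, which is the example called out on the slide.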

Normal CMOS Memory-CML Interface [Figure: New Configuration Data -> SRAM Bits in Memory Planes -> decode -> CMOS to CML Buffer -> CLB Multiplexer Inputs; rails VSS, VREF, and VEE]

Memory Design [Figure: configuration memory options] • Master/Slave D Latch (D, CLK, Q): 40 Transistors • Master/Slave D Latch (D, Clock, Q): 18 Transistors • RAM Cell: 6 Transistors, Parallel Load, Data Word Out

3-D Chip Stacking [Figure labels: Memory Planes, CLBs] • Shorter Wires • More CLBs/Area • Optimize Memory

CLB with Routing and RAM (2) [Figure labels: RAM 1, RAM 2, MUX, MUX Selects, CLB Select, CLB]

Layout of Configurable Logic Block with 2 Sets of RAM [Figure labels: 2:1 Mux, 8:1 Mux (routing), CMOS Selects, CLB (logic), Master/Slave Latch (memory)] • Circuit Elements: 240 nfets, 122 pfets, 36 resistors, 98 npn1 HBTs, 16 npnhb1 HBTs

SiGe Performance
Circuit Type      Propagation Delay
Buffer            17 ps
CML MUX           22-25 ps
XOR, AND, OR      23-26 ps
CLB               100 ps

Power Decreasing Ideas
Date      Idea                                                        Power Consumption/CLB
Dec 98    Original CLB                                                73 mW
June 99   CLB Redesign I                                              34 mW
Aug 99    CLB Redesign II                                             24 mW
Dec 99    Widlar Current Mirror with CMOS Control, CMOS Routing       10.8 mW
Mar 00    Supply Voltage 4.5 -> 3.3 V                                 7 mW
Dec 00*   7HP Process                                                 0.3 mW
* Projected Power Levels for 7HP Process: At 50 GHz, 30 µA, 20x+ reduction in power
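
The size of each power-saving step can be read off the per-CLB figures with simple ratios. The snippet below is just that arithmetic on the slide's numbers (73, 34, 24, 10.8, 7, and the projected 0.3 mW); the variable and function names are hypothetical.

```python
# (date, idea, power per CLB in mW), as listed on the slide
steps = [
    ("Dec 98", "Original CLB", 73.0),
    ("June 99", "CLB Redesign I", 34.0),
    ("Aug 99", "CLB Redesign II", 24.0),
    ("Dec 99", "Widlar mirror, CMOS control and routing", 10.8),
    ("Mar 00", "Supply voltage 4.5 -> 3.3 V", 7.0),
    ("Dec 00", "7HP process (projected)", 0.3),
]

def reduction_factors(rows):
    """Power ratio between each step and the one before it."""
    return [prev[2] / cur[2] for prev, cur in zip(rows, rows[1:])]

if __name__ == "__main__":
    for (date, idea, _), factor in zip(steps[1:], reduction_factors(steps)):
        print(f"{date:8s} {idea:42s} {factor:5.2f}x")
    print(f"Overall: {steps[0][2] / steps[-1][2]:.0f}x")
```

The last ratio, from 7 mW to the projected 0.3 mW, is a little over 23x, which agrees with the "20x+" note for the 7HP process.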

Multiplexer Performance vs Temperature [Figure: curves for Normal 250 mV Swing and 200 mV Min Swing]

Widlar Current Mirror with CMOS Control [Figure labels: Vcc, Input, Vref, Vee]

XC 6200 Design Improvements • Developed at the University of Scotland • Inversion of Signal at Every CLB - Taken care of due to differential pair wiring • No Pass Transistors, Use Multiplexers for Routing • Able to turn off unused parts with CMOS controlled current mirror • No CMOS-CML Conversion circuits needed, CMOS in current trees • Handcrafted, dense layouts • Context Switching

Power Delay Product [Figure: µW/gate/MHz (log scale, 0.001 to 1) vs. Year (1998 to 2002); curves: PDP CMOS High, PDP CMOS Low, PDP BiCMOS; BiCMOS process points 5HP, 7HP, 8HP]

Data Dependent Switching [Figure: differential lines A, B, C and their complements, showing a slow transition and a fast transition] • Differential Logic has Complement Switching in Opposite Direction • Bit Line Twisting • Could Vary Signals Up to 30% • Setup Time Violations

Future Work • Testing • Overall FPGA Architecture • Scaling • Integrate with Other Systems • Projected Graduation May 2001, work to continue at USMA • Power Reduction - 7HP Process

CLB Context Switch Example [Figure: waveforms for Pattern 1 (0001100100), Pattern 2 (1011011100), Select, and the CLB output as Select switches the function between AND and OR; labeled outputs 000100 (AND) and 1011111100 (OR)] • 70 ps ~ 7.1 GHz
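
Context switching can be modelled as a CLB that holds two stored configurations and a select line that chooses which one is active for each bit. The sketch below applies that idea to the two input patterns from this slide. It is a behavioural illustration only: the `clb_output` helper, the per-bit select string, and the mapping of select 0 to AND and select 1 to OR are assumptions, not details taken from the slides.

```python
# Two stored configurations, standing in for the two RAM sets of the CLB
CONTEXTS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}

def clb_output(pattern1, pattern2, select, contexts=("AND", "OR")):
    """Apply the function chosen by each select bit to the matching input bits.
    select bit '0' -> contexts[0], select bit '1' -> contexts[1]."""
    out = []
    for a, b, s in zip(pattern1, pattern2, select):
        func = CONTEXTS[contexts[int(s)]]
        out.append(str(func(int(a), int(b))))
    return "".join(out)

if __name__ == "__main__":
    p1, p2 = "0001100100", "1011011100"
    print("AND only:", clb_output(p1, p2, "0" * 10))
    print("OR only: ", clb_output(p1, p2, "1" * 10))
    print("Switched:", clb_output(p1, p2, "0000011111"))
```

With the select held at OR, the model reproduces the 1011111100 output labeled on the slide.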

Redesigned CLB Cell with Routing and Memory (2x) • 2 x 24 Bit RAM • Three 8-1 Input Mux • CLB • Four 4-1 Output Mux [Figure labels: M1, M2, M3, M4]

CLB Row 4 x 1 [Figure labels: Memory Bus Lines, Switch, N/S Input, Output] • Circuit Elements: 1520 Nfets, 792 Pfets, 260 Resistors, 140 NPN1HB, 576 NPN1

XC6200 Device Family
Device    Gate Count   Number of Cells   I/O Blocks   Row x Col
XC6209    9-13K        2304              192          48 x 48
XC6216    16-24K       4096              256          64 x 64
XC6236    36-55K       9216              384          96 x 96
XC6264    64-100K      16384             512          128 x 128
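
The device table is internally consistent: the cell count is rows times columns, and the I/O block count happens to equal four times the array side for every listed part. The second relation is only an observation from these four rows, not something the slide states. A short check, with hypothetical names:

```python
# Values copied from the XC6200 device family table
devices = {
    "XC6209": {"rows": 48, "cols": 48, "cells": 2304, "io_blocks": 192},
    "XC6216": {"rows": 64, "cols": 64, "cells": 4096, "io_blocks": 256},
    "XC6236": {"rows": 96, "cols": 96, "cells": 9216, "io_blocks": 384},
    "XC6264": {"rows": 128, "cols": 128, "cells": 16384, "io_blocks": 512},
}

def is_consistent(dev):
    """True if cells = rows * cols and I/O blocks = 4 * side (observed pattern)."""
    cells_ok = dev["rows"] * dev["cols"] == dev["cells"]
    io_ok = 4 * dev["rows"] == dev["io_blocks"]
    return cells_ok and io_ok

def blocks_4x4(dev):
    """Number of 4 x 4 cell groupings in the array."""
    return dev["cells"] // 16

if __name__ == "__main__":
    for name, dev in devices.items():
        print(name, "consistent:", is_consistent(dev), "4x4 blocks:", blocks_4x4(dev))
```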

Typical Routing Delays
Symbol   Parameter                    XC6200   SiGe Redesign
TNN      Route Nearest Neighbor       1 ns     23 ps
Tmagic   Route X2/X3 to Magic Out     1.5 ns   47 ps
TL4      Length 4 FastLane            1.5 ns   47 ps
TL16     Length 16 FastLane           2 ns     70 ps
TCL64    Chip-Length (64) Delay       3 ns     94 ps
~31x improvement
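
The "~31x improvement" figure can be reproduced by dividing each XC6200 delay by the corresponding SiGe redesign delay. The snippet below does that arithmetic on the slide's numbers; summarising with the median is a choice made here for illustration, since the slide does not say how the figure was obtained.

```python
import statistics

# (XC6200 delay, SiGe redesign delay), both in picoseconds
delays_ps = {
    "TNN": (1000.0, 23.0),
    "Tmagic": (1500.0, 47.0),
    "TL4": (1500.0, 47.0),
    "TL16": (2000.0, 70.0),
    "TCL64": (3000.0, 94.0),
}

def speedups(table):
    """Ratio of original to redesigned delay for each routing parameter."""
    return {name: old / new for name, (old, new) in table.items()}

if __name__ == "__main__":
    ratios = speedups(delays_ps)
    for name, ratio in ratios.items():
        print(f"{name:7s} {ratio:5.1f}x")
    print(f"median  {statistics.median(ratios.values()):5.1f}x")
```

Nearest-neighbour routing gains the most (about 43x) and the length-16 FastLane the least (about 29x), with the other three rows all just under 32x.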

4 x 4 CLB Layout Cell • Largest Basic Block • Over 13,000 Transistors • Commercial Product Size is a 4 x 4 Array of this Cell




5 Stage Ring Oscillator
Condition     Speed      Relative to Schematic   Current
Schematic     6.36 GHz   --                      8.4 mA
Parasitics    5.71 GHz   89%                     8.6 mA
50 °C         5.26 GHz   82%                     8.85 mA
75 °C         4.87 GHz   76%                     9.1 mA
100 °C        4.16 GHz   65%                     9.34 mA
125 °C        3.12 GHz   49%                     9.5 mA
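
The ring oscillator figures can be turned into an approximate per-stage delay with the usual relation f = 1 / (2 N t_d), where N is the number of stages. The slides do not state this relation, so treat the sketch below as an assumption-based estimate; the names are hypothetical. It also recomputes the "relative to schematic" column, which matches the slide when the percentages are truncated rather than rounded.

```python
N_STAGES = 5
SCHEMATIC_GHZ = 6.36

# (condition, oscillation frequency in GHz), from the slide
measurements = [
    ("Parasitics", 5.71),
    ("50 C", 5.26),
    ("75 C", 4.87),
    ("100 C", 4.16),
    ("125 C", 3.12),
]

def stage_delay_ps(freq_ghz, n_stages=N_STAGES):
    """Per-stage delay in ps, assuming f = 1 / (2 * N * t_d)."""
    return 1e3 / (2 * n_stages * freq_ghz)

def relative_percent(freq_ghz, reference_ghz=SCHEMATIC_GHZ):
    """Speed relative to the schematic simulation, truncated to a whole percent."""
    return int(100 * freq_ghz / reference_ghz)

if __name__ == "__main__":
    print(f"Schematic   {stage_delay_ps(SCHEMATIC_GHZ):5.1f} ps/stage")
    for name, freq in measurements:
        print(f"{name:11s} {stage_delay_ps(freq):5.1f} ps/stage  {relative_percent(freq)}%")
```

Under this assumption the schematic-level stage delay is a little under 16 ps, close to the 17 ps buffer delay quoted in the performance table.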


- Slides: 61