INTRODUCTION
• The Crusoe processor is a 128-bit microprocessor built for mobile computing devices, where low power consumption is required.
• A VLIW-based processor and x86 Code Morphing software together provide an x86-compatible mobile platform solution.
• The processor core operates at 500-700 MHz.

Crusoe Processor Family
• TM5400: 500-700 MHz, 256 KB L2 cache
• TM5500: 667-800 MHz, 256 KB L2 cache
• TM5600: 500-700 MHz, 512 KB L2 cache
• TM5800: 667-800 MHz, 512 KB L2 cache

Multiple Issue Microprocessors
• Several functional units (integer ALUs, floating-point unit, load/store…)
• Multiple instructions issued per cycle
• Requires higher memory bandwidth and more registers
• Two main flavors: superscalar and VLIW

Intel’s Superscalar Approach
• Superscalar: issues a variable number of instructions per cycle.
• The Pentium Pro and Pentium III are both superscalar, with a single pipeline.
• The processor core is RISC-based, with an x86 front end.

VLIW Approach
• Very Long Instruction Word (VLIW) processor
• Multiple functional units (FUs), each explicitly programmed by every instruction
• In Transmeta's terminology, a Very Long Instruction Word is called a molecule.
• Each molecule contains up to 4 atoms: one instruction for each FU.
• A molecule is either 128 bits or 64 bits wide.
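
The molecule/atom idea can be illustrated with a minimal sketch. It assumes four functional-unit slots with made-up names (`fpu`, `alu0`, `lsu`, `branch`) and a greedy packer that ignores data dependences; none of this is taken from Transmeta's actual scheduler.

```python
from dataclasses import dataclass

# Hypothetical slot names, one per functional unit (an assumption).
UNITS = ("fpu", "alu0", "lsu", "branch")


@dataclass(frozen=True)
class Atom:
    op: str    # mnemonic, e.g. "ADD"
    unit: str  # functional unit that executes the atom


def pack_molecules(atoms, max_atoms=4):
    """Greedily pack atoms into molecules, at most one atom per unit.

    A new molecule is started when the unit an atom needs is already
    occupied or the current molecule is full. Data dependences between
    atoms are ignored in this toy model.
    """
    molecules, current = [], {}
    for atom in atoms:
        if atom.unit not in UNITS:
            raise ValueError(f"unknown unit: {atom.unit}")
        if atom.unit in current or len(current) == max_atoms:
            molecules.append(current)
            current = {}
        current[atom.unit] = atom.op
    if current:
        molecules.append(current)
    return molecules
```

Packing `FADD`, `ADD`, `LD`, `BRCC` and then a second `ADD` yields two molecules here: the first fills all four slots, and the second holds only the extra `ADD`, because the integer ALU slot was already taken.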

Transmeta’s Crusoe Core
A 128-bit molecule, with each atom routed to its own functional unit:
• FADD → Floating Point Unit
• ADD → Integer ALU #0
• LD → Load/Store Unit
• BRCC → Branch Unit

Code Morphing: Crusoe’s key
• x86 instructions are converted to the Crusoe instruction set through a software layer.
• During instruction translation, optimizations and scheduling tricks can be performed.
• The Crusoe processor architecture is decoupled from application software.

Code Morphing basics
Layers shown: x86 Applications, x86 OS/BIOS, Code Morphing Software, VLIW Processor Core
• Code Morphing software resides in ROM.
• Translations are performed dynamically and are cached.
• Successively more aggressive optimizations are applied each time a block is executed.

Code Translation
• The superscalar approach translates one instruction at a time.
• Code Morphing examines a whole block of instructions at a time and creates a translation from that block.
• Translations are saved in a translation cache.
• Successive executions of a translation invoke only the optimizer, not the translator.
• The cost of translation is amortized over successive executions.
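
A minimal sketch of the caching and amortization idea follows. The class, the `translate` callback, and the cost units are all hypothetical, and the optimizer pass mentioned in the slide is left out for brevity.

```python
class TranslationCache:
    """Toy model: translate a block once, then reuse the cached result."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical block translator
        self._cache = {}             # block address -> translated code
        self.translations = 0        # number of times the translator ran

    def fetch(self, address, x86_block):
        # Only a cache miss pays the translation cost.
        if address not in self._cache:
            self._cache[address] = self._translate(x86_block)
            self.translations += 1
        return self._cache[address]


def amortized_cost(translate_cost, run_cost, executions):
    """Average cost per execution when translation is paid only once."""
    return (translate_cost + run_cost * executions) / executions
```

With an assumed translation cost of 1000 units and a run cost of 10 units, the average cost per execution falls from 1010 units after one execution to 20 units after a hundred, which is the sense in which hot blocks amortize the translation overhead.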

Hardware Support for Code Morphing
• Explicit setting of condition codes
• All registers holding x86 state are shadowed.
• A commit operation copies the active state to the shadow registers.
• A “translated” bit in the page table detects self-modifying code.
• Alias hardware allows load instructions to be reordered ahead of store instructions.
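
One way to picture the “translated” bit is the toy model below. The class name and the invalidate-on-write policy are assumptions made for illustration; the slide only says that the bit is used to detect self-modifying code.

```python
class TranslatedPages:
    """Toy model of a per-page 'translated' bit guarding cached translations."""

    def __init__(self):
        self._translations = {}  # page number -> cached translations

    def add_translation(self, page, translation):
        # Caching a translation implicitly sets the page's translated bit.
        self._translations.setdefault(page, []).append(translation)

    def is_translated(self, page):
        return page in self._translations

    def store(self, page):
        """Model a store to `page`; return True if translations were dropped."""
        if page in self._translations:
            # Assumed policy: a write to a translated page invalidates
            # every translation made from it, forcing retranslation later.
            del self._translations[page]
            return True
        return False
```

In this sketch a store to an untranslated page costs nothing extra, while a store to a translated page discards the stale translations before the modified code can run.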

Exception Handling
• x86 exceptions are precise, which is problematic for out-of-order execution of instructions.
• On an exception, the processor state is rolled back to the most recent commit.
• Execution then proceeds in in-order mode until the fault location is found.
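
The rollback-and-replay scheme can be sketched as follows. This is a simplified software model with hypothetical names: registers are a dictionary, each operation is a Python function, and `ArithmeticError` stands in for a hardware fault.

```python
class ShadowedState:
    """Working registers plus a shadow copy of the last committed state."""

    def __init__(self, registers):
        self.working = dict(registers)
        self.shadow = dict(registers)

    def commit(self):
        # Copy the working state into the shadow registers.
        self.shadow = dict(self.working)

    def rollback(self):
        # Discard every update made since the last commit.
        self.working = dict(self.shadow)


def run_block(state, ops):
    """Run ops speculatively; on a fault, roll back and replay in order.

    Returns the index of the faulting op, or None if the block committed.
    """
    try:
        for op in ops:
            op(state.working)
        state.commit()
        return None
    except ArithmeticError:
        state.rollback()
    # Conservative replay: commit after every op so the fault is precise.
    for index, op in enumerate(ops):
        try:
            op(state.working)
        except ArithmeticError:
            state.rollback()
            return index
        state.commit()
    return None
```

After a fault, `state.working` reflects exactly the operations that precede the faulting one, which is the property a precise exception model requires.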

LongRun: Dynamic Power Management
• Typical approach 1: Switch off the processor quickly to save power (can cause glitches).
• Typical approach 2: Change the clock rate by suspending and restarting the processor.
• Crusoe 1: Adjust the clock rate dynamically, without suspension.
• Crusoe 2: Adjust the voltage level.
• Result: Cubic power reduction, up to 30%.
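
The "cubic" claim follows from the standard CMOS dynamic power model, in which power is proportional to frequency times voltage squared. That model is an assumption brought in here, not something stated in the slides. A minimal sketch:

```python
def relative_power(freq_scale, volt_scale):
    """Dynamic power relative to nominal, assuming P is proportional to f * V^2.

    Both arguments are fractions of the nominal value, e.g. 0.9 for 90%.
    """
    return freq_scale * volt_scale ** 2
```

Under this model, lowering the clock by 10% alone gives `relative_power(0.9, 1.0)`, a 10% saving, while lowering the voltage by the same fraction as well gives `relative_power(0.9, 0.9)`, roughly 0.73 of nominal power. Scaling the voltage together with the frequency is what turns a linear saving into a cubic one.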

Performance of Crusoe Processor
• The heatsink on the TM5400 Crusoe processor is quite small.
• Execution time
  – Comparable to direct hardware implementations by Intel or AMD
  – A TM5400 at 667 MHz performs about the same as a Pentium III running at 500 MHz.
• Low cost
  – Much simpler hardware: the Crusoe TM5400 has about 7 million transistors (the P4 has 41 million).
  – Easier to design, more scalable, easier to reach high clock rates, more room for caches, better yield, etc.
• Low power

Crusoe vs. PIII, heat generation
Both processors playing a DVD:
• PIII: 105.5 °C
• Crusoe: 48.2 °C

Drawbacks
• Code optimization doesn’t start until a block of code has been executed more than a few times.
• Code translation consumes clock cycles that could otherwise be spent on application computation.

Where Transmeta could go next
• The current emphasis is on mobile computing.
• Different applications of Code Morphing could be made to allow a different emphasis or target.
• Optimization techniques could be tailored to different target architectures.
• Workstation/server chips were hinted at in the documentation.

Conclusions
• Transmeta has built an x86-compatible Crusoe processor based on VLIW technology.
• Code Morphing offers a new approach to the implementation of an instruction set architecture.
• Crusoe offers the computing power of a high-performance Intel processor while consuming a fraction of the electrical power.