Superscalar Processors & VLIW Processors

Topics to be covered
• Introduction to Superscalar Processors
• Architecture of Superscalar Processors
• VLIW Processors
• Architecture of VLIW Processors
• Differences between Superscalar and VLIW Processors

• A superscalar machine executes multiple independent instructions in parallel.
• Superscalar machines are also pipelined.
• “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.
• The order of execution is usually assisted by the compiler.

Super-pipelined Processor
• A traditional pipelined system has a single pipeline stage for each sub-operation, and every instruction has to pass through that dedicated segment.
• A super-pipelined processor, in contrast, has a pipeline in which each of these logical steps may be subdivided into multiple pipeline stages.

Superscalar vs. Super-pipelined

• A more aggressive approach to achieving parallelism is to equip the processor with multiple processing units that handle several instructions in parallel in each processing stage.
• Such processors can achieve an instruction execution throughput of more than one instruction per cycle. They are known as superscalar processors.

• In a superscalar processor the instruction queue has to remain filled.
• Multiple-issue operation requires a wider path to the cache and multiple execution units.
• Separate execution units are provided for integer and floating-point instructions.

Working Principle
• The instruction fetch (IF) unit can read two instructions at a time and store them in the instruction queue.
• In each clock cycle the dispatch unit retrieves and decodes up to two instructions from the front of the queue.
• If there is one integer instruction and one floating-point instruction and there are no hazards, both instructions are dispatched in the same clock cycle.
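
The dispatch rule above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular processor: the names `dispatch_cycle` and `run` are hypothetical, the queue is assumed to be already filled by the fetch unit, and hazard checking is left out, so the only constraint modelled is one instruction per execution unit per cycle.

```python
from collections import deque

def dispatch_cycle(queue):
    """Dispatch up to two instructions from the front of the queue in one cycle.

    Each instruction is a (name, unit) tuple where unit is "int" or "fp".
    Assumption: there is one integer unit and one floating-point unit,
    and no data hazards are modelled.
    """
    issued = []
    busy_units = set()
    while queue and len(issued) < 2:
        name, unit = queue[0]
        if unit in busy_units:
            break  # in-order dispatch: stall instead of skipping ahead
        busy_units.add(unit)
        issued.append(queue.popleft()[0])
    return issued

def run(program):
    """Return the list of instruction names dispatched in each cycle."""
    queue = deque(program)
    schedule = []
    while queue:
        schedule.append(dispatch_cycle(queue))
    return schedule

program = [("I1", "fp"), ("I2", "int"), ("I3", "int"), ("I4", "fp"), ("I5", "fp")]
schedule = run(program)
for cycle, names in enumerate(schedule, start=1):
    print(cycle, names)
```

With this program, one integer and one floating-point instruction reach the front of the queue together in the first two cycles, so both are dispatched at once. Two instructions that need the same unit would take two cycles.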

• Out-of-order execution may cause exceptions that leave the program in an inconsistent state.
• Exceptions are of two types: imprecise and precise.
• Imprecise exception: Let i1 and i2 be two instructions issued in the same clock cycle. Suppose i1 causes an exception, leaving the program in an inconsistent state, while i2 has already completed its write-back (WB) operation. If this situation is permitted, the exception is known as an imprecise exception.
• To keep the program consistent, results must be written to their destinations in program order, i.e. in order.

• Precise exception: If an exception occurs during the execution of an instruction, all subsequent instructions that may have been partially executed are discarded. This is called a precise exception.

Execution Completion
• Out-of-order execution is desirable because it frees execution units for other instructions.
• Instructions must be completed in program order to allow precise exceptions.
• These two requirements conflict with each other.
• The conflict can be resolved by allowing execution to proceed but writing the results into temporary registers. The results are later transferred to the destination registers in the correct program order.
• This transfer is called the commitment step.

• When out-of-order execution is allowed, a special control unit is needed to guarantee in-order commitment. This is called the commitment unit.
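
A minimal sketch of the commitment step follows, under simplifying assumptions. The function name `commit_in_order` and its data layout are hypothetical and chosen for illustration; real commitment units are hardware structures with many more details. Results that finish early wait in temporary storage and are copied to the destination registers only when every earlier instruction has been committed.

```python
def commit_in_order(program_order, completion_order, results):
    """Simulate a commitment unit (illustrative model only).

    program_order:    instruction names in program order.
    completion_order: names in the order the execution units finish them.
    results:          name -> (destination register, value).
    Returns (register file, order in which instructions were committed).
    """
    temp = {}            # temporary registers: finished but uncommitted results
    registers = {}       # architectural (destination) registers
    committed = []
    next_to_commit = 0   # index of the oldest uncommitted instruction
    for name in completion_order:
        temp[name] = results[name]
        # Commit every instruction at the head of program order whose result is ready.
        while (next_to_commit < len(program_order)
               and program_order[next_to_commit] in temp):
            head = program_order[next_to_commit]
            dest, value = temp.pop(head)
            registers[dest] = value
            committed.append(head)
            next_to_commit += 1
    return registers, committed

program_order = ["I1", "I2", "I3"]
completion_order = ["I2", "I3", "I1"]   # I1 finishes last
results = {"I1": ("R1", 10), "I2": ("R1", 20), "I3": ("R2", 30)}
registers, committed = commit_in_order(program_order, completion_order, results)
print(committed, registers)
```

Although I1 finishes last, it is committed first, so R1 ends up holding the value written by I2, exactly as it would under sequential execution.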

Dispatch Operation
Should instructions be dispatched out of order?
• The dispatch policy must ensure that deadlock cannot occur. If instructions are dispatched out of order, a deadlock can arise as follows.
• Suppose that the processor has only one temporary register, and that when I5 is dispatched, that register is reserved for it. Instruction I4 cannot be dispatched because it is waiting for the temporary register, which in turn will not become free until I5 is retired. Since I5 cannot be retired before I4, we have a deadlock.
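
The deadlock scenario can be reproduced with a small simulation. The model is an assumption made for illustration: dispatch reserves one temporary register, the register is released only at retirement, and retirement follows program order. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(dispatch_order, program_order, temp_registers=1):
    """Return True if every instruction retires, False if the machine deadlocks.

    Assumed model: each dispatched instruction holds one temporary register
    until it retires, and instructions retire strictly in program order.
    """
    free = temp_registers
    waiting = list(dispatch_order)   # order in which the dispatcher issues
    in_flight = set()
    retired = 0
    while retired < len(program_order):
        progress = False
        # Dispatch the next instruction if a temporary register is free.
        if waiting and free > 0:
            in_flight.add(waiting.pop(0))
            free -= 1
            progress = True
        # Retire the oldest unretired instruction if it has been dispatched.
        head = program_order[retired]
        if head in in_flight:
            in_flight.remove(head)
            free += 1
            retired += 1
            progress = True
        if not progress:
            return False   # nothing can dispatch or retire: deadlock
    return True

program_order = ["I4", "I5"]
in_order_ok = simulate(["I4", "I5"], program_order)
out_of_order_ok = simulate(["I5", "I4"], program_order)
print(in_order_ok, out_of_order_ok)
```

In this model, dispatching I5 first with a single temporary register blocks I4 forever. The same dispatch order completes once a second temporary register is available.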

Issues related to Superscalar Processors
• Dependent upon:
  – Instruction-level parallelism possible
  – Compiler-based optimization
  – Hardware support
• Limited by:
  – Data dependency
  – Procedural dependency
  – Resource conflicts

VLIW Processor

Basic Working Principles of VLIW
• Aims at speeding up computation by exploiting instruction-level parallelism.
• Uses the same hardware core as superscalar processors, with multiple execution units (EUs) working in parallel.
• An instruction consists of multiple operations; typical word length is from 52 bits to 1 Kbit.
• All operations in an instruction are executed in lock-step mode.
• Relies on the compiler to find parallelism and schedule dependency-free program code.
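
Lock-step execution of one long instruction word can be sketched as follows. The model is an assumption for illustration only: every operation in the word reads the register values from before the word started, and all results are written together at the end. The operation tuple layout, the `OPS` table and the name `execute_bundle` are hypothetical.

```python
import operator

# Hypothetical operation set for the sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def execute_bundle(registers, bundle):
    """Execute all operations of one long instruction word in lock step.

    bundle: list of (op, dest, src1, src2) tuples, or None for an empty slot.
    Assumption: all operations read the pre-bundle register values and
    their results are written back together.
    """
    snapshot = dict(registers)   # values visible to every operation in the word
    updates = {}
    for slot in bundle:
        if slot is None:         # empty slot, nothing to do
            continue
        op, dest, src1, src2 = slot
        updates[dest] = OPS[op](snapshot[src1], snapshot[src2])
    new_registers = dict(registers)
    new_registers.update(updates)
    return new_registers

regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 0}
regs = execute_bundle(regs, [("add", "r3", "r1", "r2"),
                             ("mul", "r4", "r1", "r2"),
                             None])
print(regs)
```

Because no operation sees another operation's result within the same word, the compiler has to place dependent operations in different words.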

Basic VLIW Approach

Register File Structure for VLIW

Differences Between VLIW & Superscalar Architecture (I)

Differences Between VLIW & Superscalar Architecture (II)
• Instruction formulation:
  – Superscalar:
    • Receives conventional instructions conceived for sequential processors.
  – VLIW:
    • Receives (very) long instruction words, each comprising a field (or opcode) for each execution unit.
    • Instruction word length depends on (a) the number of execution units and (b) the code length needed to control each unit (such as opcode length, register names, …).
    • Typical word length is 64 – 1024 bits, much longer than the conventional machine word length.
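
The dependence of word length on the number of execution units and on the per-unit control code can be illustrated with simple arithmetic. The sketch assumes that every slot has the same format (one opcode plus a fixed number of register fields); the function name `vliw_word_bits` and the numbers used are hypothetical examples, not figures from a specific machine.

```python
def vliw_word_bits(execution_units, opcode_bits, register_fields, register_bits):
    """Estimate the instruction word length in bits.

    Assumption: one identical slot per execution unit, each slot holding
    an opcode and `register_fields` register names.
    """
    slot_bits = opcode_bits + register_fields * register_bits
    return execution_units * slot_bits

# Example: 8 units, 8-bit opcodes, 3 register fields of 5 bits (32 registers).
word = vliw_word_bits(execution_units=8, opcode_bits=8,
                      register_fields=3, register_bits=5)
print(word)  # 184
```

Each slot here needs 8 + 3 × 5 = 23 bits, so eight slots give a 184-bit word, which falls inside the typical range quoted above.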

• Instruction scheduling:
  – Superscalar:
    • Done dynamically at run time by the hardware.
    • Data dependency is checked and resolved in hardware.
    • Needs a lookahead hardware window for instruction fetch.
  – VLIW:
    • Static scheduling is done at compile time by the compiler.
    • Advantages:
      – Reduced hardware complexity.
      – Tasks such as decoding, data dependency detection and instruction issue become simple.
      – Potentially higher clock rate.
      – Higher degree of parallelism with global program information.
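
Static scheduling can be illustrated with a small greedy packer that plays the role of the compiler. This is a sketch under stated assumptions, not a production scheduling algorithm: operations are given in program order, each dependency is listed before the operation that uses it, every operation takes one cycle, and the name `schedule` and the tuple layout are hypothetical.

```python
def schedule(ops, slots):
    """Pack operations into long instruction words at "compile time".

    ops:   list of (name, unit, deps) in program order; deps name earlier ops.
    slots: unit -> number of slots of that unit type per instruction word.
    Returns a list of words, each a list of operation names.
    """
    placed = {}    # name -> index of the word the operation was put in
    words = []     # each word: unit -> list of names
    for name, unit, deps in ops:
        # An operation must come after every operation it depends on.
        i = max((placed[d] + 1 for d in deps), default=0)
        while True:
            while len(words) <= i:
                words.append({u: [] for u in slots})
            if len(words[i][unit]) < slots[unit]:
                words[i][unit].append(name)
                placed[name] = i
                break
            i += 1   # slot taken: try the next word
    return [[n for u in slots for n in w[u]] for w in words]

slots = {"int": 1, "fp": 1}
ops = [("a", "int", []),
       ("b", "fp", []),
       ("c", "int", ["a"]),
       ("d", "fp", ["a", "b"]),
       ("e", "int", [])]
words = schedule(ops, slots)
print(words)
```

All dependency checking happens in `schedule`, before execution. The hardware then only has to issue each word as a unit, which is why decoding and issue logic can stay simple.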