TK 6123: COMPUTER ORGANISATION & ARCHITECTURE Lecture 7: CPU and Memory (2) Prepared By: Associate Prof. Dr Masri Ayob 1

Contents
This lecture will discuss:
• Memory Unit.
• Instruction Execution.
• Buses.

Memory Implementations
The most common types of memory:
• magnetic core memory
• static RAM
• dynamic RAM, and
• ROM.
Memory can be volatile or nonvolatile.
• Nonvolatile memory retains its values when power is removed.
• Volatile memory loses its contents when power is removed.

Memory Implementations
Magnetic core memory uses a small core of magnetic material to hold a bit of data.
• Since magnetism remains after the current is removed, core memory is nonvolatile.
• Magnetic core memory is expensive and slow in operation compared to other types of memory. It has been replaced almost entirely by RAM.
• It is still used on a few computers where both read and write capability are required and where the loss of data or programs would be severely damaging, particularly for military and space applications.

Memory Implementations
Most current computers use either static or dynamic RAM for memory.
Dynamic RAM (DRAM) is less expensive, requires less electrical power, and can be made smaller, with more bits of storage in a single integrated circuit.
• DRAM also requires extra electronic circuitry that "refreshes" memory periodically; otherwise the data fades away after a while and is lost.

Memory Implementations
Static RAM (SRAM) does not require refreshing.
Static RAM is also faster to access than DRAM and is therefore useful in very-high-speed computers and for small amounts of high-speed memory.
But SRAM is more expensive and requires more chips.
Both dynamic and static RAM are volatile.
Currently, DRAM is the most popular.

Memory Implementations
ROM (read-only memory) is used for situations where the software is built permanently into the computer.
Early ROM was made up of integrated circuits with fuses in them that could be blown.
Modern ROMs use a different technology, which can be erased and rewritten.

Memory Implementations
Within the computer, ROM is both nonvolatile and unwriteable.
The method used to access memory is basically the same, regardless of memory type.

Memory Implementations
EEPROM and Flash ROM are recent memory innovations that implement nonvolatile, writeable memory.
• Both allow rewriting by erasing memory cells selectively, then writing new data into those cells.
• Flash ROM is faster and more flexible than EEPROM because it can erase and write data in blocks, rather than one byte at a time.
• Flash ROM is used in the computer BIOS and in devices, such as digital cameras, that require faster access than a disk can offer.

Primary Memory: Memory Addresses (1)
[Figure: Three ways of organizing a 96-bit memory.]
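
The figure for this slide is not reproduced, so the short sketch below is only an illustration of the caption. It assumes the three organisations use 8-bit, 12-bit and 16-bit cells (an assumption, not taken from the slide) and works out how many cells and address bits each one needs. The function name `organisation` is hypothetical.

```python
TOTAL_BITS = 96  # memory size from the slide caption


def organisation(cell_bits: int, total_bits: int = TOTAL_BITS) -> tuple:
    """Return (number of cells, address bits needed) for a given cell width."""
    if total_bits % cell_bits != 0:
        raise ValueError("cell width must divide the memory size evenly")
    cells = total_bits // cell_bits
    # Bits needed to give every cell a distinct address (0 .. cells-1).
    address_bits = (cells - 1).bit_length()
    return cells, address_bits


# Assumed cell widths; the original figure may use different ones.
for width in (8, 12, 16):
    cells, addr = organisation(width)
    print(f"{width:2d}-bit cells: {cells:2d} cells, {addr} address bits")
```

The total capacity is the same in all three cases; only the size of the addressable unit, and therefore the number of addresses, changes.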

Instruction Cycle
Two steps:
• Fetch
• Execute

Fetch/Execute Cycle
Program Counter (PC) holds address of next instruction to fetch.
• PC → MAR
Processor fetches instruction from memory location pointed to by PC.
Increment PC
• Unless told otherwise

Fetch/Execute Cycle
Instruction loaded into Instruction Register (IR).
• MDR → IR
Processor interprets instruction and performs required actions.
• If the instruction uses a word in memory, fetch the word into a CPU register.
Execute the instruction.
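
The register transfers on the two Fetch/Execute slides can be sketched as a small loop. This is a minimal illustration, not a model of any real processor: the three opcodes, the tuple encoding of instructions and the single accumulator are all invented for the example.

```python
# Hypothetical 3-instruction machine; opcodes and encoding are invented.
LOAD, ADD, HALT = "LOAD", "ADD", "HALT"


def run(memory: list) -> int:
    """Run a program stored in `memory` and return the accumulator at HALT."""
    pc = 0   # Program Counter: address of the next instruction
    acc = 0  # accumulator register
    while True:
        # Fetch
        mar = pc            # PC -> MAR
        mdr = memory[mar]   # memory read places the word in MDR
        pc += 1             # increment PC (unless an instruction changes it)
        ir = mdr            # MDR -> IR
        # Decode and execute
        opcode, operand = ir
        if opcode == LOAD:
            acc = memory[operand]    # fetch the operand word into a register
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Instructions at addresses 0-2, data at addresses 4 and 5.
program = [(LOAD, 4), (ADD, 5), (HALT, 0), None, 7, 35]
print(run(program))  # prints 42
```

Instructions and data share the same memory here, which matches the point made later in the deck that the data bus does not distinguish between the two.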

Execute Cycle
Processor-memory
• Data transfer between CPU and main memory
Processor-I/O
• Data transfer between CPU and I/O module
Data processing
• Some arithmetic or logical operation on data
Control
• Alteration of sequence of operations, e.g. jump
Combination of above

Design Principles for Modern Computers
• All instructions directly executed by hardware.
• Instructions should be easy to decode.
• Only loads and stores should reference memory.
• Provide plenty of registers.

Design Principles for Modern Computers
Maximise rate at which instructions are issued:
• Split the fetch-execute cycle between two separate units:
  • a fetch unit to retrieve and decode instructions;
  • and an execution unit to perform the actual instruction operation.
  This allows independent, concurrent operation of the two parts of the fetch-execute cycle.
• Pipelining to allow overlapping between the fetch-execute cycles of sequences of instructions.
• Separate execution units for different types of instructions.

SEPARATE FETCH UNIT/EXECUTE UNIT
To achieve maximum performance, these two parts operate as independently from each other as possible;
• but an instruction must be fetched before it can be decoded and executed.
Several instructions are fetched concurrently from memory by the fetch unit, based on the current address stored in an instruction pointer (IP, i.e. PC) register.

SEPARATE FETCH UNIT/EXECUTE UNIT
Once an instruction is fetched, it is held in a buffer until it can be decoded and executed.
The number of instructions held will depend upon the size of each instruction, the width of the memory bus, and the size of the buffer.
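
As a rough back-of-the-envelope illustration of how those three quantities interact, the sketch below assumes fixed-length instructions. The function names and the example sizes (16-byte buffer, 4-byte instructions, 64-bit bus) are made up for the example.

```python
def buffer_capacity(buffer_bytes: int, instruction_bytes: int) -> int:
    """How many fixed-length instructions fit in the prefetch buffer."""
    return buffer_bytes // instruction_bytes


def instructions_per_fetch(bus_width_bits: int, instruction_bytes: int) -> int:
    """How many whole instructions arrive in one memory bus transfer."""
    return (bus_width_bits // 8) // instruction_bytes


# Assumed example sizes, not taken from the slides.
print(buffer_capacity(16, 4))         # prints 4
print(instructions_per_fetch(64, 4))  # prints 2
```

With variable-length instructions the counts would vary from one fetch to the next, which is why the slide says the number held "will depend" on instruction size.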

SEPARATE FETCH UNIT/EXECUTE UNIT
As instructions are executed, the fetch unit takes advantage of time when the bus is not otherwise being used and attempts to keep the buffer filled with instructions.
In general, modern memory buses are wide enough and fast enough that they do not limit instruction retrieval.

SEPARATE FETCH UNIT/EXECUTE UNIT
The execution unit contains the ALU and the portion of the control unit that identifies and controls the steps that comprise the execution part for each different instruction.
When the execution unit is ready for an instruction, the instruction decoder passes the new instruction to the control unit for execution.

Pipelining
Observe that the limitation to performance results from the serial nature of CPU processing:
• each instruction requires a sequence of fetch-execute cycle steps, and
• the program requires the execution of a sequence of these instructions.
• Thus, the keys to increased performance must rely on methods that reduce the time required for each step in the fetch-execute cycle.

Pipelining
Most instructions require many steps (clock cycles) to fetch and execute.
Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently.
To speed up processing, several independent instructions can be overlapped, so that several instructions are being worked on at a time. This is pipelining.

Instruction-Level Parallelism
[Figure: A five-stage pipeline. The state of each stage as a function of time. Nine clock cycles are illustrated.]
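
Since the figure itself is missing, the sketch below regenerates the kind of table the caption describes: which instruction occupies each of five stages during each of nine clock cycles. It assumes an ideal pipeline with no stalls and one new instruction entering per cycle. The stage labels S1 to S5 and the function name `pipeline_table` are placeholders, not taken from the slide.

```python
STAGES = ["S1", "S2", "S3", "S4", "S5"]  # placeholder stage names


def pipeline_table(n_cycles: int = 9, n_stages: int = 5) -> list:
    """table[s][c] is the 1-based instruction number in stage s during
    cycle c (both 0-based indices), or None if the stage is still empty."""
    table = []
    for stage in range(n_stages):
        row = []
        for cycle in range(n_cycles):
            instr = cycle - stage + 1  # instruction k enters stage 0 in cycle k-1
            row.append(instr if instr >= 1 else None)
        table.append(row)
    return table


for name, row in zip(STAGES, pipeline_table()):
    print(name, " ".join(str(i) if i is not None else "." for i in row))
```

Reading down a column shows five different instructions in flight once the pipeline is full; reading along the last row shows one instruction completing per cycle from cycle 5 onward.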

Pipelining
Problem: a branch instruction may invalidate all the instructions in the pipeline at that instant if the branch is taken, and the computer still must have the data from the previous instruction if the next instruction requires it in order to proceed.
Modern computers use a variety of techniques to compensate for the branching problem.
• One common approach is to maintain two or more separate pipelines so that instructions from both possible outcomes can be processed until the direction of the branch is clear.
• Another approach attempts to predict the probable

Superpipelined
• Many pipeline stages need less than half a clock cycle.
• Doubling the internal clock speed gets two tasks done per external clock cycle.
• Superscalar allows parallel fetch and execute.

Scalar and Superscalar Processor Organisation
It is not useful to pipe different types of instructions through a single pipeline.
With a single execution unit pipeline (ignoring problems with different instruction types and branch conditions),
• the CPU can average approximately one instruction per clock cycle, so its instruction rate is roughly equal to the clock speed of the machine.
• A processor fulfilling this condition is called a scalar processor.

Scalar and Superscalar Processor Organisation
With multiple execution units, it is possible to process instructions in parallel, with an average rate of more than one instruction per clock cycle.
• The ability to process more than one instruction per clock cycle is known as superscalar processing.
Pipelining and superscalar processing techniques do not affect the cycle time of any individual instruction.
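
The distinction between throughput and per-instruction time can be checked with a little arithmetic. The sketch below assumes an ideal pipeline (independent instructions, no stalls or branches); the function name, the five-stage depth and the 1000-instruction workload are illustrative choices.

```python
def pipelined_cycles(n_instructions: int, n_stages: int, issue_width: int = 1) -> int:
    """Cycles to finish n independent instructions in an ideal pipeline
    that starts `issue_width` instructions per clock cycle."""
    issue_groups = -(-n_instructions // issue_width)  # ceiling division
    return n_stages + issue_groups - 1


n, stages = 1000, 5
for width in (1, 2):  # 1 = scalar, 2 = two parallel pipelines
    cycles = pipelined_cycles(n, stages, width)
    print(f"issue width {width}: {cycles} cycles, "
          f"{n / cycles:.2f} instructions per cycle")

# A single instruction still needs all five stages either way.
print(pipelined_cycles(1, stages, 1), pipelined_cycles(1, stages, 2))
```

Under these assumptions the scalar case approaches one instruction per cycle and the dual-issue case approaches two, while a lone instruction takes five cycles in both.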

Superscalar Architectures (1)
[Figure: Dual five-stage pipelines with a common instruction fetch unit.]

Superscalar Architectures (2)
[Figure: A superscalar processor with five functional units.]

Limitations
Technical issues that must be resolved to make it possible to execute multiple instructions simultaneously:
• True data dependency: Problems that arise from instructions completing in the wrong order.
• Procedural dependency: Changes in program flow due to branch instructions.
• Resource conflicts: Conflicts for internal CPU resources, particularly general-purpose registers.
• Output dependency
• Antidependency

True Data Dependency
ADD r1, r2 (r1 := r1 + r2;)
MOVE r3, r1 (r3 := r1;)
• Can fetch and decode second instruction in parallel with first
• Can NOT execute second instruction until first is finished

Procedural Dependency
Conditional branch instructions may depend on the results from instructions that have not yet been executed.
• These situations are known as flow or branch dependencies.
• If the wrong branch is in the pipeline, the pipeline must be flushed and refilled, wasting time.
• Worse yet, an instruction from the wrong branch, that is, one that should not have been executed, can alter a previous result that is still needed.

Resource Conflict
Two or more instructions requiring access to the same resource at the same time
• e.g. two arithmetic instructions
Can duplicate resources
• e.g. have two arithmetic units

In-Order Issue, Out-of-Order Completion
Output (write-write) dependency
R3 := R3 + R5; (I1)
R4 := R3 + 1; (I2)
R3 := R5 + 1; (I3)
• I2 depends on result of I1: true data dependency
• If I3 completes before I1, R3 is left holding the result of I1 instead of I3, so later instructions read the wrong value: output (write-write) dependency.

Antidependency
Read-write (write-after-read) dependency
R3 := R3 + R5; (I1)
R4 := R3 + 1; (I2)
R3 := R5 + 1; (I3)
R7 := R3 + R4; (I4)
• I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3
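
The three register dependencies discussed on the preceding slides can be found mechanically by comparing the destination and source registers of each pair of instructions. The sketch below is one simple way to do it, using the conventional names read-after-write (true data), write-after-read (anti) and write-after-write (output). The `Instr` type and `dependencies` function are hypothetical helpers, and the check is deliberately naive: it compares every pair and does not notice when an intervening write makes a reported dependency irrelevant.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str     # register written
    srcs: tuple   # registers read


def dependencies(program: list) -> list:
    """Return (earlier, later, kind) for every pair of instructions that
    share a register in a way that constrains their ordering."""
    found = []
    for j, later in enumerate(program):
        for earlier in program[:j]:
            if earlier.dest in later.srcs:
                found.append((earlier.name, later.name, "true data (RAW)"))
            if later.dest in earlier.srcs:
                found.append((earlier.name, later.name, "anti (WAR)"))
            if later.dest == earlier.dest:
                found.append((earlier.name, later.name, "output (WAW)"))
    return found


# The four-instruction sequence from the slides.
program = [
    Instr("I1", "R3", ("R3", "R5")),
    Instr("I2", "R4", ("R3",)),
    Instr("I3", "R3", ("R5",)),
    Instr("I4", "R7", ("R3", "R4")),
]
for dep in dependencies(program):
    print(dep)
```

Being pairwise, the check also reports I1 to I4 as a true data dependency even though I3 redefines R3 in between; a real issue unit would track only the most recent writer of each register.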

Design Issues
Instruction-level parallelism
• Instructions in a sequence are independent
• Execution can be overlapped
• Governed by data and procedural dependency
Machine parallelism
• Ability to take advantage of instruction-level parallelism
• Governed by number of parallel pipelines

Instruction Issue Policy
• Order in which instructions are fetched.
• Order in which instructions are executed.
• Order in which instructions change registers and memory.

Processor-Level Parallelism (1)
[Figure: An array processor of the ILLIAC IV type.]

Processor-Level Parallelism (2)
[Figure: A single-bus multiprocessor. A multicomputer with local memories.]

Buses
There are a number of possible interconnection systems.
Single and multiple bus structures are most common:
• e.g. Control/Address/Data bus (PC)
• e.g. Unibus (DEC PDP)

What is a Bus?
A communication pathway connecting two or more devices.
It is a physical connection for transferring data from one location in the computer system to another.
Definition: A group of electrical conductors suitable for carrying computer signals from one location to another.
• Each conductor in the bus is commonly known as a line.
• Each line carries a single electrical signal, which might represent one bit of a memory address, or a sequence of data bits.

Data Bus
Carries data
• Remember that there is no difference between “data” and “instruction” at this level.
Width is a key determinant of performance
• 8, 16, 32, 64 bit

Address Bus
Identifies the source or destination of data
• e.g. CPU needs to read an instruction (data) from a given location in memory
Bus width determines maximum memory capacity of system
• e.g. the 8080 has a 16-bit address bus, giving a 64K address space
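
The two width claims (data bus width drives performance, address bus width caps addressable memory) come down to simple arithmetic. A minimal sketch follows; the function names are made up, and the 64-bit item used for the data bus comparison is an arbitrary example.

```python
def address_space(address_lines: int) -> int:
    """Number of distinct addresses selectable with this many address lines."""
    return 2 ** address_lines


def transfers_needed(item_bits: int, bus_width_bits: int) -> int:
    """Bus transfers required to move one item over a data bus of this width."""
    return -(-item_bits // bus_width_bits)  # ceiling division


print(address_space(16))         # prints 65536, i.e. 64K
print(transfers_needed(64, 8))   # prints 8
print(transfers_needed(64, 32))  # prints 2
```

Each extra address line doubles the address space, and widening the data bus from 8 to 32 bits cuts the transfers for the same item by a factor of four.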

Control Bus
Control and timing information
• Provides control for the proper synchronisation and operation of the bus and of the modules that are connected to the bus:
  • Memory read/write signal
  • Interrupt request
  • Bus request
  • Clock signals
  • Etc.

[Figure: Bus Interconnection Scheme]

Big and Yellow?
What do buses look like?
• Parallel lines on circuit boards
• Ribbon cables
• Strip connectors on motherboards, e.g. PCI
• Sets of wires

[Figure: Physical Realisation of Bus Architecture]

Buses
Buses may connect modules together in various ways.
A bus that carries signals from a specific source to a specific destination is a point-to-point bus.
• E.g. the cable that connects the parallel or serial port of a personal computer to a printer.
• Point-to-point buses intended for connection to a plug-in device are often called ports.

Buses
A multipoint bus (or multidrop or broadcast bus) is used to connect several points together, where signals produced by a source on the bus are “broadcast” to every other point on the bus.
• E.g. an Ethernet network.
• In most cases, a multipoint bus requires addressing signals on the bus to identify the desired destination that is being addressed by the source at a particular time.
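
The role of addressing on a broadcast bus can be shown with a toy model: every attached device sees every transfer, and each one keeps only what is addressed to it. The `Device` and `MultipointBus` classes below are invented for illustration and do not model any particular bus standard.

```python
class Device:
    """A module attached to the bus, identified by a numeric address."""

    def __init__(self, address: int):
        self.address = address
        self.received = []

    def observe(self, dest: int, payload) -> None:
        # Every device sees the signal; only the addressed one accepts it.
        if dest == self.address:
            self.received.append(payload)


class MultipointBus:
    def __init__(self):
        self.devices = []

    def attach(self, device: Device) -> None:
        self.devices.append(device)

    def send(self, dest: int, payload) -> None:
        # Broadcast: the signal reaches every point on the bus.
        for device in self.devices:
            device.observe(dest, payload)


bus = MultipointBus()
printer, disk = Device(1), Device(2)
bus.attach(printer)
bus.attach(disk)
bus.send(2, "block 17")
print(printer.received, disk.received)  # prints [] ['block 17']
```

A point-to-point bus needs no such address comparison, because the destination is fixed by the wiring.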

Single Bus Problems
Lots of devices on one bus leads to:
• Propagation delays: long data paths mean that co-ordination of bus use can adversely affect performance.
• A bottleneck if aggregate data transfer approaches bus capacity.
Most systems use multiple buses to overcome these problems.

Buses
Backplane / system bus / external bus (an example of a broadcast bus): a bus used to carry computer signals that connect the CPU with memory and/or with a set of plug-in I/O module cards in the same physical package.
Different buses might be used for connecting the different parts of the system.
The interfaces between different buses are called bus interface bridges. They make it possible for different buses to communicate with each other.

Buses
The buses connecting various parts of the CPU are actually within the CPU chip.
A bus protocol is an agreement between two or more entities that establishes a clear, common path of communication and understanding between them.

Buses
The external CPU bus (backplane):
• Peripheral Component Interconnect (PCI) bus: a popular modern external bus, which is used in Sun workstations, Apple Macintosh computers, Intel PCs, and Hewlett-Packard Alpha servers.
  • This means that the same peripheral I/O cards may be plugged into many different computers.
• AGP (Accelerated Graphics Port) bus.
• ISA (Industry Standard Architecture) bus: was the standard system bus for Intel PCs for many years, but is rapidly becoming extinct in favor of the faster and more flexible PCI bus for general I/O interface use.

[Figure: Typical PC interconnections]

Bus Types
Dedicated
• Separate data & address lines
Multiplexed
• Shared lines
• Address valid or data valid control line
• Advantage: fewer lines
• Disadvantages:
  • More complex control
  • Lower ultimate performance

Bus Arbitration
• More than one module may need to control the bus, e.g. CPU and DMA controller
• Only one module may control the bus at one time
• Arbitration may be centralised or distributed

Centralised or Distributed Arbitration
Centralised
• Single hardware device controlling bus access
  • Bus controller
  • Arbiter
• May be part of CPU or separate
Distributed
• Each module may claim the bus
• Control logic on all modules
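
A centralised arbiter can be pictured as a single function that looks at all pending requests and grants the bus to exactly one module. The sketch below uses a fixed-priority rule, which is only one of several possible policies and is not specified by the slides. The function name `grant` and the choice to rank the DMA controller above the CPU are assumptions made for the example.

```python
from typing import Dict, List, Optional


def grant(requests: Dict[str, bool], priority: List[str]) -> Optional[str]:
    """Return the one module allowed to control the bus this cycle,
    or None if nobody is requesting it."""
    for module in priority:  # highest priority first
        if requests.get(module, False):
            return module
    return None


priority = ["DMA", "CPU"]  # assumed ordering, for illustration only
print(grant({"CPU": True, "DMA": True}, priority))    # prints DMA
print(grant({"CPU": True, "DMA": False}, priority))   # prints CPU
print(grant({"CPU": False, "DMA": False}, priority))  # prints None
```

In a distributed scheme there is no such single function; equivalent logic is replicated on every module and they agree among themselves who wins.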

Timing
Co-ordination of events on bus
Synchronous
• Events determined by clock signals
• Control bus includes clock line
• A single 1-0 sequence on the clock line is a bus cycle
• All devices can read clock line
• Usually a single cycle for an event

PCI Bus
• Peripheral Component Interconnect
• Intel released to public domain
• 32 or 64 bit
• 50 lines

PCI Bus Lines (Required)
System lines
• Including clock and reset
Address & data
• 32 time-multiplexed lines for address/data
• Lines to interpret & validate the address/data signals
Interface control
Arbitration
• Not shared
• Direct connection to PCI bus arbiter
Error lines

PCI Bus Lines (Optional)
Interrupt lines
• Not shared
Cache support
64-bit bus extension
• Additional 32 lines
• Time multiplexed
• 2 lines to enable devices to agree to use 64-bit transfer
JTAG/boundary scan
• For testing procedures

PCI Commands
Transaction between initiator (master) and target
• Master claims bus
• Determine type of transaction, e.g. I/O read/write
• Address phase
• One or more data phases
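
The phase ordering listed above (one address phase, then one or more data phases on the same time-multiplexed lines) can be sketched as a generator. This is a deliberately simplified illustration and not the PCI signalling protocol: the function name, the tuple layout and the example address and data values are all invented.

```python
from typing import Iterator, List, Tuple


def transaction(command: str, address: int,
                data_words: List[int]) -> Iterator[Tuple[str, object]]:
    """Yield (phase, value on the shared address/data lines) in order."""
    if not data_words:
        raise ValueError("a transaction needs at least one data phase")
    # Address phase: the shared lines carry the address, and the command
    # tells the target what type of transaction this is.
    yield ("address", (command, address))
    # Data phases: the same lines are now reused for data.
    for word in data_words:
        yield ("data", word)


phases = list(transaction("I/O read", 0x1000, [0xAA, 0xBB]))
for phase, value in phases:
    print(phase, value)
```

Reusing the same lines for address and then data is what keeps the line count down, at the cost of the extra control complexity noted on the Bus Types slide.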

Thank you
Q&A