
EE3121 Computer System Engineering, Academic Year 2011/2012. BEng subject code: EE3121. Lecturer: Y. F. Fung. Office (Y. F. Fung): CF605. Email: eeyffung@polyu.edu.hk. Consultation: Friday afternoon.

Lab schedule: Labs commence in week 5. A briefing will be provided before the session. Form your own groups, with 2 students per group.

Reference textbooks: The Intel Microprocess…, Prentice-Hall, Barry Brey, ISBN 0-13-060714-2; Computer Organization and Embedded Systems, C. Hamacher, Z. Vranesic et al., McGraw-Hill, 2012; Intel Microprocessors: Architecture, Programming and Interfacing, Ray & Bhurchandi, McGraw-Hill, ISBN 0-07-120169-6; Pentium Processor System Architecture, 2nd Edition, Don Anderson/Tom Shanley, MindShare, Inc. Use the latest edition of the above titles.

Reference books: IBM PC Assembly Language and Programming, Fourth Edition (Peter Abel, Prentice-Hall International Inc.); The 8086 and 80286 Microprocessors: Hardware, Software and Interfacing (A. Singh, W. A. Triebel, Prentice-Hall).

Teaching materials: Available on the WebCT web site. Some useful information can also be found at ftp.ee.polyu.edu.hk/yffung/ee3231

Related topics: Computer architectures of microprocessors (2.5 to 3 weeks); assembly language programming (3 weeks); memory system overview (2 weeks); input/output mechanism (2 weeks); other computing support facilities: interrupts (1.5 weeks) and communication (1.5 weeks).

Assessment methods: Continuous assessment (40%), consisting of performance in lab and online exercises (10%), lab report (10%) and a quiz (20%; usually only 1 test or quiz, around week 8 or 9 after the first 3 topics). Examination (60%).

Criterion-referenced assessment: Pass (D/D+): some understanding of assembly language programming; some understanding of other topics. Grade C/C+: able to write simple assembly language programs; able to demonstrate in-depth understanding of certain topics. Grade B/B+: able to write assembly language programs to achieve specific tasks; able to demonstrate in-depth understanding of most topics. Grade A/A+: able to write assembly language programs; able to demonstrate in-depth understanding of all topics; able to apply the knowledge learnt to solve real-life problems.

Objectives: To understand the basic principles (hardware components) of a computer system. To learn how to control a computer system using assembly language programming. To learn how to design a simple computer system (e.g. for the development of an electronic mouse, a robot or a simple control system for your project).

Learning outcomes: 1. Given the specifications of an application and the instruction set of the microprocessor, design an assembly program to carry out the necessary operations. 2. Be able to appreciate the advanced features of the latest microprocessors. 3. Given a set of conditions, design a simple computer system. 4. Be able to think logically and to present results.

Attention! This is not a traditional “engineering” subject! There are almost no equations! Try to think, understand and learn. Do not try to memorize!

What is a computer? Nowadays, computers come in different forms! In this subject, we try to learn the generic form of a computer. If you are willing to spend the time, it is possible to build your own game console or cell phone.

What is a computer? Is the structure of a tablet very different from a traditional PC?

The PC is dead: “It was fun while it lasted, but as of 2011 the era of the desktop computer is officially over.” Read more: http://www.foxnews.com/scitech/2011/01/11/pc-is-dead-ces-tabletssmartphone/

Different forms of computer Can you think of a real-life example that involves the application of a microprocessor or a computer?

A computer is a digital system: A computer is a combination of digital and analogue systems, but it is mainly digital. So to learn this subject, you also need to understand basic digital systems as well as the binary number system.

Number systems (revision): The microprocessor (µP) is a binary device; everything inside the microprocessor is represented by 0 and 1. The most direct number system used inside the microprocessor is the binary system (base 2), with only the digits 0 and 1, for example 0101. Each digit in the number represents a power of 2, starting from the RHS (right-hand side): the first digit is worth $2^0$, then $2^1$, $2^2$, etc. Usually, the rightmost bit is called the LSB (least significant bit) and the leftmost bit is called the MSB (most significant bit). So the value $01010101 = 0 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 64 + 16 + 4 + 1 = 85$.
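As a quick check of the positional rule, here is a minimal Python sketch (Python is used only for illustration; `binary_to_decimal` is a hypothetical helper, not part of the course material). It multiplies each digit by its power of 2, starting from the LSB.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum digit x 2^position, where position 0 is the rightmost bit (LSB)."""
    value = 0
    for position, digit in enumerate(reversed(bits)):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        value += int(digit) * 2 ** position
    return value


print(binary_to_decimal("01010101"))  # 85
print(binary_to_decimal("0101"))      # 5
```

Python's built-in `int("01010101", 2)` gives the same result; the loop simply makes each power of 2 explicit.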

Number system: In base 10, $123 = 1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0$. In binary subtraction, 0 − 1 = 11 because there is no minus (−) sign in the binary system; the first “1” is called the borrow bit. In binary addition, 1 + 1 = 10; the first “1” is called the carry bit.
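The carry and borrow bits can be seen by working column by column, exactly as on paper. This is a hedged Python sketch with hypothetical helper names (`add_binary`, `subtract_binary`); the subtraction assumes the first operand is not smaller than the second, so no sign is needed.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column from the RHS, propagating the carry bit."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))   # sum bit for this column
        carry = total // 2              # carry into the next column
    if carry:
        result.append("1")              # a final carry becomes a new leftmost bit
    return "".join(reversed(result))


def subtract_binary(a: str, b: str) -> str:
    """Compute a - b column by column from the RHS, propagating the borrow bit (assumes a >= b)."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    borrow = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        diff = int(bit_a) - int(bit_b) - borrow
        borrow = 1 if diff < 0 else 0   # 0 - 1 needs a borrow from the next column
        result.append(str(diff + 2 * borrow))
    return "".join(reversed(result))


print(add_binary("1", "1"))             # 10
print(subtract_binary("1000", "0001"))  # 0111
```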

Number system: We also use the term byte to represent 8-bit data and the term word to represent a 16-bit value; a 32-bit value is called a double word. If we want to represent a very large number, we need many bits, and this is not very convenient. Therefore we usually use number systems derived from the binary system; the most commonly used are octal (base 8) and hexadecimal (base 16), usually just called hex.

Hex number system: In octal, only the digits 0 to 7 are used; in hexadecimal, we use 0 to 9 and A, B, C, D, E, F, where A = 10, B = 11, C = 12, D = 13, E = 14, F = 15. Converting from binary to hex is very easy: starting from the RHS, every 4-bit group of the binary number converts directly into one hex digit. For example, 01010101 is equal to 55H (H = hexadecimal): there are two 4-bit groups, 0101 and 0101, and 0101 = 5, so the binary pattern is 55H. Example: 1100 0111 1010 1011 = C7ABH. To convert back: ABCDH = 1010 1011 1100 1101.
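The 4-bit grouping rule can be sketched in a few lines of Python (hypothetical helpers `binary_to_hex` and `hex_to_binary`, written for illustration). The binary string is padded on the left so that grouping from the RHS works even when the length is not a multiple of 4.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits: str) -> str:
    """Group the bits in fours from the RHS and map each group to one hex digit."""
    bits = bits.replace(" ", "")
    padded_len = (len(bits) + 3) // 4 * 4
    bits = bits.zfill(padded_len)          # pad on the left so the length is a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups) + "H"


def hex_to_binary(hex_text: str) -> str:
    """Expand each hex digit back into its 4-bit group (a trailing 'H' is ignored)."""
    digits = hex_text.upper().rstrip("H")
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in digits)


print(binary_to_hex("01010101"))             # 55H
print(binary_to_hex("1100 0111 1010 1011"))  # C7ABH
print(hex_to_binary("ABCDH"))                # 1010 1011 1100 1101
```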

Unsigned and signed numbers: If a binary pattern can represent both positive and negative values, it is regarded as signed; otherwise it is unsigned (positive only). There are two common signed notations: using a sign bit (usually the leftmost bit) and 2’s complement. For example, using the leftmost bit as a sign bit (0 = positive, 1 = negative), 10101010 = −42 while 00101010 = 42.

2’s complement: Usually the 2’s complement system is used. To convert a value from X to −X using 2’s complement, we first take the 1’s complement of X and then add 1 to the result. The 1’s complement of a binary value is obtained by inverting its 0s and 1s. Example: X = 00101010 = 42. Taking the 1’s complement gives 11010101; adding 1 gives 11010101 + 1 = 11010110 = −42 (compare with the sign-bit pattern 10101010).
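The invert-then-add-1 procedure, and how the resulting pattern is read back as a signed value, can be sketched in Python as follows (the helper names are hypothetical). The width of the pattern is taken from the length of the input string, and any carry out of the MSB is discarded, as it would be in a fixed-width register.

```python
def ones_complement(bits: str) -> str:
    """Invert every bit."""
    return "".join("1" if b == "0" else "0" for b in bits)


def twos_complement(bits: str) -> str:
    """Negate an n-bit pattern: 1's complement, then add 1 (a carry out of the MSB is dropped)."""
    width = len(bits)
    result = (int(ones_complement(bits), 2) + 1) % (1 << width)
    return format(result, f"0{width}b")


def signed_value(bits: str) -> int:
    """Read an n-bit pattern as a 2's complement signed integer (MSB set means negative)."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


print(ones_complement("00101010"))  # 11010101
print(twos_complement("00101010"))  # 11010110
print(signed_value("11010110"))     # -42
```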

Floating point representation: As mentioned above, each digit in a binary pattern represents a power of 2. For the digits after the binary point, the values are $2^{-1}$, $2^{-2}$, etc. Example: $0.1010 = 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 0 \times 2^{-4} = 0.5 + 0.125 = 0.625$. How do we represent signed floating-point values?
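Digits after the binary point can be evaluated the same way as integer digits, just with negative powers of 2. A minimal Python sketch, assuming the input is written with a binary point (hypothetical helper `binary_to_float`):

```python
def binary_to_float(text: str) -> float:
    """Evaluate a binary number with a binary point, e.g. '0.1010' or '110.001'."""
    integer_part, _, fraction_part = text.partition(".")
    value = float(int(integer_part, 2)) if integer_part else 0.0
    for k, digit in enumerate(fraction_part, start=1):
        value += int(digit) * 2.0 ** -k    # k-th digit after the point is worth 2^-k
    return value


print(binary_to_float("0.1010"))   # 0.625
print(binary_to_float("110.001"))  # 6.125
```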

Floating-point format: Floating-point numbers are represented in the form $X = \pm F \cdot 2^{\pm E}$, where F is called the fraction (or mantissa) and E is the exponent. Floating-point numbers are usually represented in one of two standard formats defined by the IEEE: 32-bit (single precision) and 64-bit (double precision). An 80-bit (extended precision) format is also available.


IEEE floating-point standard: For single precision, there are 23 bits for the fraction, 8 bits for the exponent and 1 sign bit. For double precision, there are 52 bits for the fraction, 11 bits for the exponent and 1 sign bit. The bit layout is: sign | exponent | fraction.

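Floating-point arithmetic: Using the format $\pm F \cdot 2^{\pm E}$, floating-point multiplication and division can be implemented very easily. If $X = a \times 2^b$ and $Y = c \times 2^d$, then $X \times Y = (a \times c) \times 2^{b+d}$ and $X / Y = (a / c) \times 2^{b-d}$. How about X + Y and X − Y?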
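The multiply/divide rules can be tried out with Python's `math.frexp`, which splits a float into a fraction and a power-of-2 exponent, and `math.ldexp`, which rebuilds it. This is only an illustration of the rule, not how an FPU is built; note that `frexp` normalizes the fraction to the range 0.5 to 1 rather than the IEEE form 1.F, and the function names below are hypothetical.

```python
import math


def fp_multiply(x: float, y: float) -> float:
    """X*Y = (a*c) * 2^(b+d): multiply the fractions and add the exponents."""
    a, b = math.frexp(x)   # x = a * 2**b, with 0.5 <= |a| < 1 (or a == 0)
    c, d = math.frexp(y)   # y = c * 2**d
    return math.ldexp(a * c, b + d)


def fp_divide(x: float, y: float) -> float:
    """X/Y = (a/c) * 2^(b-d): divide the fractions and subtract the exponents."""
    a, b = math.frexp(x)
    c, d = math.frexp(y)
    return math.ldexp(a / c, b - d)


print(fp_multiply(6.125, 4.5))  # 27.5625
print(fp_divide(6.125, 0.5))    # 12.25
```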

Floating-point format: For the single-precision format (S = sign bit, E = 8-bit exponent field, F = 23-bit fraction field):
If E = 255 and F ≠ 0: NaN (not a number).
If E = 255, F = 0 and S = 1: −infinity.
If E = 255, F = 0 and S = 0: +infinity.
If 0 < E < 255: $X = (-1)^S \times 2^{E-127} \times (1.F)$.
If E = 0 and F ≠ 0: $X = (-1)^S \times 2^{-126} \times (0.F)$.
If E = 0, F = 0 and S = 1: X = −0.
If E = 0, F = 0 and S = 0: X = +0.
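The case table above translates almost directly into code. The following Python sketch (hypothetical `decode_single`) extracts S, E and F from a 32-bit pattern with shifts and masks and then applies each case in turn.

```python
import math


def decode_single(pattern: int) -> float:
    """Decode a 32-bit IEEE single-precision pattern using the S/E/F cases."""
    s = (pattern >> 31) & 0x1
    e = (pattern >> 23) & 0xFF        # 8-bit exponent field
    f = pattern & 0x7FFFFF            # 23-bit fraction field
    sign = -1.0 if s else 1.0
    if e == 255:
        return math.nan if f != 0 else sign * math.inf
    if e == 0:
        # +0 / -0 when f == 0, otherwise (-1)^S * 2^-126 * (0.F)
        return sign * math.ldexp(f / 2**23, -126)
    # normalized: (-1)^S * 2^(E-127) * (1.F)
    return sign * math.ldexp(1 + f / 2**23, e - 127)


print(decode_single(0x40C40000))  # 6.125
print(decode_single(0xFF800000))  # -inf
```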

Example of floating point: Given 6.125, express the value in binary using the single-precision floating-point format. First convert the value into the form 1.XXX × 2^y: $6.125 = 1.53125 \times 2^2$. So the exponent is 2 and the significand is 1.53125. As all (normalized) floating-point values come with the leading ‘1.’, the ‘1.’ is not stored and only the fraction part .53125 is saved. In binary, 0.53125 = 0.10001, so the fraction field is 10001 followed by zeros. The stored exponent satisfies E − 127 = 2, so E = 129 = 10000001. The resulting pattern (sign, exponent, fraction) is 0 10000001 10001000000000000000000.
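The hand conversion can be cross-checked with Python's standard `struct` module: packing with format `">f"` stores the value as a big-endian IEEE single, and unpacking the same 4 bytes with `">I"` reinterprets them as an unsigned 32-bit integer. The helper name `float_to_single_bits` is hypothetical; values that are not exactly representable are rounded to the nearest single-precision value by `struct.pack`.

```python
import struct


def float_to_single_bits(value: float) -> str:
    """Return the IEEE single-precision pattern of value as 'sign exponent fraction'."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))  # same 4 bytes, read as an integer
    bits = format(raw, "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(float_to_single_bits(6.125))  # 0 10000001 10001000000000000000000
```

Reading the 32 bits in groups of four gives 40C40000H for 6.125.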

Exercise: Convert the number −4.5 into an IEEE 32-bit floating-point pattern. Convert 9 into an IEEE 32-bit floating-point pattern. Without using a calculator, determine 01010010 + 10101101. Without using a calculator, determine 01010010 − 10101101.

Characters: In addition to values, characters are also represented using binary codes, usually ASCII (American Standard Code for Information Interchange).

ASCII table: ‘A’ = 41H and ‘a’ = 61H, so ‘a’ > ‘A’.
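Because each lower-case letter sits exactly 20H above its upper-case form, case conversion is a single bit operation. A small Python sketch of this idea (hypothetical `ascii_to_upper`, valid only for the ASCII letters a to z):

```python
def ascii_to_upper(text: str) -> str:
    """Lower-case ASCII letters are 20H above their upper-case forms, so clearing bit 5 converts them."""
    return "".join(chr(ord(ch) & ~0x20) if "a" <= ch <= "z" else ch for ch in text)


print(hex(ord("A")), hex(ord("a")))    # 0x41 0x61
print(ascii_to_upper("Hello, 8086!"))  # HELLO, 8086!
```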

Chinese characters: Chinese characters and other non-ASCII characters are represented by multi-byte codes (two bytes in Big5). There are different coding systems to represent Chinese characters, such as Unicode and Big5. A font file must be installed in order to display the corresponding characters.

Big5 code: Every Chinese character is represented by a two-byte code. The first byte ranges from 0xA1 to 0xF9, while the second byte ranges from 0x40 to 0x7E or from 0xA1 to 0xFE. Since the MSB of the first byte is always set, a document that contains both Chinese characters and regular ASCII characters can still represent each ASCII character with a single byte.
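The byte ranges are enough to split a mixed Big5/ASCII byte stream into characters: a byte with its MSB clear is ASCII, and a byte with its MSB set starts a two-byte code. The following Python sketch (hypothetical `split_big5`) checks only the ranges quoted above; it does not check whether a code is actually assigned to a character.

```python
def split_big5(data: bytes) -> list:
    """Split Big5-encoded bytes into characters using the lead/trail byte ranges."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if (lead & 0x80) == 0:                   # MSB clear: a single-byte ASCII character
            chars.append(data[i:i + 1])
            i += 1
            continue
        if not (0xA1 <= lead <= 0xF9) or i + 1 >= len(data):
            raise ValueError(f"invalid Big5 lead byte at offset {i}")
        trail = data[i + 1]
        if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
            raise ValueError(f"invalid Big5 trail byte at offset {i + 1}")
        chars.append(data[i:i + 2])              # two-byte Chinese character code
        i += 2
    return chars


# 'A', then the code A741H, then 'B': the second byte 41H is not mistaken for 'A'
print(split_big5(bytes([0x41, 0xA7, 0x41, 0x42])))  # [b'A', b'\xa7A', b'B']
```

Note that the second byte of a Big5 code may fall in the ASCII range (40H to 7EH), which is why the scan has to decide from the first byte how many bytes to consume.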

Some examples:

| Big5 code | Character |
|---|---|
| A640H | 共 |
| AF66H | 病 |
| A741H | 你 |
| AA46H | 東 |

Revision exercises: What are a bit, a byte, a word, one K and one M? What are hex, octal and binary? Do you know how to convert a number into different base systems without using a calculator? Binary number notation (1010 = ???). How many different values can be represented by an 8-bit pattern? Hexadecimal A = ??? How do you convert binary 110101110101 to hex? CFH (hex) = ???

More exercises: How do you represent negative values in the binary number system? How about using the binary system to represent a floating-point number? What is ASCII? What is BCD? What is a string? Do you know how to program in C/C++, Fortran, Java, assembly language, etc.? Do you know what a multiplexer, a register and a latch are? Referring to the block diagram of a memory, do you know what an address is and what data is?

Microprocessor/Microcomputer: What is a microcomputer? A complete computer based on a particular microprocessor chip. So the microprocessor is the most important component in a microcomputer, and to study a microcomputer system we must first understand the microprocessor. What is a microprocessor? A processor on a chip can be described as a microprocessor, e.g. the 8051 series, the 8086 and the Pentium series.

Block diagram of a generic microcomputer system (figure: RAM, hard disk and CD-ROM on the data and address bus, with input devices such as a keyboard and mouse, and output devices such as a monitor and printer).

Microprocessor-based system such as an electronic mouse (figure: sensor as input → µP with control program → motor as output).

Structure of a modern computer system

| Model | Year of introduction | Max. clock frequency at introduction | Transistors per die | Register sizes (bits) | Ext. data bus size (bits) | Max. external address space | Caches |
|---|---|---|---|---|---|---|---|
| 8086 | 1978 | 8 MHz | 29 K | 16 GP | 16 | 1 MB | None |
| 486 | 1989 | 25 MHz | 1.2 M | 32 GP, 80 FPU | 32 | 4 GB | L1: 8 KB |
| Pentium | 1993 | 60 MHz | 3.1 M | 32 GP, 80 FPU | 64 | 4 GB | L1: 16 KB |
| P3 | 1999 | 500 MHz | 8.2 M | 32 GP, 80 FPU, 64 MMX, 128 XMM | 64 | 64 GB | L1: 32 KB, L2: 512 KB |
| Pentium Dual Core | 2007 | 1.6 GHz to 2.4 GHz | 167 M | | 64 | 64 GB | L2: 1 MB |

GP: general purpose; FPU: floating-point unit; register: a device to store binary data.

Features of a microprocessor: operating frequency; size of the registers and how many there are; size of the data bus; size of the address bus; size of the cache; number of cores.

Different types of microprocessor: Microprocessors can also be classified by their instructions. There are two fundamentally different approaches to instruction set design. One popular approach is called Reduced Instruction Set Computers (RISC). In a RISC, each instruction occupies exactly one word. Examples include ARM and PIC. Devices using ARM processors include the BlackBerry; the Cortex processors also come from ARM.

CISC: An alternative to RISC is to use more complex instructions, which may span more than one word of memory and may specify more complicated operations. Processors based on this idea are called Complex Instruction Set Computers (CISC). The Intel x86 CPUs and AMD processors are examples of CISC.

The Intel 8086 microprocessor: The 8086 was a popular device in the late 1970s and 1980s, and its architecture is simple and suitable for teaching computer architecture. Once we have grasped the basic concepts of the 8086, we can discuss more advanced microprocessors. Many features found in the 8086 are still embedded, in enhanced form, in modern microprocessors!

8086 microprocessor: This is a 16-bit microprocessor chip manufactured using high-performance metal-oxide semiconductor (HMOS) technology. The circuitry on the chip consists of approximately 29,000 transistors. It comes in a 40-pin package.

Self test: Do you know what is meant by a 16-bit, 32-bit or 64-bit processor? How would you describe an Intel Core 2 Duo CPU?

Basic 8086 features: A true 16-bit microprocessor with a 16-bit internal and external data bus. The address bus and data bus are multiplexed: address and data share the same pins! A 20-bit address bus allows access to 1 MB of memory. It can address up to 64K byte-wide I/O ports or 32K word-wide ports (word = 16 bits). Details regarding I/O ports will be discussed in the I/O system topic.
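The memory and I/O limits follow directly from powers of 2. A short Python sketch of the arithmetic; the 16-bit I/O address width is an assumption inferred from the 64K port figure (64K = 2^16), and the constant names are hypothetical.

```python
ADDRESS_BUS_BITS = 20    # 8086 address lines
IO_ADDRESS_BITS = 16     # assumption: 64K ports = 2^16, i.e. a 16-bit I/O address

memory_bytes = 2 ** ADDRESS_BUS_BITS      # 1,048,576 bytes = 1 MB
byte_wide_ports = 2 ** IO_ADDRESS_BITS    # 65,536 = 64K byte-wide ports
word_wide_ports = byte_wide_ports // 2    # each 16-bit port uses two byte addresses: 32K

print(memory_bytes, memory_bytes // 1024 // 1024, "MB")  # 1048576 1 MB
print(byte_wide_ports, word_wide_ports)                   # 65536 32768
```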

Pin layout for the 8086 (figure): A/D denotes address/data (address and data share the pins, i.e. they are multiplexed). Also pay attention to “active high” and “active low” signals.

8086 features: The 8086 has two modes, minimum and maximum. Minimum mode: used as a typical microprocessor. Maximum mode: used with multiple processors, usually for floating-point arithmetic. The mode is selected via the MN/MX input.

Block diagram for a simple computer system (figure: CPU, memory, I/O and an LCD display unit). What are the basic operations performed by a computer? Get an instruction from memory, perform/execute the operation, then get the next instruction.

What are the basic operations performed by a microprocessor? Get an instruction from memory, perform/execute the operation, then get the next instruction. So that it can perform these operations effectively, the microprocessor is internally organized into two units: the Bus Interface Unit (BIU) and the Execution Unit (EU).

Processor Model for 8086

The 8086 internal architecture: The internal functions of the 8086 µP are divided between two separate processing units, the Bus Interface Unit (BIU) and the Execution Unit (EU). The BIU is responsible for performing all bus operations, such as instruction fetching, reading and writing operands from/to memory, and inputting and outputting data for peripherals. The EU is responsible for executing instructions. The two units operate asynchronously, so instruction fetch and execution can overlap (what’s the advantage of this?).

Terminology: A program is stored in memory and consists of a sequence of instructions and some data. Executing an instruction may require some operands. What is an operand? An operand is the object that is being operated upon! Example: in the instruction ADD A, B (A = A + B), ADD (addition) is the operation, and A and B are the operands.

Bus Interface Unit (BIU): The BIU is the 8086’s interface to the outside world (external memory). The major task of the BIU is to get “information” from memory; information includes data and instructions. How can we get data from memory? To access the memory, we need to issue an address (via the address bus) and then read the data (via the data bus). (Details of this mechanism will be discussed with the memory systems.)

BIU: There is a full 16-bit bidirectional data bus and a 20-bit address bus. The BIU has the following functions: instruction fetch, instruction queueing, operand fetch and storage, and bus control. It contains the segment registers, internal communication registers, the instruction pointer, the instruction object code queue, the address summer (Σ) and the bus control logic.

How the BIU and EU collaborate: What does a program consist of? A program is a collection of instructions and data. The BIU fetches an instruction from memory and puts it in a queue, called the instruction queue (refer to the block diagram). The EU fetches the instruction from the queue and executes it. The BIU and EU implement a pipeline (BIU → EU) and pre-fetching to optimize performance.

BIU–EU pipeline mechanism (figure): Information coming from memory enters the BIU, which controls access to the memory. The instruction queue between the BIU and EU can store 6 bytes of instructions. The EU executes the instructions and requests the BIU to get operands.

Buffet - pre-fetch

Pre-fetch concept: Pre-fetching is similar to what you do at a buffet dinner. You collect different kinds of food from the buffet table, for example sashimi, roast beef, soup and salad. When you’re eating the salad, you have already pre-fetched the sashimi and the roast beef! If you do not pre-fetch, you take only the salad, go back to your table and eat it; when you finish the salad, you go back and get some other food. Why pre-fetch your food?

Pre-fetch by the BIU: What is pre-fetching? When the queue has room for at least 2 bytes and the EU is not requesting the BIU to read or write operands from memory, the BIU looks ahead in the program by pre-fetching the next sequential instructions. The pre-fetched instructions are held in the queue, which is a FIFO (first-in-first-out) device. Two bytes are fetched (16-bit data bus) in a single memory cycle. The EU reads one instruction byte at a time from the output of the queue.
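The pre-fetch rules can be modelled as a small FIFO. The Python sketch below is a simplified, hypothetical model (class and method names are invented for illustration): it applies the "at least 2 free bytes and the EU is not using the bus" rule, fetches 2 bytes per bus cycle and lets the EU take one byte at a time, but it ignores word alignment, bus timing and the queue flush that happens on a jump.

```python
from collections import deque

QUEUE_BYTES = 6  # the 8086 instruction queue holds 6 bytes


class PrefetchQueue:
    """Simplified, hypothetical model of the BIU pre-fetch queue (a FIFO)."""

    def __init__(self, program: bytes):
        self.program = program
        self.next_fetch = 0      # offset of the next program byte the BIU will fetch
        self.queue = deque()     # FIFO: bytes enter at the right, leave at the left

    def biu_cycle(self, eu_needs_bus: bool = False) -> bool:
        """One bus cycle: pre-fetch 2 bytes if there is room and the EU is not using the bus."""
        free = QUEUE_BYTES - len(self.queue)
        if eu_needs_bus or free < 2 or self.next_fetch >= len(self.program):
            return False         # no pre-fetch in this cycle
        chunk = self.program[self.next_fetch:self.next_fetch + 2]  # 16-bit data bus: 2 bytes
        self.queue.extend(chunk)
        self.next_fetch += len(chunk)
        return True

    def eu_read_byte(self):
        """The EU reads one instruction byte from the output of the queue (None if empty)."""
        return self.queue.popleft() if self.queue else None


q = PrefetchQueue(bytes(range(10)))
print([q.biu_cycle() for _ in range(4)])  # [True, True, True, False]: full after 3 fetches
print(q.eu_read_byte(), q.biu_cycle())    # 0 False: only 1 empty byte
print(q.eu_read_byte(), q.biu_cycle())    # 1 True: 2 empty bytes, so the BIU pre-fetches again
```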

Pre-fetch (figure): Memory holds the instruction bytes int 1a, int 1b, int 2a, int 2b, int 2c, int 3a, int 3b, int 4a; the queue and the EU are shown beside it. While the EU is processing “int 1a”, int 2a and int 2b have already been pre-fetched.

Instruction sequence (figure: fetch and execute phases plotted against time).

Pre-fetching by the BIU: If the instruction queue is full (or has only 1 empty slot) and the EU is not requesting access to operands in memory, the BIU does not perform any bus cycles; these are called idle states. If the BIU is in the process of fetching an instruction when the EU requests its services, the BIU first completes the instruction fetch bus cycle and then serves the EU.

Exercise
An instruction can be completed by a single processing stage in 25 clock cycles. Alternatively, the instruction can be divided into five sub-tasks, each carried out by one processing stage, forming a pipeline. Sub-task 1 takes 10 clock cycles, sub-task 2 takes 5, sub-task 3 takes 8, sub-task 4 takes 10 and sub-task 5 takes 12. How many instructions must be executed for the sub-task (pipelined) approach to be more cost-effective, i.e. faster?

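Answer
Each stage can process only one instruction at a time. The first instruction takes 10 + 5 + 8 + 10 + 12 = 45 clock cycles; after that, one instruction completes every 12 clock cycles (the slowest stage). For x instructions the pipeline is faster when
45 + (x - 1) × 12 < 25x, i.e. 13x > 33, so x > 2.54.
So if 3 or more instructions are executed (for 3 instructions: 69 versus 75 clock cycles), the sub-task approach is faster.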
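The same answer can be checked by brute force. This sketch assumes the model used in the answer above: each stage takes its own number of clock cycles and, once the pipeline is full, one instruction completes every 12 cycles (the slowest stage); latch overheads are ignored.

```python
STAGES = [10, 5, 8, 10, 12]   # clock cycles for sub-tasks 1..5
SINGLE = 25                   # clock cycles for the single-stage unit

def pipelined_cycles(n):
    # the first instruction passes through every stage; after that one
    # instruction completes every max(STAGES) cycles
    return sum(STAGES) + (n - 1) * max(STAGES)

def single_cycles(n):
    return SINGLE * n

n = 1
while pipelined_cycles(n) >= single_cycles(n):
    n += 1
print(n, pipelined_cycles(n), single_cycles(n))   # prints: 3 69 75
```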

Pre-fetch
Is pre-fetching at a buffet dinner exactly the same as the pre-fetching mechanism in a microprocessor? The plate is equivalent to which component? Is a bigger plate better than a small plate?

Components in the BIU
The job of the BIU is to read/write memory. What is needed to access memory? We need to generate an address and read/write the data. The BIU contains a dedicated adder, which is used to generate the physical address of the memory location. The address is formed by adding a 16-bit segment address, appended with four 0 bits (i.e. shifted left by 4 bits), and a 16-bit offset address.

Inside BIU
Example: the physical address of the next instruction to be fetched is formed by combining the current contents of the code segment (CS) register (16-bit) and the current contents of the instruction pointer (IP) register (16-bit). A register is a hardware device that stores binary data, so a 16-bit register can store a 16-bit value.

Generating the physical address
If CS (code segment) is 1005H and IP is 5555H, what is the physical address? (How do we determine the physical address?)
Point to consider: the address bus of the 8086 is 20 bits wide, but the registers are 16 bits. Is this a problem? Consider the sum of two 16-bit values: what is the maximum integer value represented by 16 bits, what is the maximum sum, and how many bits are required to represent that sum?

Segment concept
The 8086 can support up to 1 MB of memory. Memory is divided into segments, and each segment is 64 KB. To access data inside a segment, we need to know the base address of the segment as well as the offset. This is similar to a home address: if you live in flat 11B, 11 is the base and B is the offset.

Segment in 8086
Why is the segment mechanism needed in the 8086?
- The address bus size (20-bit) is larger than the register size (16-bit).
- Example: if the address bus is 4 bits, you can access 16 locations. If you can only output a 2-bit address from your register, what will happen?
- It saves components: the registers can be smaller.
A segment is a 64 KB memory block.

Segment concept
Using numbers in base 10 with only 1 digit, the maximum is 9, so 9 + 9 = 18. The result has 2 digits, but it can never represent values from 19 to 99. The same problem occurs in the 8086: adding two 16-bit values gives at most a 17-bit result, but a 20-bit result is needed to represent an address!

Segment concept
Segment analogy: it is similar to a hotel. If you stay in hotel room number 1234, what does it imply? Does it mean that the hotel has 1234 rooms?

Segment concept
[Figure: the 1 M memory space containing 64 K segments.]
How can we access locations within a segment?

Segment Registers
The segment registers are used for accessing memory. The 8086 address space is divided into 64 KB segments, and just four segments can be active at a time because there are only 4 segment registers. In theory, how many segments can we have? The total memory is 1 M and each segment is 64 K, so there can be 1 M / 64 K = 16 non-overlapping segments.

The Segment concept
The real address (physical address) = base address (20-bit) + offset (16-bit). The base address must be divisible by 16, so its last hex digit is always 0. This '0' is not stored, so a 16-bit segment register can hold the rest of the base address (the segment address). For example, if the base is 12340H and the offset is 99H, the real address is 123D9H.

Segment concept
For example:
- FFFFEH is not divisible by 16
- FFFF0H is divisible by 16
- 12340H is also divisible by 16

Segment concept
The maximum value of a 16-bit number is FFFFH. If two 16-bit values are added together, such as FFFFH (segment) + FFFFH (offset), the result is 1FFFEH, which is only a 17-bit value, so values from 20000H to FFFFFH can never be produced. So in the 8086 you cannot assign a segment base arbitrarily: the base address must be divisible by 16. If a value is divisible by 16 and we are using hex (base 16), the last digit of the value must be '0'. For example, if the value in the segment register is 1234H and the offset is 20H, the physical address is 12340H + 20H = 12360H.
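A minimal Python sketch of the segment:offset calculation, assuming 8086 real-mode addressing (segment value × 16 + offset). It reproduces the examples above and can be used to check the CS = 1005H, IP = 5555H question. The function name `physical_address` is illustrative, and the sketch does not model what happens when the sum exceeds 20 bits.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment shifted left 4 bits (x 16) plus offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return (segment << 4) + offset

print(hex(physical_address(0x1234, 0x0020)))   # prints: 0x12360
print(hex(physical_address(0x1005, 0x5555)))   # prints: 0x155a5
print(hex(0xFFFF + 0xFFFF))                    # plain 16-bit sum; prints: 0x1fffe (17 bits)
```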

Segment concept
The segment concept analogy: if you were designing the elevator for a very tall building, for example with 100 levels, how would you arrange the buttons so that the elevator can reach all levels?

Execution Unit (EU)
The EU is responsible for decoding and executing all instructions. What is decoding? The EU will see data such as 8B C3 (10001011 11000011). Decoding means working out which operation to carry out from this binary string: 8B C3 is MOV AX, BX. After decoding, the EU performs the move (MOV) operation.
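To see how MOV AX, BX can be recovered from 8B C3, the sketch below splits the second byte (the ModR/M byte) into its mod, reg and r/m fields. It is a simplified illustration that handles only opcode 8BH (MOV reg16, r/m16) with a register source (mod = 11); real 8086 decoding covers many more forms.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]   # 3-bit register codes 000..111

def decode_mov_8b(opcode, modrm):
    """Decode the register-to-register form of opcode 8BH (MOV reg16, r/m16)."""
    if opcode != 0x8B:
        raise ValueError("this sketch only handles opcode 8BH")
    mod = modrm >> 6             # bits 7-6
    reg = (modrm >> 3) & 0b111   # bits 5-3: destination register
    rm = modrm & 0b111           # bits 2-0: source register when mod = 11
    if mod != 0b11:
        raise NotImplementedError("memory operands are not handled here")
    return f"MOV {REG16[reg]}, {REG16[rm]}"

print(format(0xC3, "08b"))          # prints: 11000011
print(decode_mov_8b(0x8B, 0xC3))    # prints: MOV AX, BX
```

The same function also decodes 8B CB and 8B D1 from the machine-code example later in these notes.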

Decoding
[Figure: an instruction (e.g. 10101100) enters the decoder, which produces control signals.]

Decoding
Control signals are generated for each execution step based on the instruction. These signals are usually determined by a program stored in a special memory. This control program is called a microprogram. The microprogram is stored on the processor chip in a small, fast memory called the microprogram memory or the control store.

Decoding

Decoding
Microprogrammed control consists of a microinstruction address generator, which generates the address used to read microinstructions from the control store. The address generator uses a microprogram counter, µPC, to keep track of control store addresses when reading microinstructions from successive locations.

Decoding Microprogrammed control can be viewed as having a control processor within the main processor. Microinstructions are fetched and executed much like machine instructions. Their function is to direct the actions of the main processor’s hardware components, by indicating which control signals need to be active during each execution step.
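The idea of a control processor stepping through a control store can be sketched as follows. Everything here is hypothetical: the microinstruction contents, the signal names and the routine addresses are invented for illustration and do not describe the 8086's actual microcode. The sketch only shows the µPC incrementing through successive locations and jumping to an opcode-specific routine after the fetch steps.

```python
# Hypothetical control store: address -> (control signals to assert, next-address rule).
# The signal names are illustrative only, not taken from any real processor.
CONTROL_STORE = {
    0: ({"PC_out", "MAR_in"}, "next"),          # fetch step 1
    1: ({"Read", "PC_inc"}, "next"),            # fetch step 2
    2: ({"MDR_out", "IR_in"}, "decode"),        # fetch step 3, then branch on opcode
    10: ({"Rsrc_out", "Rdst_in"}, "end"),       # routine for a register-to-register MOV
    20: ({"Rsrc_out", "Y_in"}, "next"),         # routine for a register ADD
    21: ({"Rdst_out", "Add", "Z_in"}, "next"),
    22: ({"Z_out", "Rdst_in"}, "end"),
}
ROUTINE_START = {"MOV": 10, "ADD": 20}          # hypothetical opcode -> routine address

def run_microprogram(opcode):
    """Step the microprogram counter (uPC) through one instruction and
    return the control-signal set asserted in each step."""
    upc = 0
    steps = []
    while True:
        signals, rule = CONTROL_STORE[upc]
        steps.append(signals)
        if rule == "next":
            upc += 1                        # successive control-store location
        elif rule == "decode":
            upc = ROUTINE_START[opcode]     # jump to the routine for this opcode
        else:                               # "end": instruction finished
            return steps

for step in run_microprogram("ADD"):
    print(sorted(step))
```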

Execution Unit (EU) (Cont'd)
The EU consists of an ALU (Arithmetic and Logic Unit), status and control flags, eight general-purpose registers, temporary registers, and queue control logic. The EU extracts instructions from the top of the queue in the BIU, decodes them, generates operand addresses if necessary, passes them to the BIU and requests it to perform read or write bus cycles to memory or I/O, and performs the operation specified by the instruction on the operands. During execution of the instruction, the EU tests the status and control flags and updates them based on the results of executing the instruction.

Functions of EU
ADD AX, 16 ; add 16 to AX
AX is a register inside the CPU. If AX is 20, after the operation it becomes 36. For this operation, do we need to fetch an operand from memory? The 16 in the above instruction is called an immediate. Immediate values are stored as part of the instruction and are fetched together with it.
Now consider ADD AX, X ; X is a variable
Do we need to fetch the operand X from memory?

Functions of EU If the instruction queue is empty, the EU waits for the next instruction byte to be fetched and shifted to the top of the queue. When the EU executes a branch or jump instruction, it transfers control to a location corresponding to another set of sequential instructions. Whenever this happens, the BIU automatically resets the queue and then begins to fetch instructions from this new location to refill the queue.

Jump and branch

Summary
- What is the pre-fetch concept?
- What is a pipeline and what is its advantage?
- What functions are performed by the BIU and the EU?
- What is a multiplexed address/data bus?
- What is the segment concept?

8086 Internal Registers
Registers are a very important component because they are used as temporary storage and also store the current status of the CPU. The contents of some registers indicate the memory locations to be fetched. Registers are internal components that we can control with assembly language programming. The 16-bit registers fall into the following groups:
- Instruction Pointer (IP)
- Data Registers (4)
- Pointer and Index Registers (4)
- Segment Registers (4)
- The Flag Register

Instruction Pointer (IP)
IP identifies the location of the next instruction to be executed in the current code segment. IP contains an offset value, not the physical address of the next instruction: physical address = CS (code segment register) × 16 + IP. Every time an instruction word is fetched from memory, the BIU updates the value in IP (e.g. IP = IP + 1) so that it points to the next sequential instruction word in memory.

Data Registers
There are 4 general-purpose data registers, used for temporary storage of frequently used intermediate results. This can improve speed (why?). Each register can be used either as one 16-bit register or as two 8-bit registers, e.g. AX (16-bit) = AH (high 8 bits) : AL (low 8 bits).
- Accumulator Register (AX: AH, AL)
- Base Register (BX: BH, BL)
- Count Register (CX: CH, CL)
- Data Register (DX: DH, DL)
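The 8-bit halves of a data register are simply the high and low bytes of the 16-bit value. A short Python sketch (the helper names are illustrative):

```python
def split_ax(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit AX value."""
    return (ax >> 8) & 0xFF, ax & 0xFF

def join_ax(ah, al):
    """Rebuild the 16-bit AX value from AH and AL."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)

ah, al = split_ax(0xF802)
print(hex(ah), hex(al))           # prints: 0xf8 0x2
print(hex(join_ax(ah, 0xFF)))     # writing AL changes only the low byte; prints: 0xf8ff
```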

Data Registers
The general-purpose data registers can be used for arithmetic or logic operations. For example, to carry out an addition: add ax, bx. The result is stored in ax and is equal to the sum of the values in ax and bx (in C, this is similar to ax += bx). For string instructions, the CX register stores a count value representing the number of bytes to be moved. I/O operations require the data to be input or output to be in the A register (AL or AX), while register DX can hold the address of the I/O port.

Segment Registers
The segment registers are used for accessing memory. The 8086 address space is divided into 64 KB segments, and just four segments can be active at a time. In theory, how many segments can we have? The segment registers are used to select the active segments.
- Code Segment (CS) Register: CS identifies the starting address of the 64 KB segment known as the code segment. The code segment contains the instructions of the program.
- Data Segment (DS) Register: DS identifies the starting location of the current data segment in memory. Data is stored in the data segment.

Segment Registers (Cont'd)
- Stack Segment (SS) Register: SS contains a logical address that identifies the starting location of the current stack segment in memory. The stack is used for temporary storage.
- Extra Segment (ES) Register: ES identifies the extra segment, usually used for data storage.
The segment registers store the base address of a segment. To determine the physical address, an offset is also required; the pointer and index registers (and IP) are used to store the offset value.

Pointer and Index Registers
- Stack Pointer (SP): permits easy access to locations in the stack segment of memory. The value in SP is the offset of the next stack location that can be accessed, relative to the current address in the stack segment (SS) register, i.e. it always points to the top of the stack.
- Base Pointer (BP): BP also represents an offset from the SS register, but it is used to access data within the stack segment. It is used in the based addressing mode.
The applications of the various registers will be discussed in detail when we learn assembly language programming.

Pointer and Index Registers
Index registers are used to hold offset addresses for instructions that access data stored in the data segment of memory.
- Source Index Register (SI): SI stores an offset address for a source operand under indexed addressing, for string and memory operations.
- Destination Index Register (DI): DI stores an offset that identifies the location of a destination operand; it is also used in string operations.
Remark: by default, the offset is referenced to the value in the data segment (DS) register; in string instructions, the destination addressed by DI is referenced to the extra segment (ES) register.

Registers and pointers
Segment register      Pointer / index register
CS (code segment)     IP (instruction pointer)
DS (data segment)     DI, SI
SS (stack segment)    SP (stack pointer), BP (base pointer)
ES (extra segment)    DI

Flag Register
The flag register is a 16-bit register within the execution unit. The status flags in the register indicate conditions produced as the result of executing an arithmetic or logic instruction. What kinds of conditions can you think of?

8086 Flag Register (status flags)
- C – Carry bit: set if there is a carry-out (addition) or a borrow-in (subtraction).
- P – Parity bit: set if the lower byte of the result contains an even number of 1s.
- Z – Zero bit: set if the result of an operation is equal to zero.
- S – Sign bit: set if the result is negative (its most significant bit is 1).
- O – Overflow bit: set if the result is out of range, i.e. the result of a signed operation is too large (or too small) to be accommodated in the destination register. When will an overflow occur? Give one example.
The above are the most commonly used flags; there are others, but they are not discussed in this subject.

Example
If our data is only 8 bits, then FFH + 1H = 100H (1 0000 0000B). This is a 9-bit value, and the leading '1' is the carry! Similarly, 00H - 1H gives FFH with a borrow (1 1111 1111B), where the leading 1 is the borrow bit.

Flags
Consider 8-bit values A and B. Determine the status of the C, S, Z and O flags for:
- A = 0FH, B = 1; A + B
- A = 0, B = 1; A - B
- A = 7FH, B = 1; A + B
- A = 80, B = 0F; A - B
- A = FFH, B = 1; A + B
- A = 2FH, B = 60, C = -1; (A + C) - B
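The following sketch computes the flags for 8-bit additions and subtractions so you can check your answers to the exercise. It assumes that "80, 0F" in the fourth case means 80H and 0FH; the last case is left for you to work out. The overflow tests use the usual sign-bit rules for 2's complement addition and subtraction, and the function name `flags8` is illustrative.

```python
def flags8(a, b, op):
    """Return the C, P, Z, S, O flags (and the result) after an 8-bit ADD or SUB."""
    a &= 0xFF
    b &= 0xFF
    if op == "add":
        full = a + b
        r = full & 0xFF
        c = full > 0xFF                        # carry out of bit 7
        o = ((a ^ r) & (b ^ r) & 0x80) != 0    # both inputs' sign differs from the result's
    elif op == "sub":
        r = (a - b) & 0xFF
        c = a < b                              # borrow in
        o = ((a ^ b) & (a ^ r) & 0x80) != 0    # signs of a and b differ, and the sign of a is lost
    else:
        raise ValueError(op)
    return {
        "C": int(c),
        "P": int(bin(r).count("1") % 2 == 0),  # even number of 1s in the result byte
        "Z": int(r == 0),
        "S": r >> 7,                           # most significant bit of the result
        "O": int(o),
        "result": r,
    }

cases = [(0x0F, 0x01, "add"), (0x00, 0x01, "sub"), (0x7F, 0x01, "add"),
         (0x80, 0x0F, "sub"), (0xFF, 0x01, "add")]   # assumes 80 = 80H, 0F = 0FH
for a, b, op in cases:
    f = flags8(a, b, op)
    print(f"{a:02X}H {op} {b:02X}H = {f['result']:02X}H  "
          f"C={f['C']} S={f['S']} Z={f['Z']} O={f['O']}")
```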

Block diagram for a simple computer system
[Figure: a CPU connected to memory and I/O, including a display unit (LCD). The CPU gets an instruction from memory, performs the operation, then gets the next instruction.]

Bus Cycle
The bus consists of address and data lines. A bus cycle is used to access memory, I/O devices, or the interrupt controller. A bus cycle starts with an address being output on the system bus, followed by a read or write data transfer. A series of control signals is produced to control the direction and timing of the bus. A standard bus cycle consists of 4 clock periods. Understanding system bus timing will help you to choose the proper memory device.

Bus cycle
- T1: the BIU puts an address on the bus.
- T2: data are put on the bus (write cycle); the bus is in high-Z mode (read cycle).
- T3: data on the bus.
- T4: data on the bus.
For a 5 MHz system, how long does it take to complete 1 bus cycle?

Read cycle

Write cycle

Wait states can be inserted into a bus cycle in response to a request by an event in external hardware: the READY input of the 8086 is set LOW. As long as READY is held LOW, wait states (Tw) are inserted between T3 and T4. During a write cycle, the data are maintained on the bus. Purpose: to extend the duration of the bus cycle so that slower memory devices can be used.
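A quick calculation for the 5 MHz question, assuming a bus cycle of exactly 4 clock periods plus one extra period per wait state (the function name is illustrative):

```python
def bus_cycle_ns(clock_mhz, wait_states=0):
    """Duration of one 8086 bus cycle: T1-T4 plus any Tw wait states."""
    period_ns = 1000 / clock_mhz          # one clock period in nanoseconds
    return (4 + wait_states) * period_ns

print(bus_cycle_ns(5))                    # prints: 800.0
print(bus_cycle_ns(5, wait_states=1))     # prints: 1000.0
```

Each wait state adds one full clock period, which is the extra time a slow memory device gets to respond.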

Bus cycle

Example
Assembly language      Machine code
MOV AX, 0F802H         B8 02 F8
PUSH AX                50
MOV CX, BX             8B CB
MOV DX, CX             8B D1
ADD AX, [SI]           03 04
ADD SI, 8086H          81 C6 86 80
JMP $-14               EB F0
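The listing can be checked by computing the offset of each instruction from its length in bytes. This sketch assumes the first instruction is at offset 0. It also shows that the immediate 0F802H is stored low byte first (02 F8) and that the short jump displacement F0H (-16) is relative to the instruction after the JMP, so JMP $-14 lands back on the first instruction.

```python
# Machine code from the example listing: (assembly, bytes in hex)
PROGRAM = [
    ("MOV AX, 0F802H", "B8 02 F8"),
    ("PUSH AX",        "50"),
    ("MOV CX, BX",     "8B CB"),
    ("MOV DX, CX",     "8B D1"),
    ("ADD AX, [SI]",   "03 04"),
    ("ADD SI, 8086H",  "81 C6 86 80"),
    ("JMP $-14",       "EB F0"),
]

def to_signed8(byte):
    """Interpret one byte as a 2's complement value (-128..127)."""
    return byte - 0x100 if byte >= 0x80 else byte

offset = 0                 # assumption: the first instruction is at offset 0
starts = {}
for asm, code in PROGRAM:
    starts[asm] = offset
    print(f"{offset:04X}  {code:<12} {asm}")
    offset += len(bytes.fromhex(code))

# short JMP: target = address of the NEXT instruction + signed 8-bit displacement
jmp = starts["JMP $-14"]
target = jmp + 2 + to_signed8(0xF0)
print(f"JMP at {jmp:04X} jumps to {target:04X}")   # prints: JMP at 000E jumps to 0000
```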

Instruction sequence

Instruction sequence B8 02 02 F8 50

Instruction sequence

Pins definition
- AD15–AD0: address/data bus
- A19/S6–A16/S3: address/status
- BHE/S7: bus high enable
- MN/MX: minimum/maximum mode control
- RD: read control (read data from memory or I/O)
- TEST: wait on test (input). If TEST is HIGH when a WAIT instruction is executed, the processor stays in an idle state
- READY: (input) wait state control
- RESET: system reset (must be kept HIGH for at least 4 clock cycles)
- NMI: non-maskable interrupt request
- INTR: interrupt request
- CLK: system clock

Pins definition
- HOLD: hold request (used with DMA)
- HLDA: hold acknowledge (the processor has entered the HOLD state)
- WR: write control
- M/IO: memory/IO control
- DT/R: data transmit/receive (sets the direction of the external data bus buffer)
- DEN: data enable (enables the external data bus buffer)
- ALE: address latch enable
- INTA: interrupt acknowledge

Pins definition
- RQ/GT1, RQ/GT0: request/grant bus access control (used in maximum mode to force the processor to release the local bus at the end of its current bus cycle)
- LOCK: bus priority lock control (prevents other bus masters from gaining access to the system bus)
- /S2–/S0: bus cycle status (these lines reflect the type of operation being carried out by the processor)
- QS1, QS0: instruction queue status (give information about the status of the code pre-fetch queue)

Status lines – reflect activities of the BIU
/S2 /S1 /S0   Indication
0   0   0     Interrupt acknowledge
0   0   1     Read I/O port
0   1   0     Write I/O port
0   1   1     Halt
1   0   0     Code access
1   0   1     Read memory
1   1   0     Write memory
1   1   1     Passive

Instruction queue status
QS1 QS0   Indication
0   0     No operation
0   1     First byte of opcode from the queue
1   0     Empty queue
1   1     Subsequent byte from the queue

Instruction sequence 100 B8 02 10 02 F8 50

Status signals
S6–S3 are output on the bus at the same time that data are transferred over the other bus lines.
- S4 and S3 form a 2-bit binary code that identifies which of the 8086's internal segment registers was used to generate the physical address: 00 – extra, 01 – stack, 10 – code/none, 11 – data.
- S5 reflects the logic level of the interrupt enable flag.
- S6 is not used; it is always 0.

Control signals are provided to support memory and I/O interfaces.
- ALE: goes from 0 to 1 to signal external circuitry that a valid address word is on the bus.
- BHE: 0 is used as the memory enable for the most significant byte (high byte) half of the data bus.
- M/IO: 1 represents a memory operation; 0 represents an I/O operation.
- DT/R: 1 puts the bus in transmit mode; 0 in receive mode.
- RD: indicates a read cycle, in which data are read from the bus.
- WR: indicates a write cycle; 0 means valid write or output data are on the bus.
- DEN: signals external devices when they should put data on the bus.

Operating modes of 8086
The 8086 can run in two different modes: minimum and maximum.
- In minimum mode, the 8086 provides all the control signals needed to implement the memory and I/O interfaces.
- In maximum mode, it provides status signals for implementing a multiprocessor/coprocessor system environment. A bus controller and a bus arbiter are included in the system, and the bus controller derives the control signals from the status signals.

Max. mode application

Maximum mode
The basic function of the bus controller chip (8288) is to derive control signals such as /RD, /WR, /DEN, DT/R and ALE from the status lines.
- /IORC, /IOWC: I/O read/write command signals. They enable an I/O interface to read or write data from or to the addressed port.
- /MRDC, /MWTC: memory read and write command signals. They instruct memory to send data to, or accept data from, the bus.
- /AIOWC, /AMWTC: advanced /IOWC and /MWTC. They serve the same purpose as /IOWC and /MWTC but are activated one clock cycle earlier.

Maximum mode circuit

Minimum mode circuit
Latch – stores (holds) information; transceiver – transmits and receives.

Read cycle
Why are there two addresses? One is the output from the address latch; the other is the output from the multiplexed address/data bus.

Memory
To access memory, both the address and the data must be available at the same time. But the 8086 bus is multiplexed, so it is physically not possible to supply both pieces of information simultaneously. Address latches are therefore used to hold the address information while the bus is used to transmit or receive data.

Modern microprocessor architecture
Based on your understanding of the 8086 microprocessor, which features of the 8086 need to be modified to make it more powerful? What are the major differences between a modern microprocessor and an 8086?

Modern microprocessors
The following features are crucial to performance:
- Operating speed (clock rate)
- Memory (size and speed)
- Data size
- Floating-point processing
- Overlapping of execution and memory access
- Performing more tasks in a single cycle

Modern microprocessor
Operating speed
- The 8086 operates at 5 MHz.
- The latest Intel microprocessors (i5, i7, i9) operate in the 2.6 GHz range.
- The operating frequency is governed by the fabrication technique (0.13 micron technology for the P4, while the 8086 is based on 3 micron technology). With sub-micron technology we can put more components into the chip.
- The higher the operating speed, the more heat is generated, so cooling the CPU becomes more important. In the 486 era, no CPU cooling was required.

Modern microprocessor
Memory
- The 8086 has a 20-bit address bus, so the maximum memory size is only 1 MB.
- The Pentium II can access up to 64 GB of memory (what is the size of its address bus?).
- The speed of memory chips has also increased over the last 20 years; RDRAM can operate at 600 MHz or 800 MHz.
- What is the advantage(s) of having more memory?
Using the 8086 as an example, a memory read/write must complete in 4 cycles, so if the memory is slow, the read/write cannot be completed in 4 cycles.

Memory speed
To improve performance, internal high-speed memory is provided for storing data as well as instructions. This internal memory is called cache.
- The cache inside the CPU is called the Level 1 (L1) cache (at least 8 KB in the P4).
- The cache located between the CPU and external memory is called the Level 2 (L2) cache; it is faster than traditional memory (512 KB in the P4).
- In newer microprocessor designs, the L2 cache is also located inside the CPU, and an L3 cache is provided as well. The L3 cache was initially external, just like the L2 cache in the old days.

Cache
[Figure: CPU with L1 cache, connected through the L2 cache to memory.]

Cache
Accessing data/instructions from cache is faster than accessing traditional memory, so when a program is executed, if the entire program is already stored in the cache, overall performance can be improved.

Cache is fast
- The access time for DRAM could be 60 ns, while for cache it could be 15 ns.
- To access cache memory, the CPU does not have to use the motherboard's system bus (which is also slow) for data transfer.
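Using the access times above, a simple model of average access time can be sketched. The hit rate is an assumption (the slides do not give one), and the model ignores the time spent checking the cache before a miss.

```python
T_CACHE = 15   # ns, cache access time from the slide
T_DRAM = 60    # ns, DRAM access time from the slide

def average_access_ns(hit_rate):
    """Simple model: a hit costs T_CACHE, a miss costs T_DRAM."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return hit_rate * T_CACHE + (1 - hit_rate) * T_DRAM

for h in (0.5, 0.9, 0.95):   # assumed hit rates
    print(f"hit rate {h:.0%}: {average_access_ns(h):.2f} ns")
```

The higher the hit rate, the closer the average gets to the cache's 15 ns, which is why locality of reference matters.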

Why Cache?
The effectiveness of cache is based on a property of computer programs called locality of reference. Most of a program's execution time is spent in routines in which many instructions are executed repeatedly, such as loops and functions.

Cache
Therefore, many instructions in localized areas of the program are executed repeatedly during some time period. A recently executed instruction is likely to be executed again very soon, and instructions close to a recently executed instruction are also likely to be executed soon.

Cache Whenever an information item (data or instruction) is first needed, this item should be brought to the cache because it is likely to be needed again soon. Instead of fetching just one item from the main memory it is useful to fetch several items that are located at adjacent addresses as well.

Cache
The cache is usually divided into smaller units, called blocks, lines or slots. In the 486, the 8 KB cache is divided into four 2 KB blocks. Each block is 16 bytes × 128 rows, and each 16-byte row is divided into four 4-byte lines. None of these lines can be accessed partially. During operation, the microprocessor always checks the cache for data and instructions first. If the data is already stored in the cache, it is called a cache hit; if the data is not found, it is a cache miss. Usually a mechanism called paging is employed to write frequently used data and instructions into the cache.

Operation of cache
The processor does not need to know that the cache exists. It simply issues Read and Write requests using addresses that refer to locations in memory. The cache control circuitry determines whether the requested word currently exists in the cache. If it does, the Read or Write operation is performed on the appropriate cache location; this is called a cache hit. Main memory is not involved when there is a cache hit in a Read operation. For a Write operation, there are two techniques: write-through and write-back.

Cache write In a write-through, both the cache and the main memory are updated. In write-back, only the cache location is updated and a flag (a dirty or modified bit) is used to reflect the change in status. The main memory location is updated later, when the block containing this marked word is removed from the cache to make room for a new block.
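A toy model that counts main-memory writes under each policy may make the difference clearer. It is a hypothetical sketch: the cache has unlimited capacity, and a block is only removed when `evict` is called.

```python
class Cache:
    """Toy cache that counts main-memory writes under one write policy."""

    def __init__(self, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(policy)
        self.policy = policy
        self.lines = {}          # address -> (value, dirty bit)
        self.memory = {}
        self.memory_writes = 0

    def write(self, addr, value):
        if self.policy == "write-through":
            self.lines[addr] = (value, False)
            self.memory[addr] = value          # memory updated on every write
            self.memory_writes += 1
        else:
            self.lines[addr] = (value, True)   # only the cache is updated; mark dirty

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                              # write back only if modified
            self.memory[addr] = value
            self.memory_writes += 1

for policy in ("write-through", "write-back"):
    c = Cache(policy)
    for value in range(10):       # the CPU writes the same word 10 times
        c.write(0x100, value)
    c.evict(0x100)                # block removed to make room for another
    print(policy, c.memory_writes)   # write-through 10, then write-back 1
```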

Cache structure
To identify whether data from main memory is stored in a cache slot, a tag is provided. A tag contains information related to the address of the memory data being stored in the cache. The size of the tag is (address bus size) - log2 N bits, where N is the number of bytes in the data part of the cache slot. For example, if the address bus is 20 bits and the cache slot is 32 bytes, the tag is 20 - 5 = 15 bits. If the memory data is stored in the cache, the upper 15 bits of the address match the tag value of the cache slot.

Structure of cache
If the tag is 091AH, the memory addresses 12340H to 1235FH are stored in the cache (if the cache block is 32 bytes).

Cache structure
The V bit is the valid bit, indicating whether the slot holds valid data. If V = 1, the data is valid; if V = 0, it is not. Initially the slot is invalid; once data is placed in the slot, it becomes valid. The D bit is the dirty bit, which only has meaning if V = 1. It indicates whether the data in the slot has been modified (written to). If D = 1, the data has been modified since being brought into the cache; if D = 0, the data is the same as when it first entered the cache.

Mapping between the cache and external memory
The size of the cache (KB) is much smaller than external memory (GB), so a mapping mechanism is applied so that external memory can be mapped to the cache. The most popular mapping mechanism is called n-way set associative. The cache is updated one page (or line) at a time, usually 64 bytes. For example, if the cache is 512 KB, there are a total of 512 KB / 64 B = 8192 pages. The 8192 pages are divided into blocks (sets) based on the value n. If n is 4, then 8192 / 4 = 2048 blocks, each containing 4 lines.

Mapping between the cache and external memory
Based on the number of blocks, the external memory is also divided into the same number of blocks. For example, if the external memory is 1 GB (1 G = 2^30), each block of the external memory is 2^30 / 2048 = 512 KB, and each block of the external memory is mapped to one block of the cache. So every 4 lines of the cache are in charge of (used to store data from) 512 KB of external memory.
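The numbers in this example can be reproduced in a few lines. The `set_index` helper is an assumption about a typical implementation: the lines that share one set are those whose addresses have the same set-index bits, so the 512 KB a set is in charge of is spread across the whole memory rather than forming one contiguous range.

```python
KB, GB = 2**10, 2**30
CACHE_SIZE = 512 * KB
LINE_SIZE = 64        # bytes loaded into the cache at a time
WAYS = 4              # n in n-way set associative
MEMORY_SIZE = 1 * GB

lines = CACHE_SIZE // LINE_SIZE        # 8192 lines ("pages")
sets = lines // WAYS                   # 2048 sets ("blocks" on the slide)
memory_per_set = MEMORY_SIZE // sets   # 512 KB of external memory per set

def set_index(addr):
    """Hypothetical helper: the set a byte address maps to (line number mod sets)."""
    return (addr // LINE_SIZE) % sets

print(lines, sets, memory_per_set // KB)   # prints: 8192 2048 512
```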

4 -way set associative mapping

Example of 8-way set associative
Suppose the CPU has a 32-bit address and the cache has 128 slots of 32 bytes each. Using 8-way set associative mapping, there are 8 slots per set, so there are 128 / 8 = 16 sets. Without sets, the tag of each cache slot would be 32 - 5 = 27 bits. Since the cache is now divided into sets, 4 bits are needed to represent the set number, so the tag is 27 - 4 = 23 bits. The 32-bit memory address is interpreted as:
Tag (23 bits) + Set (4 bits) + Offset (5 bits)

Example
If the address issued is 12345678H (32-bit) and the cache slot is 32 bytes, then the set number is 3 and the tag is 091A2BH.
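The tag/set/offset split is just bit slicing. The sketch below reproduces the 8-way example and, with no set bits, the earlier tag example (20-bit address, 32-byte slots). The helper name is illustrative.

```python
def split_address(addr, offset_bits, set_bits):
    """Split a memory address into (tag, set, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)                 # byte within the slot
    set_no = (addr >> offset_bits) & ((1 << set_bits) - 1)   # which set
    tag = addr >> (offset_bits + set_bits)                   # the remaining upper bits
    return tag, set_no, offset

# 8-way example: 32-byte slots (5 offset bits), 16 sets (4 set bits)
tag, set_no, offset = split_address(0x12345678, offset_bits=5, set_bits=4)
print(hex(tag), set_no, offset)                 # prints: 0x91a2b 3 24

# no sets: 20-bit address, 32-byte slots
print(hex(split_address(0x12340, 5, 0)[0]))     # prints: 0x91a
```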

Replacement algorithms
When a new block is to be brought into the cache and all the positions that it may occupy are full, the cache controller must decide which of the old blocks to overwrite. It should keep blocks in the cache that are likely to be referenced in the near future. A straightforward approach is to overwrite the block that has gone the longest time without being referenced. This block is called the least recently used (LRU) block, and the approach is called the LRU replacement algorithm. The cache controller must track references to all blocks as computation proceeds.
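LRU replacement for one set can be sketched with Python's `collections.OrderedDict`, using `move_to_end` to mark a block as most recently used on a hit. This is a software illustration of the policy, not how cache hardware tracks usage.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with `ways` slots, using LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (block loaded, LRU evicted if full)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # now the most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)   # evict the least recently used block
        self.blocks[tag] = None
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in "ABACB"])   # prints: [False, False, True, False, False]
```

When C arrives, B is evicted because A was used more recently; B then misses on its next reference.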

Modern Microprocessor
Data
- The 8086 has a 16-bit data bus.
- The P4 has a 64-bit data bus, so it can get more data in a single read/write cycle, and processing of high-precision data can be more efficient.
- Consider this: what is the maximum value that can be represented by a 16-bit and a 32-bit pattern respectively?

Overlapping of operations
In the 8086, the instruction pipeline enables the execution of instructions to overlap with instruction fetching. The Pentium processor is a superscalar architecture that can execute multiple instructions concurrently:
- two integer pipelines, U and V, each with 5 stages
- one floating-point execution unit

Overlapping of operations So it is possible to execute three instructions simultaneously (with proper programming the speedup can be up to 40%)

Pentium Processor
The Pentium processor issues 2 instructions in parallel to the 2 independent integer pipelines (U and V). This enhances the speed of integer arithmetic.
- In the prefetch stage, the CPU fetches instructions from the instruction cache.
- In the write-back stage, the CPU updates register contents or the status in the flag register, depending on the execution result.
The Pentium processor takes only one clock cycle to decode an instruction, compared with 2 clock cycles in the 486.

Block diagram of Pentium processor
Can you see the pipeline in the diagram?

Block diagram of U-V pipe for Pentium processor The pipeline of 8086 has how many stages?

The 8086 instruction execution sequence
Basic operations performed by a CPU:
1. Fetch instruction
2. Decode and determine operand address
3. Fetch operand
4. Execute

Branch prediction
In the 8086, when a jump operation is performed, the instruction queue must be reset, so instructions already pre-fetched are discarded. If you can predict when a branch or jump will take place, you can pre-fetch from the new jump target, so the instructions being pre-fetched are always relevant to the instructions to be executed.

Branch prediction
The cache controller of a modern CPU analyzes each memory block it loads, and whenever it finds a JMP instruction there, it loads the memory block for the jump target into the L2 cache before the CPU reaches that JMP instruction. A conditional statement, such as "if a <= b go to address 1, if a > b go to address 2", would cause a cache miss, because the values of a and b are unknown and the cache controller would be looking only for JMP-like instructions.

Branch prediction
The cache controller therefore loads both branch targets into the cache. Later, when the CPU processes the branch instruction, it simply discards the one that was not chosen. It is better to load the cache with unnecessary data than to access RAM directly.

Dynamic branch prediction
The processor hardware assesses the likelihood of a given branch being taken by keeping track of branch decisions every time that branch instruction is executed. A dynamic branch prediction algorithm can use the result of the most recent execution of a branch instruction: the processor assumes that the next time the instruction is executed, the branch decision is likely to be the same as the last time. There are two states: LT (branch is likely to be taken) and LNT (branch is likely not to be taken).

Dynamic branch prediction Suppose it starts at LNT, when the branch instruction is executed and the branch is taken, the machine moves to state LT. Otherwise, it remains in state LNT. The next time the same instruction is encountered, the branch is predicted as taken if it is in state LT. Otherwise it is predicted as not taken.

Dynamic branch prediction
Once a loop is entered, the decision for the branch instruction that controls looping will always be the same except for the last pass through the loop. Hence, each prediction for the branch instruction will be correct except in the last pass. The prediction in the last pass will be incorrect, and the branch history state will be changed to the opposite state. Therefore, if the same loop is executed again, the prediction for its first pass will be wrong.
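The two-state scheme can be simulated for a do { } while loop whose branch is taken 4 times and then not taken. The sketch assumes the predictor starts in LNT and the loop is executed twice; it counts one misprediction on each entry to the loop and one on each exit.

```python
def one_bit_predictor(outcomes, state="LNT"):
    """Two-state (LT/LNT) predictor.
    outcomes: True = branch taken (BT), False = not taken (BNT).
    Returns (number of mispredictions, final state)."""
    misses = 0
    for taken in outcomes:
        if (state == "LT") != taken:
            misses += 1
        state = "LT" if taken else "LNT"    # remember only the last decision
    return misses, state

loop = [True] * 4 + [False]             # 5 passes: BT x4, then BNT on exit
print(one_bit_predictor(loop + loop))   # prints: (4, 'LNT')
```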

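Dynamic branch prediction
[Figure: two-state diagram. LNT moves to LT on BT (branch taken) and stays in LNT on BNT (branch not taken); LT moves to LNT on BNT and stays in LT on BT.]
Consider a do { } while (condition) loop.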

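Enhanced dynamic branch prediction
[Figure: four-state diagram with transitions labelled BT (branch taken) and BNT (branch not taken).]
- ST – strongly likely to be taken
- LT – likely to be taken
- LNT – likely not to be taken
- SNT – strongly likely not to be taken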
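A common way to realise a four-state scheme is a 2-bit saturating counter (SNT, LNT, LT, ST), sketched below. Its transitions may differ in detail from the diagram on the slide, so treat it as one possible version. With the same loop executed twice, the loop exit moves the state only from ST to LT, so the second entry is predicted correctly.

```python
STATES = ["SNT", "LNT", "LT", "ST"]   # counter values 0..3

def two_bit_predictor(outcomes, state="LNT"):
    """2-bit saturating counter: predict taken in LT or ST.
    outcomes: True = branch taken (BT), False = not taken (BNT).
    Returns (number of mispredictions, final state)."""
    s = STATES.index(state)
    misses = 0
    for taken in outcomes:
        if (s >= 2) != taken:
            misses += 1
        s = min(s + 1, 3) if taken else max(s - 1, 0)   # one step, saturating at the ends
    return misses, STATES[s]

loop = [True] * 4 + [False]             # 5 passes: BT x4, then BNT on exit
print(two_bit_predictor(loop + loop))   # prints: (3, 'LT')
```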

Modern microprocessor
Floating-point hardware
- Do you know how floating-point numbers are represented in a computer?
- Floating-point hardware is an important design feature in modern processors.
- No floating-point hardware is available in the 8086.
- For Intel, the co-processor mechanism was used until the emergence of the 486 microprocessor.

Perform more tasks in a single cycle MMX – multimedia extension It is intended to improve 2-D and 3-D multimedia applications. Using the MMX instruction set, eight 8-bit image elements can be loaded simultaneously and operated on concurrently. The MMX instructions use 64-bit registers (the mantissa portion of the floating-point registers). MMX instructions are mainly integer operations.
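The idea of operating on eight 8-bit elements at once can be imitated with ordinary integer arithmetic. The sketch below is an emulation, not MMX code: it packs eight bytes into one 64-bit integer and adds all eight lanes with a single 64-bit addition, wrapping each lane modulo 256 as the MMX PADDB instruction does. The masking trick stops carries from spilling into the neighbouring lane; the function names are hypothetical.

```python
LOW7 = 0x7F7F_7F7F_7F7F_7F7F   # bits 0-6 of every byte lane
HIGH1 = 0x8080_8080_8080_8080  # bit 7 of every byte lane


def pack8(values):
    """Pack eight 8-bit values into one 64-bit word (element 0 in the low byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack8(word):
    """Split a 64-bit word back into eight 8-bit values."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def packed_add_bytes(a, b):
    """Add eight byte lanes at once, each lane wrapping modulo 256 (PADDB-like)."""
    low = (a & LOW7) + (b & LOW7)     # at most 0xFE per lane, so no carry leaves a lane
    return low ^ ((a ^ b) & HIGH1)    # fix up bit 7 of each lane without a carry-out


pixels = [10, 20, 30, 40, 250, 128, 0, 255]
print(unpack8(packed_add_bytes(pack8(pixels), pack8([5] * 8))))
# [15, 25, 35, 45, 255, 133, 5, 4]
```

The lane holding 255 wraps around to 4. For image work, MMX also provides saturating adds (such as PADDUSB) that clamp each lane at 255 instead.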

MMX/SSE mechanism (Figure: two registers holding data d1 to d4 and d5 to d8 are combined by one operation to give results r1 to r4.) A 128-bit register can store four floating-point values, so four results can be obtained in one single operation.
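The 128-bit register picture can be modelled with 16 bytes holding four single-precision values. The sketch below is a hypothetical emulation, assuming 32-bit floats per lane: `struct` packs d1 to d4 and d5 to d8 into two 16-byte "registers", and one call produces r1 to r4. In hardware, the SSE ADDPS instruction computes the four sums at the same time; here the lanes are simply looped over.

```python
import struct


def to_reg128(values):
    """Pack four single-precision floats into 16 bytes (one 128-bit 'register')."""
    return struct.pack("<4f", *values)


def from_reg128(reg):
    """Unpack a 16-byte 'register' into four floats."""
    return list(struct.unpack("<4f", reg))


def packed_add_ps(reg_a, reg_b):
    """One conceptual operation: four lane-by-lane sums, like SSE ADDPS."""
    sums = [x + y for x, y in zip(from_reg128(reg_a), from_reg128(reg_b))]
    return to_reg128(sums)  # packing rounds each result back to single precision


d = to_reg128([1.5, 2.0, 3.25, -4.0])   # d1..d4
e = to_reg128([0.5, 10.0, 0.75, 4.0])   # d5..d8
print(len(d) * 8, from_reg128(packed_add_ps(d, e)))  # 128 [2.0, 12.0, 4.0, 0.0]
```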

Points about P4 More features to support parallel processing: hyper-threading technology, SSE and MMX instructions, a deeper pipeline, and dual-core versions (Pentium D). Still compatible with existing software based on the Intel architecture. Operates at very high frequencies, up to 3.8 GHz. Has an 8 KB L1 data cache and a 12K-µop execution trace cache that holds decoded instructions. Integer ALUs operate at twice the core processor frequency.

Self-test How can a microprocessor be made more powerful? What is the most important hardware feature embedded in an 8086 microprocessor? What are the basic operations performed when a computer is running? What is the maximum value represented by a 20-bit pattern? What is the maximum value of the sum of two 16-bit patterns? What is the major function of an ALU? What are the major functions of a BIU? Can you use a block diagram to represent a memory? What is an instruction pipeline, and why can it improve the overall performance of a microprocessor?
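To check your answers to the two numeric questions, assuming unsigned bit patterns: an n-bit pattern can hold at most 2^n - 1, and the sum of two 16-bit maxima needs one extra bit, which is the carry out of a 16-bit addition. (20 bits is also the width of the 8086 physical address.)

```python
max20 = (1 << 20) - 1    # largest unsigned value in a 20-bit pattern
max16 = (1 << 16) - 1    # largest unsigned value in a 16-bit pattern
max_sum = max16 + max16  # sum of the two largest 16-bit values

print(max20, hex(max20))                             # 1048575 0xfffff
print(max_sum, hex(max_sum), max_sum.bit_length())   # 131070 0x1fffe 17
```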

Block diagram of Intel P4 CPU Can you spot any features that come from the 8086?

Introduction to microcontroller If you want to develop a simple robot, would you use a P5 microprocessor in your system? Usually, a microcontroller rather than a microprocessor is used to implement a simple system. A microcontroller can be regarded as an all-in-one device, with a CPU, memory and input/output interfaces all included in a single package. Commonly used microcontrollers include the 8051 series, Basic Stamp, BasicX, etc. In addition, a microcontroller usually costs less than a traditional microprocessor; for example, an 8051 costs only HKD 25.

Microcontroller Some microcontrollers also come with additional features such as an ADC (analog-to-digital converter), a DAC (digital-to-analog converter) and PWM (pulse-width modulation) output. These features are very useful for implementing basic robotic or control systems.

The ADuC832 device The ADuC832 is a powerful microcontroller and is used in the experimental setup. Basic features of the device include:
- 8-channel 12-bit ADC
- two 12-bit DACs
- 62 Kbytes program memory, 4 Kbytes data memory
- 2304 bytes on-chip data RAM
- dual PWM output (the PWM can be used to control an analog device)
- 8051-compatible instruction set
- four 8-bit input/output ports
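The 12-bit resolution of the ADC and DACs translates into voltage steps as sketched below. This sketch assumes ideal converters with a 2.5 V reference and a straight-binary range of 0 to Vref; the actual reference and output range depend on how the board is configured, so check the ADuC832 data sheet. The names `adc_code` and `dac_voltage` are hypothetical.

```python
VREF = 2.5             # volts; ASSUMED reference voltage
FULL_SCALE = 1 << 12   # 12-bit converter: codes 0 .. 4095


def adc_code(vin, vref=VREF):
    """Ideal 12-bit ADC: truncate vin/vref * 4096 and clamp to 0..4095."""
    code = int(vin / vref * FULL_SCALE)
    return max(0, min(code, FULL_SCALE - 1))


def dac_voltage(code, vref=VREF):
    """Ideal 12-bit DAC: output voltage for a code in 0..4095."""
    return code * vref / FULL_SCALE


lsb = VREF / FULL_SCALE
print(f"1 LSB = {lsb * 1000:.3f} mV")  # 1 LSB = 0.610 mV
print(adc_code(1.25))                  # 2048 (half of full scale)
print(dac_voltage(4095))               # just below VREF
```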

Block diagram of ADuC832

Application using microcontroller Components such as sensors and motors can be connected directly to the microcontroller via its I/O ports without other supporting devices. (Figure: a sensor and a motor connected to the ADuC832.)