IT 252 Computer Organization and Architecture Number Representation

Chia-Chi Teng

Where Are We Now? • foo.c (C program) -> Compiler -> foo.s (assembly program) -> Assembler -> foo.o (object: machine language module) -> Linker (with lib.o) -> a.out (executable: machine language program) -> Loader -> Memory • (Courses labeled on the diagram: CS 142 & 124, IT 344)

Review (do you remember from 124/104?) • 8-bit signed 2’s complement binary # -> decimal # • 0111 1111 = ? • 1000 0000 = ? • 1111 = ? • Decimal # -> 8-bit signed 2’s complement binary # • 32 = ? • -2 = ? • 200 = ?

Decimal Numbers: Base 10 • Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • Example: $3271 = (3 \times 10^3) + (2 \times 10^2) + (7 \times 10^1) + (1 \times 10^0)$

Numbers: positional notation • Number base B ⇒ B symbols per digit: • Base 10 (Decimal): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 • Base 2 (Binary): 0, 1 • Number representation: • $d_{31} d_{30} \ldots d_1 d_0$ is a 32-digit number • value $= d_{31} B^{31} + d_{30} B^{30} + \cdots + d_1 B^1 + d_0 B^0$ • Binary: 0, 1 (in binary, digits are called “bits”) • $0b11010 = 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 16 + 8 + 2 = 26$ • Binary numbers are often written 0b… • Here a 5-digit binary number turns into a 2-digit decimal number • Can we find a base that converts to binary easily?
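
The positional-notation formula can be evaluated digit by digit with Horner's rule: multiplying the running value by B shifts every earlier digit up one position, which yields the same sum as $d_{n-1} B^{n-1} + \cdots + d_0 B^0$. A minimal sketch, assuming ASCII digit characters 0-9 and A-Z and no overflow checking; `positional_value` is a hypothetical helper name, not a library function.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: value of a digit string in base b (2 <= b <= 36).
   Assumes ASCII letters for digits above 9; no overflow check.
   Returns -1 if a character is not a valid digit in that base. */
int64_t positional_value(const char *digits, int b) {
    int64_t value = 0;
    for (const char *p = digits; *p != '\0'; p++) {
        int d;
        if (isdigit((unsigned char) *p))
            d = *p - '0';
        else if (isalpha((unsigned char) *p))
            d = toupper((unsigned char) *p) - 'A' + 10;
        else
            return -1;
        if (d >= b)
            return -1;
        value = value * b + d;   /* Horner's rule: shift earlier digits up one position */
    }
    return value;
}
```

For example, `positional_value("11010", 2)` evaluates the 0b11010 example in base 2.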

Hexadecimal Numbers: Base 16 • Hexadecimal: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F • Normal digits + 6 more from the alphabet • In C, written as 0x… (e.g., 0xFAB5) • Conversion: Binary ⇔ Hex • 1 hex digit represents 16 decimal values • 4 binary digits represent 16 decimal values ⇒ 1 hex digit replaces 4 binary digits • One hex digit represents a “nibble” (4 bits); two hex digits represent a “byte” (8 bits) • 2 bits is a “half-nibble”. Shave and a haircut… • Example: • 1010 1100 0011 (binary) = 0x_____ ?

Decimal vs. Hexadecimal vs. Binary
Examples:
1010 1100 0011 (binary) = 0xAC3
10111 (binary) = 0001 0111 (binary) = 0x17
0x3F9 = 11 1111 1001 (binary)
How do we convert between hex and decimal? MEMORIZE!
Decimal  Hex  Binary
00       0    0000
01       1    0001
02       2    0010
03       3    0011
04       4    0100
05       5    0101
06       6    0110
07       7    0111
08       8    1000
09       9    1001
10       A    1010
11       B    1011
12       C    1100
13       D    1101
14       E    1110
15       F    1111
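
The binary-to-hex examples above all follow one rule: group the bits into nibbles starting from the right, pad the leftmost group with 0s, and replace each nibble with its hex digit from the table. A minimal sketch of that grouping, assuming the input is a non-empty string of '0'/'1' characters; `bin_to_hex` is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: convert a string of '0'/'1' characters to upper-case hex
   by grouping bits into nibbles from the right (the leftmost group may be short,
   which is the same as padding it with leading 0s).
   out must have room for (strlen(bits) + 3) / 4 + 1 characters. */
void bin_to_hex(const char *bits, char *out) {
    static const char hexdigits[] = "0123456789ABCDEF";
    size_t n = strlen(bits);
    size_t ndigits = (n + 3) / 4;
    size_t lead = n - (ndigits - 1) * 4;   /* bits in the leftmost group (1 to 4) */
    size_t pos = 0;
    for (size_t g = 0; g < ndigits; g++) {
        size_t len = (g == 0) ? lead : 4;
        int nibble = 0;
        for (size_t k = 0; k < len; k++)
            nibble = nibble * 2 + (bits[pos++] - '0');
        out[g] = hexdigits[nibble];
    }
    out[ndigits] = '\0';
}
```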

Precision and Accuracy: Don’t confuse these two terms! Precision is a count of the number of bits in a computer word used to represent a value. Accuracy is a measure of the difference between the actual value of a number and its computer representation. High precision permits high accuracy but doesn’t guarantee it. It is possible to have high precision but low accuracy. Example: float pi = 3.14; pi will be represented using all bits of the significand (highly precise), but is only an approximation of π (not accurate).
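
The float pi example has two separate errors, which a small sketch can measure: storing the decimal 3.14 in a float already rounds it (3.14 has no exact binary representation), and the stored value is in any case far from π. This assumes IEEE 754 single precision for float; the function names and the reference constant `PI_REF` are illustrative, and `3.14f` is used so the literal is rounded to float exactly once.

```c
/* pi written to more digits than a double can hold; used as the "true" value */
static const double PI_REF = 3.14159265358979323846;

static double abs_diff(double a, double b) {
    double d = a - b;
    return d < 0 ? -d : d;
}

/* Rounding error from storing the decimal 3.14 in a float (all significand bits used). */
double storage_error(void) {
    float pi = 3.14f;
    return abs_diff((double) pi, 3.14);
}

/* Accuracy error: how far the stored value is from the actual value of pi. */
double accuracy_error(void) {
    float pi = 3.14f;
    return abs_diff((double) pi, PI_REF);
}
```

The first error is around $10^{-7}$, while the second is around $1.6 \times 10^{-3}$: high precision does not make 3.14 an accurate value of π.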

What to do with representations of numbers? • Just what we do with numbers! • Add them • Subtract them • Multiply them • Divide them • Compare them • Example: 10 + 7 = 17
    carries:   1 1 1
               0 1 0 1 0   (10)
             + 0 0 1 1 1   (7)
             -----------
               1 0 0 0 1   (17)
• …so simple to add in binary that we can build circuits to do it! • Subtraction: just as you would in decimal • Comparison: How do you tell if X > Y?
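
Binary addition needs only two bit-level rules: the sum bit in each column is the XOR of the inputs, and a carry is produced where both inputs are 1. The sketch below applies those rules to all 32 columns at once and repeats until no carries remain. It is a software illustration under that assumption, not a description of how any particular adder circuit is wired.

```c
#include <stdint.h>

/* Add two unsigned numbers using only XOR, AND and shift:
   XOR gives the sum bit of each column without carries, AND marks the columns
   that generate a carry, and << 1 moves each carry into the next column. */
uint32_t add_bits(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t carry = (a & b) << 1;  /* carries into the next column */
        a = a ^ b;                      /* column sums ignoring carries */
        b = carry;                      /* add the carries on the next pass */
    }
    return a;
}
```

`add_bits(10, 7)` reproduces the worked example, with carries passing through columns 2, 3 and 4.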

Visualizing (Mathematical) Integer Addition • Integer addition Add4(u, v) • 4-bit integers u, v • Compute true sum Add4(u, v) • Values increase linearly with u and v • Forms planar surface [figure: surface plot of Add4(u, v) over the u and v axes]

Visualizing Unsigned Addition Overflow • Wraps around • If true sum ≥ $2^w$ • At most once [figure: UAdd4(u, v) plotted over u and v; the true sum ranges up to $2^{w+1}$, and sums at or above $2^w$ overflow and wrap around to the modular sum]
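
The wrap-around can be written as a formula: $UAdd_w(u, v) = (u + v) \bmod 2^w$, which subtracts $2^w$ exactly once when the true sum is at least $2^w$. A minimal sketch, assuming small widths so the true sum fits in an `unsigned`; `uadd` is a hypothetical name.

```c
/* UAdd_w(u, v): the true sum reduced modulo 2^w, i.e. only the low w bits kept.
   Hypothetical helper; assumes 1 <= w <= 16 and 0 <= u, v < 2^w. */
unsigned uadd(unsigned u, unsigned v, unsigned w) {
    unsigned mask = (1u << w) - 1;   /* low w bits set */
    return (u + v) & mask;           /* drops the 2^w bit when the true sum >= 2^w */
}
```

With w = 4, `uadd(9, 10, 4)` has true sum 19 and wraps to 19 - 16 = 3.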

BIG IDEA: Bits can represent anything!! • Characters? • 26 letters ⇒ 5 bits ($2^5 = 32$) • upper/lower case + punctuation ⇒ 7 bits (in 8) (“ASCII”) • standard code to cover all the world’s languages ⇒ 8, 16, 32 bits (“Unicode”) www.unicode.org • Logical values? • 0 ⇒ False, 1 ⇒ True • colors? Ex: Red (00) Green (01) Blue (11) • locations / addresses? commands? • MEMORIZE: N bits ⇔ at most $2^N$ things

How to Represent Negative Numbers? • So far, unsigned numbers • Obvious solution: define leftmost bit to be sign! • 0 ⇒ +, 1 ⇒ – • Rest of bits can be numerical value of number • Representation called sign and magnitude • x86 uses 32-bit integers. $+1_{ten}$ would be: 0000 0000 0000 0000 0000 0000 0000 0001 • And $-1_{ten}$ in sign and magnitude would be: 1000 0000 0000 0000 0000 0000 0000 0001

Shortcomings of sign and magnitude? • Arithmetic circuit complicated • Special steps depending whether signs are the same or not • Also, two zeros • 0x00000000 = $+0_{ten}$ • 0x80000000 = $-0_{ten}$ • What would two 0s mean for programming? • Therefore sign and magnitude abandoned
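
A short sketch makes the "two zeros" problem concrete: decoding sign-and-magnitude bits gives 0 for both 0x00000000 and 0x80000000, so equality of values no longer matches equality of bit patterns. The helpers are hypothetical and assume magnitudes below $2^{31}$.

```c
#include <stdint.h>

/* 32-bit sign and magnitude: bit 31 is the sign, bits 0-30 hold |x|.
   Hypothetical helpers; assumes |x| < 2^31. */
uint32_t sm_encode(int32_t x) {
    if (x < 0)
        return 0x80000000u | (uint32_t) -(int64_t) x;  /* set sign bit, store magnitude */
    return (uint32_t) x;
}

int32_t sm_decode(uint32_t bits) {
    int32_t magnitude = (int32_t) (bits & 0x7FFFFFFFu);
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}
```

Both `sm_decode(0x00000000u)` and `sm_decode(0x80000000u)` return 0, so a program comparing raw bit patterns would treat +0 and -0 as different.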

Another try: complement the bits • Example: $7_{10} = 00111_2$, $-7_{10} = 11000_2$ • Called One’s Complement • Note: positive numbers have leading 0s, negative numbers have leading 1s. [number line: 00000, 00001, …, 01111, 10000, …, 11110, 11111] • What is -00000? Answer: 11111 • How many positive numbers in N bits? • How many negative numbers?
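
In one's complement, negation is just bit inversion, which is why -00000 comes out as 11111. A minimal 5-bit sketch (hypothetical helper names), keeping only the low 5 bits of an `unsigned`:

```c
/* 5-bit one's complement: negate by inverting every bit (low 5 bits kept). */
unsigned ones_negate5(unsigned x) {
    return ~x & 0x1Fu;
}

/* Value of a 5-bit one's complement pattern; both 00000 and 11111 decode to 0. */
int ones_value5(unsigned bits) {
    bits &= 0x1Fu;
    return (bits & 0x10u) ? -(int) ones_negate5(bits) : (int) bits;
}
```

`ones_negate5(7)` gives 11000 (0x18), matching the example above.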

Standard Negative Number Representation • What is the result for unsigned numbers if we try to subtract a large number from a small one? • Would try to borrow from string of leading 0s, so result would have a string of leading 1s • 3 - 4 ⇒ 00…0011 – 00…0100 = 11…1111 • With no obvious better alternative, pick the representation that makes the hardware simple • As with sign and magnitude, leading 0s ⇒ positive, leading 1s ⇒ negative • 000000…xxx is ≥ 0, 111111…xxx is < 0 • except 1…1111 is -1, not -0 (as in sign & mag.) • This representation is Two’s Complement
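
C's unsigned arithmetic shows the borrow-through-leading-0s effect directly: `3 - 4` on 32-bit unsigned values wraps to all 1s, and reading those same bits as a two's complement `int32_t` gives -1. A minimal sketch; the function names are illustrative, and `memcpy` is used to reinterpret the bits so no implementation-defined conversion is involved.

```c
#include <stdint.h>
#include <string.h>

/* Unsigned subtraction wraps modulo 2^32: 3 - 4 borrows through every
   leading 0, leaving a string of 1s. */
uint32_t three_minus_four(void) {
    uint32_t a = 3, b = 4;
    return a - b;                     /* 11...1111 = 0xFFFFFFFF */
}

/* Reinterpret the same 32 bits as a two's complement int32_t. */
int32_t as_signed(uint32_t bits) {
    int32_t s;
    memcpy(&s, &bits, sizeof s);      /* copy the bit pattern unchanged */
    return s;
}
```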

2’s Complement Number “line”: N = 5 [figure: the 5-bit patterns arranged in a circle: 00000 = 0, 00001 = 1, 00010 = 2, …, 01111 = 15 on one side; 11111 = -1, 11110 = -2, 11101 = -3, 11100 = -4, …, 10001 = -15, 10000 = -16 on the other] • $2^{N-1}$ nonnegatives • $2^{N-1}$ negatives • one zero • how many positives?

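Numeric Ranges • Unsigned Values • UMin = 0 (000…0) • UMax = $2^w - 1$ (111…1) • Two’s Complement Values • TMin = $-2^{w-1}$ (100…0) • TMax = $2^{w-1} - 1$ (011…1) • Other Values • Minus 1 (111…1) • Values for W = 16:
        Decimal   Hex     Binary
UMax    65535     FF FF   11111111 11111111
TMax    32767     7F FF   01111111 11111111
TMin    -32768    80 00   10000000 00000000
-1      -1        FF FF   11111111 11111111
0       0         00 00   00000000 00000000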

Values for Different Word Sizes • Observations • |TMin| = TMax + 1 ⇒ asymmetric range • UMax = 2 × TMax + 1 • C Programming • #include <limits.h> • Declares constants, e.g., ULONG_MAX, LONG_MIN • Values platform specific
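
The range formulas and both observations can be checked in code, both for a chosen width w and against the platform's own `<limits.h>` constants. A minimal sketch; `umax`, `tmax`, `tmin` and `observations_hold` are hypothetical names, and w is assumed to be between 1 and 63 so results fit in 64 bits.

```c
#include <limits.h>
#include <stdint.h>

/* Ranges for a w-bit word (hypothetical helpers; assumes 1 <= w <= 63). */
uint64_t umax(int w) { return (UINT64_C(1) << w) - 1; }        /* 111...1 */
int64_t tmax(int w) { return (INT64_C(1) << (w - 1)) - 1; }   /* 011...1 */
int64_t tmin(int w) { return -tmax(w) - 1; }                  /* 100...0 */

/* The two observations, checked for w = 16 and for the platform's long. */
int observations_hold(void) {
    return umax(16) == 2 * (uint64_t) tmax(16) + 1        /* UMax = 2 * TMax + 1 */
        && -(tmin(16) + 1) == tmax(16)                    /* |TMin| = TMax + 1 */
        && (unsigned long) LONG_MAX * 2 + 1 == ULONG_MAX
        && -(LONG_MIN + 1) == LONG_MAX;
}
```

Writing |TMin| as `-(tmin + 1)` avoids negating TMin itself, which would overflow because of the asymmetric range.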

Unsigned & Signed Numeric Values
X      B2U(X)   B2T(X)
0000   0        0
0001   1        1
0010   2        2
0011   3        3
0100   4        4
0101   5        5
0110   6        6
0111   7        7
1000   8        -8
1001   9        -7
1010   10       -6
1011   11       -5
1100   12       -4
1101   13       -3
1110   14       -2
1111   15       -1
• Equivalence • Same encodings for nonnegative values • Uniqueness • Every bit pattern represents a unique integer value • Each representable integer has a unique bit encoding • Can invert mappings • $U2B(x) = B2U^{-1}(x)$ ⇒ bit pattern for unsigned integer • $T2B(x) = B2T^{-1}(x)$ ⇒ bit pattern for two’s comp integer

Two’s Complement Formula • Can represent positive and negative numbers in terms of the bit value times a power of 2: $d_{31} \times (-2^{31}) + d_{30} \times 2^{30} + \cdots + d_2 \times 2^2 + d_1 \times 2^1 + d_0 \times 2^0$ • Example: $1101_{two} = 1 \times (-2^3) + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = -2^3 + 2^2 + 0 + 2^0 = -8 + 4 + 0 + 1 = -8 + 5 = -3_{ten}$
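
The only difference between B2U and B2T is the weight of the most significant bit: $+2^{w-1}$ for unsigned, $-2^{w-1}$ for two's complement. So B2T can be computed as B2U minus $2 \times 2^{w-1} = 2^w$ when that bit is set. A minimal sketch, assuming bit strings of 1 to 62 characters, most significant bit first; `b2u` and `b2t` are hypothetical helper names that mirror the slide notation.

```c
#include <string.h>

/* B2U: weigh each bit by +2^i (string is most significant bit first). */
long long b2u(const char *bits) {
    long long value = 0;
    for (const char *p = bits; *p; p++)
        value = 2 * value + (*p - '0');
    return value;
}

/* B2T: same sum, but the sign bit weighs -2^(w-1) instead of +2^(w-1),
   so subtract 2^w when it is set. Assumes 1 <= w <= 62. */
long long b2t(const char *bits) {
    size_t w = strlen(bits);
    long long u = b2u(bits);
    return (bits[0] == '1') ? u - (1LL << w) : u;
}
```

`b2t("1101")` follows the worked example: 13 - 16 = -3.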

Two’s Complement shortcut: Negation • Change every 0 to 1 and 1 to 0 (invert or complement), then add 1 to the result • Proof*: Sum of number and its (one’s) complement must be $111\ldots111_{two}$. However, $111\ldots111_{two} = -1_{ten}$. Let x’ ⇒ one’s complement representation of x. Then x + x’ = -1 ⇒ x + x’ + 1 = 0 ⇒ -x = x’ + 1 • Example (base two): -3 to +3 to -3
x:     1111 1111 1101
x’:    0000 0000 0010
+1:    0000 0000 0011
()’:   1111 1111 1100
+1:    1111 1111 1101
• You should be able to do this in your head… • *Check out www.cs.berkeley.edu/~dsw/twos_complement.html
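
The invert-then-add-1 rule translates directly into C. The sketch works on `uint32_t` so that the arithmetic wraps modulo $2^{32}$ instead of overflowing; `negate_bits` is a hypothetical name. It also shows the one exception the asymmetric range forces: TMin negates to itself.

```c
#include <stdint.h>

/* Two's complement negation: invert every bit, then add 1.
   uint32_t arithmetic wraps, so this is defined for every input. */
uint32_t negate_bits(uint32_t x) {
    return ~x + 1u;
}
```

For example, `negate_bits(3)` gives 0xFFFFFFFD (-3), and applying it again gives 3 back.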

What if too big? • Binary bit patterns above are simply representatives of numbers. Strictly speaking they are called “numerals”. • Numbers really have an infinite number of digits • with almost all being the same (00…0 or 11…1) except for a few of the rightmost digits • Just don’t normally show leading digits • If the result of add (or -, *, /) cannot be represented by these rightmost HW bits, overflow is said to have occurred. [figure: unsigned number line 00000, 00001, 00010, …, 11110, 11111]
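
Overflow means the true result needs more bits than the hardware keeps. A minimal 8-bit sketch of two common ways to detect it, using hypothetical helper names: for unsigned addition, the kept result wraps and becomes smaller than an operand; for signed addition, compute the exact sum in the wider `int` type and check whether it fits in `int8_t`.

```c
#include <stdint.h>

/* Unsigned 8-bit add: keep the rightmost 8 bits and report overflow.
   If the true sum needs a 9th bit, the kept result wraps below a. */
int add8_overflows(uint8_t a, uint8_t b, uint8_t *result) {
    *result = (uint8_t) (a + b);
    return *result < a;
}

/* Signed 8-bit add: the exact sum is computed in int (wide enough for any two
   int8_t values) and checked against the int8_t range. */
int add8s_overflows(int8_t a, int8_t b) {
    int sum = a + b;
    return sum > INT8_MAX || sum < INT8_MIN;
}
```

For example, 200 + 100 = 300 needs 9 bits, so the 8-bit result is 300 - 256 = 44 and overflow is reported.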

Peer Instruction Question • X = 1111 1110 1100$_{two}$ • Y = 0011 1010 0000$_{two}$ • A. X > Y (if signed) • B. X > Y (if unsigned) • C. X = -19 (if signed) • Choices (ABC): 0: FFF • 1: FFT • 2: FTF • 3: FTT • 4: TFF • 5: TFT • 6: TTF • 7: TTT

Peer Instruction Question: Answers (X = 1111 1110 1100$_{two}$, Y = 0011 1010 0000$_{two}$) • A: False (X is negative) • B: True • C: False (X = -20) • Correct choice: 2 (FTF)
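
The answers can be checked by reading the two 12-bit patterns both ways. A minimal sketch; the constants `X`, `Y` and the helper `as_signed12` are illustrative names, with the sign bit of a 12-bit value taken to be bit 11.

```c
#define WIDTH 12
static const unsigned X = 0xFEC;   /* 1111 1110 1100 */
static const unsigned Y = 0x3A0;   /* 0011 1010 0000 */

/* Two's complement value of a WIDTH-bit pattern:
   subtract 2^WIDTH when the sign bit (bit WIDTH-1) is set. */
int as_signed12(unsigned bits) {
    return (bits & (1u << (WIDTH - 1))) ? (int) bits - (1 << WIDTH) : (int) bits;
}
```

As unsigned values X = 4076 and Y = 928, so X > Y; as signed values X = -20 and Y = 928, so X < Y.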

Number summary… • META: We often make design decisions to make HW simple • We represent “things” in computers as particular bit patterns: N bits ⇒ $2^N$ things • Decimal for human calculations, binary for computers, hex to write binary more easily • 1’s complement: mostly abandoned [number line: 00000, 00001, …, 01111, 10000, …, 11110, 11111] • 2’s complement: universal in computing; cannot avoid, so learn [number line: 00000, 00001, …, 01111, 10000, …, 11110, 11111] • Overflow: numbers ∞; computers finite ⇒ errors!

Information units • Basic unit is the bit (has value 0 or 1) • Bits are grouped together in units and operated on together: • Byte = 8 bits • Word = 4 bytes • Double word = 2 words • etc.

Encoding Byte Values • Byte = 8 bits • Binary: 0000 0000$_2$ to 1111 1111$_2$ • Decimal: $0_{10}$ to $255_{10}$ (in C, a leading 0 makes a constant octal, so a decimal constant’s first digit must not be 0) • Hexadecimal: $00_{16}$ to $FF_{16}$ • Base 16 number representation • Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’ • Write $FA1D37B_{16}$ in C as 0xFA1D37B or 0xfa1d37b
Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111
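
The literal rules can be seen side by side in a few C constants: hex digits are case-insensitive, and a leading 0 switches a literal to octal. The variable names below are illustrative only.

```c
/* The same byte value written in C's literal forms. */
static const int byte_hex   = 0xFF;
static const int byte_dec   = 255;
static const int byte_octal = 0377;        /* leading 0: octal, not decimal 377 */

/* Case of hex digits does not matter. */
static const long big_upper = 0xFA1D37B;
static const long big_lower = 0xfa1d37b;
```

This is why a decimal literal must not start with 0: `017` in C is fifteen, not seventeen.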

Memory addressing • Memory is an array of information units – Each unit has the same size – Each unit has its own address – The address of a unit and the contents of the unit at that address are different [figure: addresses 0, 1, 2 holding contents 123, -17, 0]

Addressing • In most of today’s computers, the basic unit that can be addressed is a byte. (How many bits is a byte?) – x86 (and pretty much every CPU today) is byte addressable • The address space is the set of all memory units that a program can reference – The address space is usually tied to the length of the registers – x86 has 32-bit registers. Hence its address space is 4 GB – Older micros (minis) had 16-bit registers, hence a 64 KB address space (too small) – Some current (Alpha, Itanium, Sparc, Athlon) machines have 64-bit registers, hence an enormous address space

Machine Words • Machine has “word size” • Nominal size of integer-valued data • Including addresses • Many current machines still use 32-bit (4-byte) words • Limits addresses to 4 GB • Becoming too small for memory-intensive applications • New or high-end systems use 64-bit (8-byte) words • Potential address space ≈ $1.8 \times 10^{19}$ bytes • x86-64 machines support 48-bit addresses: 256 Terabytes • Machines support multiple data formats • Fractions or multiples of word size • Always integral number of bytes
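
The address-space sizes quoted on these two slides all come from one formula: an n-bit byte address can name $2^n$ bytes. A minimal sketch with a hypothetical helper name, assuming n ≤ 63 so the result fits in `uint64_t`.

```c
#include <stdint.h>

/* Bytes addressable with an n-bit byte address: 2^n.
   Hypothetical helper; assumes 0 <= n <= 63 so the result fits in uint64_t. */
uint64_t address_space_bytes(int n) {
    return UINT64_C(1) << n;
}
```

16 bits gives 64 KB, 32 bits gives 4 GB, and 48 bits gives $2^8 \times 2^{40}$ = 256 TB. The full 64-bit space, $2^{64} \approx 1.8 \times 10^{19}$ bytes, is one more than `UINT64_MAX`, which is why the helper stops at 63.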

Addressing words • Although machines are byte-addressable, 4-byte integers are the most commonly used units • Every 32-bit integer starts at an address divisible by 4 [figure: ints at addresses 0, 4, 8]

Word-Oriented Memory Organization • Addresses specify byte locations • Address of first byte in word • Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)
Bytes (Addr.)   32-bit words   64-bit words
0000-0003       Addr = 0000    Addr = 0000
0004-0007       Addr = 0004
0008-0011       Addr = 0008    Addr = 0008
0012-0015       Addr = 0012
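
Because addresses name bytes, consecutive `int`s in an array are `sizeof(int)` addresses apart, and each starts at a multiple of `int`'s alignment. A minimal sketch using C11 `_Alignof`; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between consecutive elements of an int array:
   equals sizeof(int), which is 4 on the machines discussed here. */
ptrdiff_t int_stride(void) {
    static int words[3];
    return (char *) &words[1] - (char *) &words[0];
}

/* Whether an address is a multiple of int's alignment requirement (C11). */
int is_int_aligned(const void *p) {
    return (uintptr_t) p % _Alignof(int) == 0;
}
```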

Data Representations • Sizes of C Objects (in Bytes)
C Data Type    Typical 32-bit   Intel IA32   x86-64
char           1                1            1
short          2                2            2
int            4                4            4
long           4                4            8
float          4                4            4
double         8                8            8
long double    8                10/12        10/16
char *         4                4            8
• Or any other pointer
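
The sizes on the machine you are using can be read with `sizeof` and compared with the table. A minimal sketch; the array name is illustrative. The C standard guarantees `sizeof(char) == 1` and minimum ranges for the other types; everything else is platform specific.

```c
#include <stddef.h>

/* Sizes on the machine running this code, in table order:
   char, short, int, long, float, double, long double, char *. */
size_t size_table[] = {
    sizeof(char), sizeof(short), sizeof(int), sizeof(long),
    sizeof(float), sizeof(double), sizeof(long double), sizeof(char *)
};
```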

Byte Ordering • How should bytes within a multi-byte word be ordered in memory? • Conventions • Big Endian: Sun, PPC Mac, Internet ⇒ Least significant byte has highest address • Little Endian: x86 ⇒ Least significant byte has lowest address

Byte Ordering Example • Big Endian • Least significant byte has highest address • Big End First • Little Endian • Least significant byte has lowest address • Little End First • Example • Variable x has 4-byte representation 0x01234567 • Address given by &x is 0x100
Address         0x100   0x101   0x102   0x103
Big Endian      01      23      45      67
Little Endian   67      45      23      01
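
Copying a variable's bytes into an `unsigned char` array shows which layout the current machine uses: on a little-endian host the first (lowest-address) byte of 0x01234567 is 0x67, on a big-endian host it is 0x01. A minimal sketch with hypothetical function names.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of x, lowest address first. */
void bytes_of(uint32_t x, unsigned char out[4]) {
    memcpy(out, &x, 4);
}

/* 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void) {
    unsigned char b[4];
    bytes_of(0x01234567u, b);
    return b[0] == 0x67;
}
```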

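Big-endian vs. little-endian • Byte order within a word: [figure: the value of Word #0 drawn from its most significant to its least significant byte, each byte labeled with its memory address (Word #0 occupies addresses 0-3); Little-endian (we’ll use this): 3 2 1 0; Big-endian: 0 1 2 3]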

Reading Byte-Reversed Listings • Disassembly • Text representation of binary machine code • Generated by program that reads the machine code • Example fragment:
Address    Instruction Code     Assembly Rendition
8048365:   5b                   pop %ebx
8048366:   81 c3 ab 12 00 00    add $0x12ab,%ebx
804836c:   83 bb 28 00 00       cmpl $0x0,0x28(%ebx)
• Deciphering numbers • Value: 0x12ab • Pad to 32 bits: 0x000012ab • Split into bytes: 00 00 12 ab • Reverse: ab 12 00 00
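
The pad, split and reverse steps amount to listing a 32-bit value's bytes from least to most significant. A minimal sketch using shifts, so the result does not depend on the host's own byte order; the function name is hypothetical.

```c
#include <stdint.h>

/* Bytes of a 32-bit value in little-endian order (least significant first).
   Shifting extracts byte i counting from the least significant end. */
void little_endian_bytes(uint32_t value, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t) (value >> (8 * i));
}
```

For 0x12ab this yields ab 12 00 00, the byte sequence that appears in the `add` instruction above.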

Examining Data Representations • Code to print the byte representation of data • Casting a pointer to unsigned char * creates a byte array

    #include <stdio.h>

    typedef unsigned char *pointer;

    /* Print each of the len bytes starting at start: its address, then its value in hex */
    void show_bytes(pointer start, size_t len) {
        size_t i;
        for (i = 0; i < len; i++)
            printf("%p\t0x%.2x\n", (void *) (start + i), start[i]);
        printf("\n");
    }

• printf directives: %p: print pointer • %x: print hexadecimal

show_bytes Execution Example

    int a = 15213;
    printf("int a = 15213;\n");
    show_bytes((pointer) &a, sizeof(int));

Result (Linux):

    int a = 15213;
    0x11ffffcb8   0x6d
    0x11ffffcb9   0x3b
    0x11ffffcba   0x00
    0x11ffffcbb   0x00

Representing & Manipulating Sets • Representation • Width w bit vector represents subsets of {0, …, w–1} • $a_j = 1$ if $j \in A$ • 01101001 ⇒ { 0, 3, 5, 6 } • 01010101 ⇒ { 0, 2, 4, 6 } (bit positions numbered 76543210) • Operations • & Intersection 01000001 { 0, 6 } • | Union 01111101 { 0, 2, 3, 4, 5, 6 } • ^ Symmetric difference 00111100 { 2, 3, 4, 5 } • ~ Complement 10101010 { 1, 3, 5, 7 }
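
The set operations map one-to-one onto C's bitwise operators when a set is stored as a bit vector. A minimal sketch for subsets of {0, …, 7} stored in an `unsigned char`; the type and function names are hypothetical.

```c
/* Subsets of {0,...,7} as 8-bit vectors: bit j is 1 when j is in the set. */
typedef unsigned char set8;

set8 set_of(const int *elems, int n) {
    set8 s = 0;
    for (int i = 0; i < n; i++)
        s |= (set8) (1u << elems[i]);   /* add element elems[i] */
    return s;
}

set8 set_intersection(set8 a, set8 b) { return a & b; }
set8 set_union(set8 a, set8 b)        { return a | b; }
set8 set_symdiff(set8 a, set8 b)      { return a ^ b; }
set8 set_complement(set8 a)           { return (set8) ~a; }
```

Building { 0, 3, 5, 6 } and { 0, 2, 4, 6 } with `set_of` gives 0x69 and 0x55, the two example vectors.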

Bit-Level Operations in C • Operations &, |, ~, ^ available in C • Apply to any “integral” data type • long, int, short, char, unsigned • View arguments as bit vectors • Arguments applied bit-wise • Examples (char data type) • ~0x41 --> 0xBE: ~$01000001_2$ --> $10111110_2$ • ~0x00 --> 0xFF: ~$00000000_2$ --> $11111111_2$ • 0x69 & 0x55 --> 0x41: $01101001_2$ & $01010101_2$ --> $01000001_2$ • 0x69 | 0x55 --> 0x7D: $01101001_2$ | $01010101_2$ --> $01111101_2$

Contrast: Logic Operations in C • Contrast to logical operators • &&, ||, ! • View 0 as “False” • Anything nonzero as “True” • Always return 0 or 1 • Early termination • Examples (char data type) • !0x41 --> 0x00 • !0x00 --> 0x01 • !!0x41 --> 0x01 • 0x69 && 0x55 --> 0x01 • 0x69 || 0x55 --> 0x01 • p && *p (avoids null pointer access)
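
The two previous slides can be contrasted in a few lines: `&` combines the operands bit by bit, while `&&` reduces each operand to a single truth value and always yields 0 or 1, and stops early when the left side is false. The function names below are illustrative.

```c
#include <stddef.h>

/* Bitwise AND combines all bits; logical AND yields only 0 or 1. */
int bitwise_and(int a, int b) { return a & b; }
int logical_and(int a, int b) { return a && b; }

/* Early termination: *p is evaluated only when p is non-null. */
int nonnull_and_nonzero(const char *p) {
    return p && *p;
}
```

For example, 0x01 & 0x02 is 0 (no common bits), but 0x01 && 0x02 is 1 (both operands are nonzero).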

Shift Operations • Left Shift: x << y • Shift bit-vector x left y positions • Throw away extra bits on left • Fill with 0’s on right • Right Shift: x >> y • Shift bit-vector x right y positions • Throw away extra bits on right • Logical shift: fill with 0’s on left • Arithmetic shift: replicate most significant bit on left • Undefined Behavior • Shift amount < 0 or ≥ word size
Argument x     01100010   10100010
<< 3           00010000   00010000
Log. >> 2      00011000   00101000
Arith. >> 2    00011000   11101000
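
The table can be reproduced on 8-bit values. Left and logical right shifts use `uint8_t`, so vacated bits fill with 0. The arithmetic shift copies the sign bit by hand because, in C, right-shifting a negative signed value is implementation-defined (most compilers do shift arithmetically, but the sketch does not rely on it). Function names are hypothetical; shift amounts are assumed to be below 8.

```c
#include <stdint.h>

uint8_t shl8(uint8_t x, unsigned n) { return (uint8_t) (x << n); }  /* assumes n < 8 */
uint8_t lsr8(uint8_t x, unsigned n) { return (uint8_t) (x >> n); }  /* assumes n < 8 */

/* Arithmetic right shift: logical shift, then fill the n vacated
   high bits with copies of the original most significant bit. */
uint8_t asr8(uint8_t x, unsigned n) {                               /* assumes n < 8 */
    uint8_t r = (uint8_t) (x >> n);
    if (x & 0x80u)
        r |= (uint8_t) (0xFFu << (8 - n));
    return r;
}
```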

The CPU - Instruction Execution Cycle • The CPU executes a program by repeatedly following this cycle 1. Fetch the next instruction, say instruction i 2. Execute instruction i 3. Compute address of the next instruction, say j 4. Go back to step 1 • Of course we’ll optimize this but it’s the basic concept
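
The cycle can be illustrated with a toy interpreter. The three-instruction ISA below (LOAD, ADD, HALT with one accumulator) is entirely hypothetical and exists only to show the fetch, execute, next-address loop; real CPUs decode binary instruction words rather than C structs.

```c
/* Hypothetical three-instruction ISA, for illustration only. */
enum opcode { LOAD, ADD, HALT };

struct instruction {
    enum opcode op;
    int operand;
};

int run(const struct instruction *program) {
    int acc = 0;                              /* the single register */
    int pc = 0;                               /* address of the next instruction */
    for (;;) {
        struct instruction i = program[pc];   /* 1. fetch */
        switch (i.op) {                       /* 2. execute */
        case LOAD: acc = i.operand; break;
        case ADD:  acc += i.operand; break;
        case HALT: return acc;
        }
        pc = pc + 1;                          /* 3. compute next address */
    }                                         /* 4. go back to step 1 */
}
```

With no jumps in this toy ISA, the next address is always pc + 1; a branch instruction would compute a different j in step 3.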

What’s in an instruction? • An instruction tells the CPU – the operation to be performed via the OPCODE – where to find the operands (source and destination) • For a given instruction, the ISA specifies – what the OPCODE means (semantics) – how many operands are required and their types, sizes etc. (syntax) • Operand is either – register (integer, floating-point, PC) – a memory address – a constant

Reference slides: You ARE responsible for the material on these slides (they’re just taken from the reading anyway); we’ve moved them to the end and off-stage to give more breathing room to lecture!

Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta (physics.nist.gov/cuu/Units/binary.html) • Common use prefixes (all SI, except K [= k in SI])
Name    Abbr   Factor                                          SI size
Kilo    K      $2^{10}$ = 1,024                                $10^{3}$ = 1,000
Mega    M      $2^{20}$ = 1,048,576                            $10^{6}$ = 1,000,000
Giga    G      $2^{30}$ = 1,073,741,824                        $10^{9}$ = 1,000,000,000
Tera    T      $2^{40}$ = 1,099,511,627,776                    $10^{12}$ = 1,000,000,000,000
Peta    P      $2^{50}$ = 1,125,899,906,842,624                $10^{15}$ = 1,000,000,000,000,000
Exa     E      $2^{60}$ = 1,152,921,504,606,846,976            $10^{18}$ = 1,000,000,000,000,000,000
Zetta   Z      $2^{70}$ = 1,180,591,620,717,411,303,424        $10^{21}$ = 1,000,000,000,000,000,000,000
Yotta   Y      $2^{80}$ = 1,208,925,819,614,629,174,706,176    $10^{24}$ = 1,000,000,000,000,000,000,000,000
• Confusing! Common usage of “kilobyte” means 1024 bytes, but the “correct” SI value is 1000 bytes • Hard disk manufacturers & telecommunications are the only computing groups that use SI factors, so what is advertised as a 30 GB drive will actually only hold about $28 \times 2^{30}$ bytes, and a 1 Mbit/s connection transfers $10^6$ bps.

kibi, mebi, gibi, tebi, pebi, exbi, zebi, yobi (en.wikipedia.org/wiki/Binary_prefix) • New IEC Standard Prefixes [only to exbi officially]
Name   Abbr   Factor
kibi   Ki     $2^{10}$ = 1,024
mebi   Mi     $2^{20}$ = 1,048,576
gibi   Gi     $2^{30}$ = 1,073,741,824
tebi   Ti     $2^{40}$ = 1,099,511,627,776
pebi   Pi     $2^{50}$ = 1,125,899,906,842,624
exbi   Ei     $2^{60}$ = 1,152,921,504,606,846,976
zebi   Zi     $2^{70}$ = 1,180,591,620,717,411,303,424
yobi   Yi     $2^{80}$ = 1,208,925,819,614,629,174,706,176
• As of this writing, this proposal has yet to gain widespread use… • International Electrotechnical Commission (IEC) in 1999 introduced these to specify binary quantities. • Names come from shortened versions of the original SI prefixes (same pronunciation) and bi is short for “binary”, but pronounced “bee” :-( • Now SI prefixes only have their base-10 meaning and never have a base-2 meaning.

The way to remember #s • What is $2^{34}$? How many bits are needed to address (i.e., what is $\lceil \log_2 \rceil$ = lg of) 2.5 TiB? • Answer! $2^{XY}$ means…
X=0 ⇒ ---                 Y=0 ⇒ 1
X=1 ⇒ kibi ~$10^{3}$      Y=1 ⇒ 2
X=2 ⇒ mebi ~$10^{6}$      Y=2 ⇒ 4
X=3 ⇒ gibi ~$10^{9}$      Y=3 ⇒ 8
X=4 ⇒ tebi ~$10^{12}$     Y=4 ⇒ 16
X=5 ⇒ pebi ~$10^{15}$     Y=5 ⇒ 32
X=6 ⇒ exbi ~$10^{18}$     Y=6 ⇒ 64
X=7 ⇒ zebi ~$10^{21}$     Y=7 ⇒ 128
X=8 ⇒ yobi ~$10^{24}$     Y=8 ⇒ 256
                          Y=9 ⇒ 512
MEMORIZE!
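
The mnemonic works because $2^{XY} = 2^{Y} \times 2^{10X}$: the tens digit of the exponent picks the prefix and the ones digit picks a small factor. The reverse question (how many address bits) is the smallest b with $2^b \ge n$. A minimal sketch with hypothetical helper names, assuming exponents from 0 to 89.

```c
#include <stdint.h>

static const char *const prefixes[] = {
    "", "kibi", "mebi", "gibi", "tebi", "pebi", "exbi", "zebi", "yobi"
};

/* 2^XY = 2^Y * 2^(10X): return 2^Y and set *prefix from X.
   Hypothetical helper; assumes 0 <= exponent <= 89. */
unsigned pow2_factor(unsigned exponent, const char **prefix) {
    *prefix = prefixes[exponent / 10];
    return 1u << (exponent % 10);
}

/* Smallest number of address bits b with 2^b >= n, i.e. ceil(lg n). */
unsigned address_bits(uint64_t n) {
    unsigned b = 0;
    while (b < 64 && (UINT64_C(1) << b) < n)
        b++;
    return b;
}
```

For example, exponent 34 splits into X = 3 (gibi) and Y = 4 (16), and 2.5 TiB can be written as $5 \times 2^{39}$ bytes when passed to `address_bits`.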

Which base do we use? • Decimal: great for humans, especially when doing arithmetic • Hex: if a human is looking at long strings of binary numbers, it’s much easier to convert to hex and look at 4 bits/symbol • Terrible for arithmetic on paper • Binary: what computers use; you will learn how computers do +, -, *, / • To a computer, numbers are always binary • Regardless of how the number is written: • $32_{ten}$ == $32_{10}$ == 0x20 == $100000_2$ == 0b100000 • Use subscripts “ten”, “hex”, “two” in book, slides when it might be confusing

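Two’s Complement for N=32
0000 ... 0000 0000 0000 0000$_{two}$ = $0_{ten}$
0000 ... 0000 0000 0000 0001$_{two}$ = $1_{ten}$
0000 ... 0000 0000 0000 0010$_{two}$ = $2_{ten}$
...
0111 ... 1111 1111 1111 1101$_{two}$ = $2{,}147{,}483{,}645_{ten}$
0111 ... 1111 1111 1111 1110$_{two}$ = $2{,}147{,}483{,}646_{ten}$
0111 ... 1111 1111 1111 1111$_{two}$ = $2{,}147{,}483{,}647_{ten}$
1000 ... 0000 0000 0000 0000$_{two}$ = $-2{,}147{,}483{,}648_{ten}$
1000 ... 0000 0000 0000 0001$_{two}$ = $-2{,}147{,}483{,}647_{ten}$
1000 ... 0000 0000 0000 0010$_{two}$ = $-2{,}147{,}483{,}646_{ten}$
...
1111 ... 1111 1111 1111 1101$_{two}$ = $-3_{ten}$
1111 ... 1111 1111 1111 1110$_{two}$ = $-2_{ten}$
1111 ... 1111 1111 1111 1111$_{two}$ = $-1_{ten}$
• One zero; 1st bit called sign bit • 1 “extra” negative: no positive $2{,}147{,}483{,}648_{ten}$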

Two’s comp. shortcut: Sign extension • Convert 2’s complement number rep. using n bits to more than n bits • Simply replicate the most significant bit (sign bit) of smaller to fill new bits • 2’s comp. positive number has infinite 0s • 2’s comp. negative number has infinite 1s • Binary representation hides leading bits; sign extension restores some of them • 16-bit $-4_{ten}$ to 32-bit: 1111 1111 1111 1100$_{two}$ ⇒ 1111 1111 1111 1111 1111 1111 1111 1100$_{two}$
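
Sign extension can be done without a branch: flip the field's sign bit, then subtract that bit's weight. When the sign bit was 1, the subtraction borrows through every new high bit and fills it with 1s; when it was 0, the high bits stay 0. This XOR-then-subtract formulation is a common idiom rather than anything from the slides; `sign_extend` is a hypothetical name.

```c
#include <stdint.h>

/* Sign-extend the low n bits of x to 32 bits (assumes 1 <= n <= 32).
   Flipping the sign bit and subtracting its weight turns a leading 1
   into a run of 1s and leaves a leading 0 as a run of 0s. */
uint32_t sign_extend(uint32_t x, unsigned n) {
    uint32_t m = UINT32_C(1) << (n - 1);                          /* sign bit of the field */
    x &= (n == 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);      /* keep only n bits */
    return (x ^ m) - m;
}
```

`sign_extend(0xFFFC, 16)` reproduces the 16-bit -4 example, giving 0xFFFFFFFC.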

Preview: Signed vs. Unsigned Variables • Java and C declare integers int • Use two’s complement (signed integer) • Also, C declaration unsigned int • Declares an unsigned integer • Treats 32-bit number as unsigned integer, so most significant bit is part of the number, not a sign bit
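
One practical consequence of the signed/unsigned distinction in C: when a comparison mixes `int` and `unsigned int`, the `int` operand is converted to unsigned, so -1 (all 1s) is read as UINT_MAX and compares as larger than 0. A minimal sketch with illustrative function names.

```c
#include <limits.h>

/* Both operands signed: ordinary two's complement comparison. */
int signed_less(int a, int b) { return a < b; }

/* Mixed operands: a is converted to unsigned int before comparing,
   so its most significant bit counts as part of the magnitude. */
int mixed_less(int a, unsigned b) { return a < b; }
```

`signed_less(-1, 0)` is 1, but `mixed_less(-1, 0u)` is 0, because the bit pattern of -1 is compared as UINT_MAX.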