Bits and Bytes

Topics
- Why bits?
- Representing information as bits
  - Binary/Hexadecimal
  - Byte representations
    - Numbers
    - Characters and strings
    - Instructions
- Bit-level manipulations
  - Boolean algebra
  - Expressing in C

Why Don’t Computers Use Base 10?

Base 10 Number Representation
- That’s why fingers are known as “digits”
- Natural representation for financial transactions
  - Binary floating point number cannot exactly represent $1.20
- Even carries through in scientific notation
  - 1.5213 × 10^4

Implementing Electronically
- Hard to store
  - ENIAC (first electronic computer) used 10 vacuum tubes / digit
- Hard to transmit
  - Need high precision to encode 10 signal levels on single wire
- Messy to implement digital logic functions
  - Addition, multiplication, etc.

Binary Representations

Base 2 Number Representation
- Represent 15213_10 as 11101101101101_2
- Represent 1.20_10 as 1.0011001100110011[0011]…_2
- Represent 1.5213 × 10^4 as 1.1101101101101_2 × 2^13

Electronic Implementation
- Easy to store with bistable elements
- Reliably transmitted on noisy and inaccurate wires
- Straightforward implementation of arithmetic functions

[Figure: signal on a wire with voltage levels 0.0 V, 0.5 V, 2.8 V, and 3.3 V marked; the low band (0.0 V to 0.5 V) encodes 0 and the high band (2.8 V to 3.3 V) encodes 1]
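
The base 2 form of 15213 can be reproduced by repeatedly taking the low bit and shifting right. The following is a minimal sketch; the helper name to_binary and its buffer convention are assumptions made for illustration and do not come from the lecture.

```c
#include <stddef.h>

/* Hypothetical helper: write the base 2 digits of v into buf as a
 * NUL-terminated string, most significant digit first.
 * buf must hold at least sizeof(unsigned) * 8 + 1 chars. */
void to_binary(unsigned v, char *buf)
{
    char tmp[sizeof(unsigned) * 8];
    size_t n = 0;

    /* Collect digits, least significant first. */
    do {
        tmp[n++] = (char) ('0' + (v & 1u));
        v >>= 1;
    } while (v != 0);

    /* Reverse them into the caller's buffer. */
    while (n > 0)
        *buf++ = tmp[--n];
    *buf = '\0';
}
```

Calling to_binary(15213u, buf) should leave the 14 digits 11101101101101 in buf.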

Byte-Oriented Memory Organization

Programs Refer to Virtual Addresses
- Conceptually very large array of bytes
- Actually implemented with hierarchy of different memory types
  - SRAM, DRAM, disk
  - Only allocate for regions actually used by program
- In Unix and Windows NT, address space private to particular “process”
  - Program being executed
  - Program can clobber its own data, but not that of others

Compiler + Run-Time System Control Allocation
- Where different program objects should be stored
- Multiple mechanisms: static, stack, and heap
- In any case, allocation within single virtual address space

Encoding Byte Values

Byte = 8 bits
- Binary: 00000000_2 to 11111111_2
- Decimal: 0_10 to 255_10
- Hexadecimal: 00_16 to FF_16
  - Base 16 number representation
  - Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
  - Write FA1D37B_16 in C as 0xFA1D37B
    - Or 0xfa1d37b

Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111
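
As a small illustration of the hex digit table and of hex literals in C, the sketch below maps one hex digit to its value and spells the slide's constant in both cases. The helper name hex_digit_value and the two constant names are assumptions for illustration; the letter ranges assume an ASCII-compatible character set.

```c
/* Hypothetical helper: numeric value of one hexadecimal digit,
 * or -1 if c is not a hex digit. Assumes 'A'..'F' and 'a'..'f'
 * are contiguous, as they are in ASCII. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hex literals are case-insensitive: both spell the same value. */
static const unsigned upper_case = 0xFA1D37B;
static const unsigned lower_case = 0xfa1d37b;
```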

Machine Words

Machine Has “Word Size”
- Nominal size of integer-valued data
  - Including addresses
- Most current machines are 32 bits (4 bytes)
  - Limits addresses to 4 GB
  - Becoming too small for memory-intensive applications
- High-end systems are 64 bits (8 bytes)
  - Potentially address 1.8 × 10^19 bytes
- Machines support multiple data formats
  - Fractions or multiples of word size
  - Always integral number of bytes

Word-Oriented Memory Organization

Addresses Specify Byte Locations
- Address of first byte in word
- Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

Byte addresses   32-bit word   64-bit word
0000-0003        Addr = 0000   Addr = 0000
0004-0007        Addr = 0004
0008-0011        Addr = 0008   Addr = 0008
0012-0015        Addr = 0012
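
The rule that a word's address is the address of its first byte can be sketched as a rounding-down step. The helper name word_address is an assumption for illustration, and the sketch assumes words are aligned to multiples of the word size, as in the slide's diagram.

```c
#include <stdint.h>

/* Hypothetical helper: address of the first byte of the aligned word
 * that contains byte address addr. word_size is 4 or 8 bytes here. */
uintptr_t word_address(uintptr_t addr, uintptr_t word_size)
{
    return addr - (addr % word_size);
}
```

For example, byte 13 lies in the 32-bit word at address 12 and in the 64-bit word at address 8.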

Data Representations

Sizes of C Objects (in Bytes)

C Data Type    Compaq Alpha   Typical 32-bit   Intel IA32
int            4              4                4
long int       8              4                4
char           1              1                1
short          2              2                2
float          4              4                4
double         8              8                8
long double    8              8                10/12
char *         8              4                4
  (or any other pointer)
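
The sizes on a given machine can be checked with sizeof. This is a minimal sketch; the function name print_sizes is an assumption for illustration, and the values printed depend on the compiler and machine used to build it.

```c
#include <stdio.h>

/* Print the size in bytes of each type from the table, as seen by
 * whatever compiler and machine build this code. */
void print_sizes(void)
{
    printf("int          %zu\n", sizeof(int));
    printf("long int     %zu\n", sizeof(long int));
    printf("char         %zu\n", sizeof(char));
    printf("short        %zu\n", sizeof(short));
    printf("float        %zu\n", sizeof(float));
    printf("double       %zu\n", sizeof(double));
    printf("long double  %zu\n", sizeof(long double));
    printf("char *       %zu\n", sizeof(char *));
}
```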

Byte Ordering

How should bytes within multi-byte word be ordered in memory?

Conventions
- Suns, Macs are “Big Endian” machines
  - Least significant byte has highest address
- Alphas, PCs are “Little Endian” machines
  - Least significant byte has lowest address

Byte Ordering Example

Big Endian
- Least significant byte has highest address

Little Endian
- Least significant byte has lowest address

Example
- Variable x has 4-byte representation 0x01234567
- Address given by &x is 0x100

Address          0x100   0x101   0x102   0x103
Big Endian       01      23      45      67
Little Endian    67      45      23      01
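
One common way to find out which convention a machine follows is to store 0x01234567 and look at the byte at the lowest address. The sketch below does that; the function name is_little_endian is an assumption for illustration.

```c
/* Returns 1 on a little endian machine and 0 otherwise, by looking
 * at the byte stored at the lowest address of x. */
int is_little_endian(void)
{
    unsigned int x = 0x01234567;
    unsigned char *first = (unsigned char *) &x;
    return *first == 0x67;
}
```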

Reading Byte-Reversed Listings

Disassembly
- Text representation of binary machine code
- Generated by program that reads the machine code

Example Fragment

Address     Instruction Code        Assembly Rendition
8048365:    5b                      pop    %ebx
8048366:    81 c3 ab 12 00 00       add    $0x12ab,%ebx
804836c:    83 bb 28 00 00 00 00    cmpl   $0x0,0x28(%ebx)

Deciphering Numbers
- Value:             0x12ab
- Pad to 4 bytes:    0x000012ab
- Split into bytes:  00 00 12 ab
- Reverse:           ab 12 00 00
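
Going the other way, from the bytes in the listing back to the value, can be sketched as follows. The helper name from_little_endian is an assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 32-bit value from 4 bytes given in
 * little endian order (least significant byte first), which is how
 * they appear in the instruction code column. */
uint32_t from_little_endian(const unsigned char bytes[4])
{
    return  (uint32_t) bytes[0]
         | ((uint32_t) bytes[1] << 8)
         | ((uint32_t) bytes[2] << 16)
         | ((uint32_t) bytes[3] << 24);
}
```

Feeding it the bytes ab 12 00 00 from the add instruction gives back 0x12ab.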

Examining Data Representations

Code to Print Byte Representation of Data
- Casting pointer to unsigned char * creates byte array

    #include <stdio.h>

    typedef unsigned char *pointer;

    /* Print the address and value of each of the len bytes at start. */
    void show_bytes(pointer start, int len)
    {
        int i;
        for (i = 0; i < len; i++)
            /* %p expects a void *; its output format is implementation-defined. */
            printf("%p\t0x%.2x\n", (void *) (start + i), start[i]);
        printf("\n");
    }

Printf directives:
- %p: Print pointer
- %x: Print hexadecimal

show_bytes Execution Example

    int a = 15213;
    printf("int a = 15213;\n");
    show_bytes((pointer) &a, sizeof(int));

Result (Linux):

    int a = 15213;
    0x11ffffcb8    0x6d
    0x11ffffcb9    0x3b
    0x11ffffcba    0x00
    0x11ffffcbb    0x00

Representing Integers

    int A = 15213;
    int B = -15213;
    long int C = 15213;

Decimal: 15213
Binary:  0011 1011 0110 1101
Hex:     3B6D

Variable   Machine       Bytes (lowest address first)
A          Linux/Alpha   6D 3B 00 00
A          Sun           00 00 3B 6D
B          Linux/Alpha   93 C4 FF FF
B          Sun           FF FF C4 93
C          Linux         6D 3B 00 00
C          Alpha         6D 3B 00 00 00 00 00 00
C          Sun           00 00 3B 6D

Two’s complement representation (covered next lecture)
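
The byte values for -15213 can be cross-checked without printing memory. Converting a signed value to uint32_t is defined in C as reduction modulo 2^32, which gives the two's complement bit pattern. The helper name bit_pattern is an assumption for illustration.

```c
#include <stdint.h>

/* Bit pattern of a 32-bit integer viewed as an unsigned value.
 * The conversion reduces v modulo 2^32, which is exactly the
 * two's complement encoding of v. */
uint32_t bit_pattern(int32_t v)
{
    return (uint32_t) v;
}
```

Here bit_pattern(-15213) is 0xFFFFC493, which a little endian machine stores as 93 C4 FF FF.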

Representing Pointers

    int B = -15213;
    int *P = &B;

Alpha Address
  Hex:     1 F F F F F C A 0
  Binary:  0001 1111 1111 1111 1111 1111 1100 1010 0000
  Alpha P: A0 FC FF FF 01 00 00 00

Sun Address
  Hex:     E F F F F B 2 C
  Binary:  1110 1111 1111 1111 1111 1011 0010 1100
  Sun P:   EF FF FB 2C

Linux Address
  Hex:     B F F F F 8 D 4
  Binary:  1011 1111 1111 1111 1111 1000 1101 0100
  Linux P: D4 F8 FF BF

Different compilers & machines assign different locations to objects

Representing Floats

    float F = 15213.0;

IEEE Single Precision Floating Point Representation
  Hex:     4 6 6 D B 4 0 0
  Binary:  0100 0110 0110 1101 1011 0100 0000 0000
  15213:   1110 1101 1011 01 (the bits after its leading 1 reappear at the start of the fraction field)

Linux/Alpha F: 00 B4 6D 46
Sun F:         46 6D B4 00

Not same as integer representation, but consistent across machines
Can see some relation to integer representation, but not obvious
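
The hex pattern for 15213.0 can be checked by copying the bytes of a float into an integer. This is a sketch under the assumption that float is a 32-bit IEEE single precision value, which the C standard does not guarantee; the helper name float_bits is also an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of a float into a 32-bit unsigned integer.
 * Assumes float is 32-bit IEEE single precision. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The exponent field of float_bits(15213.0f) is 140, that is 13 plus the bias of 127, matching 1.1101101101101_2 × 2^13.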

Representing Strings

    char S[6] = "15213";

Strings in C
- Represented by array of characters
- Each character encoded in ASCII format
  - Standard 7-bit encoding of character set
  - Other encodings exist, but uncommon
  - Character “0” has code 0x30
    - Digit i has code 0x30+i
- String should be null-terminated
  - Final character = 0

Linux/Alpha S: 31 35 32 31 33 00
Sun S:         31 35 32 31 33 00

Compatibility
- Byte ordering not an issue
  - Data are single byte quantities
- Text files generally platform independent
  - Except for different conventions of line termination character(s)!
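
The digit rule and the null terminator can be seen directly in C. The sketch below assumes an ASCII execution character set; the helper name digit_to_char is an assumption for illustration.

```c
/* Hypothetical helper: ASCII character for decimal digit i (0..9). */
char digit_to_char(int i)
{
    return (char) (0x30 + i);
}

/* The array from the slide: five digit characters plus the
 * terminating null byte, six bytes in total. */
static const char S[6] = "15213";
```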

Machine-Level Code Representation

Encode Program as Sequence of Instructions
- Each simple operation
  - Arithmetic operation
  - Read or write memory
  - Conditional branch
- Instructions encoded as bytes
  - Alphas, Suns, Macs use 4 byte instructions
    - Reduced Instruction Set Computer (RISC)
  - PCs use variable length instructions
    - Complex Instruction Set Computer (CISC)
- Different instruction types and encodings for different machines
  - Most code not binary compatible

Programs are Byte Sequences Too!

Representing Instructions

    int sum(int x, int y)
    {
        return x+y;
    }

- For this example, Alpha & Sun use two 4-byte instructions
  - Use differing numbers of instructions in other cases
- PC uses 7 instructions with lengths 1, 2, and 3 bytes
  - Same for NT and for Linux
  - NT / Linux not fully binary compatible

Alpha sum: 00 00 30 42 01 80 FA 6B
Sun sum:   81 C3 E0 08 90 02 00 09
PC sum:    55 89 E5 8B 45 0C 03 45 08 89 EC 5D C3

Different machines use totally different instructions and encodings

Boolean Algebra

Developed by George Boole in 19th Century
- Algebraic representation of logic
  - Encode “True” as 1 and “False” as 0

And
- A&B = 1 when both A=1 and B=1

Or
- A|B = 1 when either A=1 or B=1

Not
- ~A = 1 when A=0

Exclusive-Or (Xor)
- A^B = 1 when either A=1 or B=1, but not both

Application of Boolean Algebra

Applied to Digital Systems by Claude Shannon
- 1937 MIT Master’s Thesis
- Reason about networks of relay switches
  - Encode closed switch as 1, open switch as 0

[Figure: relay network with two parallel paths, one through switches A and ~B, the other through ~A and B]

Connection when A&~B | ~A&B = A^B

Integer Algebra

Integer Arithmetic
- ⟨Z, +, *, -, 0, 1⟩ forms a “ring”
- Addition is “sum” operation
- Multiplication is “product” operation
- - is additive inverse
- 0 is identity for sum
- 1 is identity for product

Boolean Algebra

- ⟨{0,1}, |, &, ~, 0, 1⟩ forms a “Boolean algebra”
- Or is “sum” operation
- And is “product” operation
- ~ is “complement” operation (not additive inverse)
- 0 is identity for sum
- 1 is identity for product

Boolean Algebra ≈ Integer Ring

Property                        Boolean Algebra                    Integer Ring
Commutativity                   A | B = B | A                      A + B = B + A
                                A & B = B & A                      A * B = B * A
Associativity                   (A | B) | C = A | (B | C)          (A + B) + C = A + (B + C)
                                (A & B) & C = A & (B & C)          (A * B) * C = A * (B * C)
Product distributes over sum    A & (B | C) = (A & B) | (A & C)    A * (B + C) = A * B + A * C
Sum and product identities      A | 0 = A                          A + 0 = A
                                A & 1 = A                          A * 1 = A
Zero is product annihilator     A & 0 = 0                          A * 0 = 0
Cancellation of negation        ~(~A) = A                          -(-A) = A

Boolean Algebra ≠ Integer Ring

Property                                    Boolean Algebra                    Integer Ring
Boolean: Sum distributes over product       A | (B & C) = (A | B) & (A | C)    A + (B * C) ≠ (A + B) * (A + C)
Boolean: Idempotency                        A | A = A                          A + A ≠ A
                                            A & A = A                          A * A ≠ A
Boolean: Absorption                         A | (A & B) = A                    A + (A * B) ≠ A
                                            A & (A | B) = A                    A * (A + B) ≠ A
Boolean: Laws of Complements                A | ~A = 1                         A + -A ≠ 1
Ring: Every element has additive inverse    A | ~A ≠ 0                         A + -A = 0

Readings
- Idempotency: “A is true” or “A is true” = “A is true”
- Absorption: “A is true” or “A is true and B is true” = “A is true”
- Complements: “A is true” or “A is false”

Boolean Ring

Properties of & and ^
- ⟨{0,1}, ^, &, I, 0, 1⟩
- Identical to integers mod 2
- I is identity operation: I(A) = A
- A ^ A = 0

Property                    Boolean Ring
Commutative sum             A ^ B = B ^ A
Commutative product         A & B = B & A
Associative sum             (A ^ B) ^ C = A ^ (B ^ C)
Associative product         (A & B) & C = A & (B & C)
Prod. over sum              A & (B ^ C) = (A & B) ^ (A & C)
0 is sum identity           A ^ 0 = A
1 is prod. identity         A & 1 = A
0 is product annihilator    A & 0 = 0
Additive inverse            A ^ A = 0
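
The claim that this ring is identical to the integers mod 2 can be checked exhaustively, since there are only two values. A minimal sketch follows; the function name ring_matches_mod2 is an assumption for illustration.

```c
/* Returns 1 if, for every pair of bits a and b, ^ matches addition
 * mod 2 and & matches multiplication mod 2; returns 0 otherwise. */
int ring_matches_mod2(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            if ((a ^ b) != (a + b) % 2) return 0;
            if ((a & b) != (a * b) % 2) return 0;
        }
        /* Each value is its own additive inverse. */
        if ((a ^ a) != 0) return 0;
    }
    return 1;
}
```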

Relations Between Operations

DeMorgan’s Laws
- Express & in terms of |, and vice-versa
  - A & B = ~(~A | ~B)
    - A and B are true if and only if neither A nor B is false
  - A | B = ~(~A & ~B)
    - A or B are true if and only if A and B are not both false

Exclusive-Or using Inclusive Or
  - A ^ B = (~A & B) | (A & ~B)
    - Exactly one of A and B is true
  - A ^ B = (A | B) & ~(A & B)
    - Either A is true, or B is true, but not both
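
These four identities can be checked by brute force over all pairs of 8-bit values. The sketch below is one way to do that; the function name relations_hold is an assumption for illustration.

```c
/* Returns 1 if DeMorgan's laws and both Xor identities hold for every
 * pair of 8-bit values, 0 otherwise. Complements are masked to 8 bits
 * because ~ flips every bit of the unsigned int operand, not just the
 * low 8. */
int relations_hold(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            unsigned na = ~a & 0xFF;
            unsigned nb = ~b & 0xFF;
            if ((a & b) != (~(na | nb) & 0xFF)) return 0;
            if ((a | b) != (~(na & nb) & 0xFF)) return 0;
            if ((a ^ b) != ((na & b) | (a & nb))) return 0;
            if ((a ^ b) != ((a | b) & ~(a & b))) return 0;
        }
    }
    return 1;
}
```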

General Boolean Algebras

Operate on Bit Vectors
- Operations applied bitwise

      01101001        01101001        01101001
    & 01010101      | 01010101      ^ 01010101      ~ 01010101
      --------        --------        --------        --------
      01000001        01111101        00111100        10101010

All of the Properties of Boolean Algebra Apply

Representing & Manipulating Sets

Representation
- Width w bit vector represents subsets of {0, …, w-1}
- a_j = 1 if j ∈ A

    01101001    { 0, 3, 5, 6 }
    76543210

    01010101    { 0, 2, 4, 6 }
    76543210

Operations
    &   Intersection            01000001    { 0, 6 }
    |   Union                   01111101    { 0, 2, 3, 4, 5, 6 }
    ^   Symmetric difference    00111100    { 2, 3, 4, 5 }
    ~   Complement              10101010    { 1, 3, 5, 7 }
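
The set operations map one-to-one onto C's bitwise operators. The sketch below wraps them for subsets of {0, ..., 7}; the type name bitset8 and all function names are assumptions for illustration.

```c
#include <stdint.h>

/* Subset of {0, ..., 7}: bit j is 1 exactly when j is a member. */
typedef uint8_t bitset8;

/* Hypothetical helpers, named here for illustration only. */
int     set_contains(bitset8 s, unsigned j)     { return (s >> j) & 1; }
bitset8 set_intersection(bitset8 a, bitset8 b)  { return (bitset8) (a & b); }
bitset8 set_union(bitset8 a, bitset8 b)         { return (bitset8) (a | b); }
bitset8 set_symdiff(bitset8 a, bitset8 b)       { return (bitset8) (a ^ b); }
bitset8 set_complement(bitset8 a)               { return (bitset8) ~a; }
```

With a = 0x69 for { 0, 3, 5, 6 } and b = 0x55 for { 0, 2, 4, 6 }, set_intersection(a, b) is 0x41, which is { 0, 6 }.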

Bit-Level Operations in C

Operations &, |, ~, ^ Available in C
- Apply to any “integral” data type
  - long, int, short, char
- View arguments as bit vectors
- Arguments applied bit-wise

Examples (char data type)
- ~0x41 --> 0xBE
    ~01000001_2 --> 10111110_2
- ~0x00 --> 0xFF
    ~00000000_2 --> 11111111_2
- 0x69 & 0x55 --> 0x41
    01101001_2 & 01010101_2 --> 01000001_2
- 0x69 | 0x55 --> 0x7D
    01101001_2 | 01010101_2 --> 01111101_2
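
One detail worth keeping in mind when trying the char examples: C promotes a char operand to int before applying ~, so the result has to be converted back to a char type to see the 8-bit answer. A minimal sketch, assuming 8-bit chars; the helper name not8 is an assumption for illustration.

```c
/* ~ is applied after x is promoted to int, so the result is cast
 * back to unsigned char to keep only the low 8 bits. */
unsigned char not8(unsigned char x)
{
    return (unsigned char) ~x;
}
```

Without the cast, ~0x41 is an int with all upper bits set, which is not equal to 0xBE.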

Contrast: Logic Operations in C

Contrast to Logical Operators
- &&, ||, !
  - View 0 as “False”
  - Anything nonzero as “True”
  - Always return 0 or 1
  - Early termination

Examples (char data type)
- !0x41 --> 0x00
- !0x00 --> 0x01
- !!0x41 --> 0x01
- 0x69 && 0x55 --> 0x01
- 0x69 || 0x55 --> 0x01
- p && *p (avoids null pointer access)
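
The p && *p idiom relies on early termination: the right operand is not evaluated when the left one is false. A minimal sketch; the function name points_to_nonzero is an assumption for illustration.

```c
#include <stddef.h>

/* Returns 1 if p is non-null and points to a nonzero value.
 * && stops at the first false operand, so *p is never evaluated
 * when p is NULL. */
int points_to_nonzero(const int *p)
{
    return p && *p;
}
```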

Shift Operations

Left Shift: x << y
- Shift bit-vector x left y positions
  - Throw away extra bits on left
  - Fill with 0’s on right

Right Shift: x >> y
- Shift bit-vector x right y positions
  - Throw away extra bits on right
- Logical shift
  - Fill with 0’s on left
- Arithmetic shift
  - Replicate most significant bit on left
  - Useful with two’s complement integer representation

Argument x     01100010        Argument x     10100010
<< 3           00010000        << 3           00010000
Log. >> 2      00011000        Log. >> 2      00101000
Arith. >> 2    00011000        Arith. >> 2    11101000
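
In C, >> on a negative signed value is implementation-defined, so the sketch below works on unsigned 8-bit values and builds the arithmetic shift by hand. It assumes shift counts from 0 to 7; the function names are assumptions for illustration.

```c
#include <stdint.h>

/* 8-bit shifts for 0 <= n <= 7. */

uint8_t shift_left8(uint8_t x, unsigned n)
{
    /* Bits shifted past bit 7 are dropped by the cast. */
    return (uint8_t) (x << n);
}

uint8_t logical_right8(uint8_t x, unsigned n)
{
    /* x is nonnegative after promotion, so zeros enter on the left. */
    return (uint8_t) (x >> n);
}

uint8_t arithmetic_right8(uint8_t x, unsigned n)
{
    uint8_t result = (uint8_t) (x >> n);
    /* Most significant bit set: fill the vacated bits with 1s. */
    if (x & 0x80)
        result |= (uint8_t) (0xFF << (8 - n));
    return result;
}
```

For x = 0xA2 (10100010), logical_right8(x, 2) is 0x28 and arithmetic_right8(x, 2) is 0xE8.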

Cool Stuff with Xor

- Bitwise Xor is form of addition
- With extra property that every value is its own additive inverse
    A ^ A = 0

    /* Swaps *x and *y. Assumes x and y point to different objects;
       if they alias, step #1 sets the shared value to 0. */
    void funny(int *x, int *y)
    {
        *x = *x ^ *y;   /* #1 */
        *y = *x ^ *y;   /* #2 */
        *x = *x ^ *y;   /* #3 */
    }

Step     *x             *y
Begin    A              B
1        A^B            B
2        A^B            (A^B)^B = A
3        (A^B)^A = B    A
End      B              A

Main Points

It’s All About Bits & Bytes
- Numbers
- Programs
- Text

Different Machines Follow Different Conventions
- Word size
- Byte ordering
- Representations

Boolean Algebra is Mathematical Basis
- Basic form encodes “false” as 0, “true” as 1
- General form like bit-level operations in C
  - Good for representing & manipulating sets