Bits and Bytes

Topics
- Representing information as bits
  - Binary/Hexadecimal
  - Byte representations
    - Numbers
    - Characters and strings
    - Instructions
- Bit-level manipulations
  - Boolean algebra
  - Expressing in C

Why Don’t Computers Use Base 10?

Why Do We Use Base 10?
- Count with 10 fingers
- Natural representation for financial transactions
- Even used in scientific notation
  - 1.5213 × 10^4

Implementing Base 10 Electronically
- Hard to store
  - ENIAC (first general-purpose electronic computer) used 10 vacuum tubes per digit
- Hard to transmit
  - Need high precision to encode 10 signal levels on a single wire
- Messy to implement digital logic functions
  - Addition, multiplication, etc.

Binary Representations

Base 2 Number Representation
- Represent 15213_10 as 11101101101101_2
- Represent 1.20_10 as 1.0011001100110011[0011]…_2
- Represent 1.5213 × 10^4 as 1.1101101101101_2 × 2^13

Electronic Implementation
- Easy to store with bistable elements
- Reliably transmitted on noisy and inaccurate wires

[Figure: signal voltage levels. 0.0 V to 0.5 V encodes a 0; 2.8 V to 3.3 V encodes a 1.]
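The conversion of 15213 to base 2 can be reproduced by repeated division by 2. The following is a minimal sketch, not part of the original slides; the function name `to_binary` and its buffer interface are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Write the binary digits of value into buf, most significant digit first
   and without leading zeros. Returns the number of digits written.
   to_binary is a hypothetical helper name, not taken from the slides. */
size_t to_binary(unsigned value, char *buf, size_t buf_size)
{
    char tmp[sizeof(unsigned) * CHAR_BIT];
    size_t n = 0;

    /* Repeated division by 2 yields the digits least significant first. */
    do {
        tmp[n++] = (char)('0' + value % 2);
        value /= 2;
    } while (value != 0);

    /* The caller must leave room for the digits plus the terminator. */
    assert(buf != NULL && buf_size > n);

    /* Reverse the digits into the caller's buffer. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

Called with 15213, the sketch should fill the buffer with the 14 digits 11101101101101.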

Encoding Byte Values

Byte = 8 bits
- Binary: 00000000_2 to 11111111_2
- Decimal: 0_10 to 255_10
- Hexadecimal: 00_16 to FF_16
  - Base 16 number representation
  - Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’
  - Write FA1D37B_16 in C as 0xFA1D37B
    - Or 0xfa1d37b

Hex  Decimal  Binary
0    0        0000
1    1        0001
2    2        0010
3    3        0011
4    4        0100
5    5        0101
6    6        0110
7    7        0111
8    8        1000
9    9        1001
A    10       1010
B    11       1011
C    12       1100
D    13       1101
E    14       1110
F    15       1111
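Because each hex digit stands for exactly four bits, converting between a byte and its two hex digits is a matter of splitting or joining nibbles. The sketch below illustrates this; the helper names are hypothetical, and the letter arithmetic assumes an ASCII-compatible character set.

```c
#include <assert.h>

/* Value 0..15 of one hexadecimal digit, or -1 if c is not a hex digit.
   Assumes 'a'..'f' and 'A'..'F' are contiguous, as they are in ASCII. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Split a byte into two hex digits: high nibble first, then low nibble. */
void byte_to_hex(unsigned char byte, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";
    out[0] = digits[byte >> 4];
    out[1] = digits[byte & 0x0F];
    out[2] = '\0';
}

/* Join two hex digits back into one byte. */
unsigned char hex_pair_to_byte(char high, char low)
{
    int h = hex_digit_value(high);
    int l = hex_digit_value(low);
    assert(h >= 0 && l >= 0);   /* both characters must be hex digits */
    return (unsigned char)((h << 4) | l);
}
```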

Machine Words

Machine Has “Word Size”
- Nominal size of integer-valued data
- Most current machines are 32 bits (4 bytes)
  - Limits addresses to 4 GB
  - Becoming too small for memory-intensive applications
- High-end systems are 64 bits (8 bytes)
  - Potentially address 1.8 × 10^19 bytes
- Machines support multiple data formats
  - Integer and floating point
  - Multiples of word size
  - Always integral number of bytes

Data Representations

Sizes of C Objects (in Bytes)

C Data Type    Compaq Alpha   Typical 32-bit   Intel IA32
int            4              4                4
long int       8              4                4
char           1              1                1
short          2              2                2
float          4              4                4
double         8              8                8
long double    8              8                10
char *         8              4                4

The char * row applies to any other pointer type as well.
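The sizes in this table can be checked on any given machine with `sizeof`. The sketch below is an illustration only; the function name `print_sizes` is hypothetical, and the numbers it prints depend on the compiler and platform, so no expected output is given.

```c
#include <assert.h>
#include <stdio.h>

/* Print the size in bytes of each C type from the table, as seen by the
   compiler that builds this file. print_sizes is a hypothetical name. */
void print_sizes(void)
{
    /* sizeof(char) is 1 by definition; every other size is platform-specific. */
    assert(sizeof(char) == 1);

    printf("int          %zu\n", sizeof(int));
    printf("long int     %zu\n", sizeof(long int));
    printf("char         %zu\n", sizeof(char));
    printf("short        %zu\n", sizeof(short));
    printf("float        %zu\n", sizeof(float));
    printf("double       %zu\n", sizeof(double));
    printf("long double  %zu\n", sizeof(long double));
    printf("char *       %zu\n", sizeof(char *));
}
```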

Byte Ordering

How should bytes within a word be ordered?

Conventions
- Suns and Macs are “Big Endian” machines
  - Least significant byte has highest address
- Alphas and PCs are “Little Endian” machines
  - Least significant byte has lowest address

Byte Ordering Example

Big Endian
- Least significant byte has highest address
Little Endian
- Least significant byte has lowest address

Example
- Variable x has 4-byte representation 0x01234567
- Address given by &x is 0x100

Address         0x100  0x101  0x102  0x103
Big Endian      01     23     45     67
Little Endian   67     45     23     01
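The byte order of the machine at hand can be observed by copying a value's bytes out in increasing address order. This is a minimal sketch with hypothetical helper names; it assumes the machine is either big endian or little endian, as in the example above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of x into out in increasing address order. */
void bytes_in_memory(uint32_t x, unsigned char out[4])
{
    memcpy(out, &x, sizeof x);
}

/* Returns 1 on a little endian machine, 0 on a big endian one. */
int is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x01234567u, b);

    /* Assumption: only the two byte orders from the slide are possible. */
    assert(b[0] == 0x67 || b[0] == 0x01);
    return b[0] == 0x67;
}
```

On a PC the first byte should be 67, matching the Little Endian row; on a big endian machine it should be 01.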

Reading Byte-Reversed Listings

Disassembly
- Text representation of binary machine code
- Generated by program that reads the machine code

Example Fragment

Address     Instruction Code        Assembly Rendition
8048365:    5b                      pop    %ebx
8048366:    81 c3 ab 12 00 00       add    $0x12ab,%ebx
804836c:    83 bb 28 00 00 00 00    cmpl   $0x0,0x28(%ebx)

Deciphering Numbers
- Value:             0x12ab
- Pad to 4 bytes:    0x000012ab
- Split into bytes:  00 00 12 ab
- Reverse:           ab 12 00 00
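Going the other way, from the bytes in a listing back to the value, amounts to weighting byte i by 256^i. The sketch below shows one way to do this; `from_little_endian` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble up to 4 bytes, listed in little endian order (least significant
   byte first), into the value they encode. */
uint32_t from_little_endian(const unsigned char *bytes, size_t n)
{
    assert(bytes != NULL && n <= 4);

    uint32_t value = 0;
    for (size_t i = 0; i < n; i++)
        value |= (uint32_t)bytes[i] << (8 * i);   /* byte i has weight 256^i */
    return value;
}
```

Applied to the bytes ab 12 00 00 from the add instruction, this should recover 0x12ab.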

Representing Integers

int A = 15213;
int B = -15213;
long int C = 15213;

Decimal: 15213
Binary:  0011 1011 0110 1101
Hex:     3B6D

Bytes in increasing address order:

A on PC      6D 3B 00 00
A on Sun     00 00 3B 6D
B on PC      93 C4 FF FF
B on Sun     FF FF C4 93
C on PC      6D 3B 00 00
C on Alpha   6D 3B 00 00 00 00 00 00
C on Sun     00 00 3B 6D

B uses two’s complement representation (covered next lecture)
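The bytes shown for B can be derived without knowing the machine's byte order, because converting a signed value to `uint32_t` is defined modulo 2^32 and therefore yields the two's complement bit pattern. The following sketch uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* The 32-bit two's complement bit pattern of v, as an unsigned number.
   Signed-to-unsigned conversion in C is defined modulo 2^32. */
uint32_t bit_pattern(int32_t v)
{
    return (uint32_t)v;
}

/* Byte i of the pattern, where byte 0 is the least significant byte.
   A little endian machine stores byte 0 at the lowest address. */
unsigned char byte_of(int32_t v, int i)
{
    assert(i >= 0 && i < 4);
    return (unsigned char)(bit_pattern(v) >> (8 * i));
}
```

For -15213 the pattern should be 0xFFFFC493, so bytes 0 through 3 are 93 C4 FF FF, the order shown for B on a PC.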

Representing Floats

float F = 15213.0;

F on PC    00 B4 6D 46
F on Sun   46 6D B4 00

IEEE Single Precision Floating Point Representation

Hex:      4    6    6    D    B    4    0    0
Binary:   0100 0110 0110 1101 1011 0100 0000 0000
15213:    1110 1101 1011 01

The bits of 15213 after its leading 1 appear at the start of the fraction field.

Not the same as the integer representation, but the bit pattern is consistent across machines thanks to the IEEE standard for floating point numbers.
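The bit pattern of F can be inspected by copying the float's bytes into an integer of the same size. This is a sketch under the assumption that `float` is a 32-bit IEEE single precision value on the target machine; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float as a 32-bit unsigned integer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);   /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* The 8-bit exponent field: stored as the true exponent plus 127. */
unsigned exponent_field(uint32_t bits)
{
    return (bits >> 23) & 0xFFu;
}

/* The 23-bit fraction field: the digits after the implied leading 1. */
uint32_t fraction_field(uint32_t bits)
{
    return bits & 0x7FFFFFu;
}
```

Since 15213 is 1.1101101101101_2 × 2^13, the exponent field should hold 13 + 127 = 140 and the fraction field should begin with 1101101101101.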

Representing Strings in C

char S[6] = "15213";

Strings in C
- Represented by array of characters
- Each character encoded in ASCII format
  - Standard 7-bit encoding of character set
  - Other encodings exist, but uncommon
  - Character “0” has code 0x30
    - Digit i has code 0x30+i
- String should be null-terminated
  - Final character = 0

S on PC    31 35 32 31 33 00
S on Sun   31 35 32 31 33 00

Compatibility
- Byte ordering not an issue
  - Data are single byte quantities
- Text files generally platform independent
  - Except for different conventions of line termination character(s)!
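The rule that digit i has code 0x30+i, and the role of the terminating zero byte, can be sketched as follows. The helper names are hypothetical, and the code values assume an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII code of the decimal digit i, following the slide's rule 0x30 + i. */
char digit_to_char(int i)
{
    assert(i >= 0 && i <= 9);
    return (char)(0x30 + i);
}

/* Count the characters before the terminating null byte. */
size_t string_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

For `char S[6] = "15213";` the length should be 5, with the sixth byte holding the terminator.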

Machine Code

A Program is a Sequence of Instructions
- Each instruction is a simple operation
  - Arithmetic operation
  - Read or write memory
  - Conditional branch
- Instructions encoded as bytes
  - Alphas, Suns, and Macs use 4-byte instructions
    - Reduced Instruction Set Computer (RISC)
  - PCs use variable length instructions
    - Complex Instruction Set Computer (CISC)
- Different instruction types and encodings for different machines
  - Most code not binary compatible

Instruction Representation

int sum(int x, int y)
{
    return x+y;
}

- For this example, Alpha & Sun use two 4-byte instructions
- PC uses 7 instructions with lengths 1, 2, and 3 bytes

Alpha sum   00 00 30 42 01 80 FA 6B
Sun sum     81 C3 E0 08 90 02 00 09
PC sum      55 89 E5 8B 45 0C 03 45 08 89 EC 5D C3

Different machines use totally different instructions and encodings

Boolean Algebra

Developed by George Boole in the 19th century
- Algebraic representation of logic
  - Encode “True” as 1 and “False” as 0

And
- A&B = 1 when both A=1 and B=1

Or
- A|B = 1 when either A=1 or B=1

Not
- ~A = 1 when A=0

Exclusive-Or (Xor)
- A^B = 1 when either A=1 or B=1, but not both

Relations Between Operations

De Morgan’s Laws
- Express & in terms of |, and vice-versa
  - A & B = ~(~A | ~B)
    - A and B are true if and only if neither A nor B is false
  - A | B = ~(~A & ~B)
    - A or B are true if and only if A and B are not both false

Exclusive-Or using Inclusive Or
  - A ^ B = (~A & B) | (A & ~B)
    - Exactly one of A and B is true
  - A ^ B = (A | B) & ~(A & B)
    - Either A is true, or B is true, but not both
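Since a byte has only 256 values, these identities can be checked exhaustively rather than taken on faith. The sketch below does so for 8-bit operands; the function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if both De Morgan laws and both Xor identities hold for the
   8-bit values a and b. The casts discard the high bits that integer
   promotion adds when ~ is applied. */
int identities_hold(unsigned char a, unsigned char b)
{
    unsigned char and_from_or  = (unsigned char)~(~a | ~b);
    unsigned char or_from_and  = (unsigned char)~(~a & ~b);
    unsigned char xor_form_one = (unsigned char)((~a & b) | (a & ~b));
    unsigned char xor_form_two = (unsigned char)((a | b) & ~(a & b));

    return and_from_or  == (a & b)
        && or_from_and  == (a | b)
        && xor_form_one == (a ^ b)
        && xor_form_two == (a ^ b);
}

/* Check every pair of byte values: 65536 combinations in total. */
void check_all_bytes(void)
{
    for (unsigned a = 0; a <= 0xFF; a++)
        for (unsigned b = 0; b <= 0xFF; b++)
            assert(identities_hold((unsigned char)a, (unsigned char)b));
}
```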

General Boolean Algebras

Operate on Bit Vectors
- Operations applied bitwise

  01101001      01101001      01101001
& 01010101    | 01010101    ^ 01010101    ~ 01010101
  --------      --------      --------      --------
  01000001      01111101      00111100      10101010
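One way to read these bit vectors, which the summary of this lecture also mentions, is as sets: bit j is 1 exactly when j is a member. Under that reading 01101001 is {0, 3, 5, 6} and 01010101 is {0, 2, 4, 6}. The sketch below is an illustration with hypothetical names (`bitset8` and the functions around it).

```c
#include <assert.h>

/* Hypothetical type: a set of integers drawn from 0..7, one bit per member. */
typedef unsigned char bitset8;

/* 1 if j is a member of s, otherwise 0. */
int set_contains(bitset8 s, int j)
{
    assert(j >= 0 && j < 8);
    return (s >> j) & 1;
}

bitset8 set_intersection(bitset8 a, bitset8 b) { return a & b; }
bitset8 set_union(bitset8 a, bitset8 b)        { return a | b; }
bitset8 set_symmetric_difference(bitset8 a, bitset8 b) { return a ^ b; }

/* The cast drops the high bits that integer promotion adds to ~a. */
bitset8 set_complement(bitset8 a) { return (bitset8)~a; }

/* Number of members: count the 1 bits. */
int set_size(bitset8 s)
{
    int count = 0;
    for (int j = 0; j < 8; j++)
        count += (s >> j) & 1;
    return count;
}
```

The intersection of the two example sets should be {0, 6}, which is the vector 01000001 from the & column.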

Bit-Level Operations in C

Operations &, |, ~, ^ Available in C
- Apply to any “integral” data type
  - long, int, short, char
- View arguments as bit vectors
- Arguments applied bit-wise

Examples (char data type)
- ~0x41 --> 0xBE
  - ~01000001_2 --> 10111110_2
- ~0x00 --> 0xFF
  - ~00000000_2 --> 11111111_2
- 0x69 & 0x55 --> 0x41
  - 01101001_2 & 01010101_2 --> 01000001_2
- 0x69 | 0x55 --> 0x7D
  - 01101001_2 | 01010101_2 --> 01111101_2
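The examples above hold for 8-bit data. In C itself, operands narrower than `int` are promoted to `int` before `~` is applied, so the result has to be narrowed back to see the 8-bit answer. The sketch below writes the slide's examples as assertions; `not8` and `bitwise_examples` are hypothetical names.

```c
#include <assert.h>

/* Complement of an 8-bit value. The cast narrows the promoted int result
   back to 8 bits. */
unsigned char not8(unsigned char x)
{
    return (unsigned char)~x;
}

/* The slide's examples, written as checks. */
void bitwise_examples(void)
{
    assert(not8(0x41) == 0xBE);
    assert(not8(0x00) == 0xFF);
    assert((0x69 & 0x55) == 0x41);
    assert((0x69 | 0x55) == 0x7D);

    /* Without narrowing, ~0x41 is a negative int rather than 0xBE. */
    assert(~0x41 != 0xBE);
}
```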

Logic Operations in C

Contrast to Logical Operators
- &&, ||, !
  - View 0 as “False”
  - Anything nonzero as “True”
  - Always return 0 or 1
  - Short circuiting

Examples (char data type)
- !0x41 --> 0x00
- !0x00 --> 0x01
- !!0x41 --> 0x01
- 0x69 && 0x55 --> 0x01
- 0x69 || 0x55 --> 0x01
- p && *p (avoids null pointer access)
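Short circuiting means the right operand of `&&` is not evaluated when the left operand is already 0, which is what makes `p && *p` safe. The sketch below makes that observable by counting evaluations; all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int calls;   /* how many times the right operand was evaluated */

static int counted_true(void)
{
    calls++;
    return 1;
}

/* Evaluate (left && counted_true()) and report how often the right
   operand ran: 0 if short circuiting skipped it, 1 otherwise. */
int right_operand_evaluations(int left)
{
    calls = 0;
    (void)(left && counted_true());
    return calls;
}

/* The slide's idiom: *p is read only when p is not null. */
int points_to_nonzero(const int *p)
{
    return p && *p;
}

/* The slide's examples, written as checks, plus a contrast with &. */
void logic_examples(void)
{
    assert(!0x41 == 0x00);
    assert(!0x00 == 0x01);
    assert(!!0x41 == 0x01);
    assert((0x69 && 0x55) == 0x01);
    assert((0x69 || 0x55) == 0x01);
    assert((0x69 & 0x55) == 0x41);   /* bitwise & keeps the bit pattern */
    assert(points_to_nonzero(NULL) == 0);
}
```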

Shift Operations

Left Shift: x << y
- Shift bit-vector x left y positions
  - Throw away extra bits on left
  - Fill with 0’s on right

Right Shift: x >> y
- Shift bit-vector x right y positions
  - Throw away extra bits on right
- Logical shift
  - Fill with 0’s on left
- Arithmetic shift
  - Replicate most significant bit on left
  - Useful with two’s complement integer representation

Argument x    01100010
<< 3          00010000
Log. >> 2     00011000
Arith. >> 2   00011000

Argument x    10100010
<< 3          00010000
Log. >> 2     00101000
Arith. >> 2   11101000
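In C, `>>` on an unsigned value is a logical shift, while `>>` on a negative signed value is implementation-defined. The sketch below therefore builds the arithmetic shift from unsigned operations so the 8-bit examples in the table can be reproduced portably; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift of an 8-bit vector: bits shifted past the left end are lost. */
uint8_t shift_left(uint8_t x, int k)
{
    assert(k >= 0 && k < 8);
    return (uint8_t)(x << k);
}

/* Logical right shift: vacated positions on the left are filled with 0s. */
uint8_t shift_right_logical(uint8_t x, int k)
{
    assert(k >= 0 && k < 8);
    return (uint8_t)(x >> k);
}

/* Arithmetic right shift: vacated positions on the left are filled with
   copies of the most significant bit. */
uint8_t shift_right_arith(uint8_t x, int k)
{
    assert(k >= 0 && k < 8);
    uint8_t shifted = (uint8_t)(x >> k);
    if (x & 0x80)                                  /* sign bit is set */
        shifted |= (uint8_t)~(0xFFu >> k);         /* 1s in the top k bits */
    return shifted;
}
```

For 10100010 shifted right by 2, the logical version should give 00101000 and the arithmetic version 11101000, as in the second table.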

Cool Stuff with Xor

- Bitwise Xor is a form of addition
- With the extra property that every value is its own additive inverse: A ^ A = 0

void funny(int *x, int *y)
{
    *x = *x ^ *y;   /* #1 */
    *y = *x ^ *y;   /* #2 */
    *x = *x ^ *y;   /* #3 */
}

Step    *x             *y
Begin   A              B
1       A^B            B
2       A^B            (A^B)^B = A
3       (A^B)^A = B    A
End     B              A
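The trace assumes x and y point to different objects. If both point to the same int, step 1 computes A ^ A = 0 and the value is lost. The variant below is a sketch that guards against that case; `xor_swap` is a hypothetical name, not the function from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Swap *x and *y using the three-Xor sequence. When x and y alias the
   same object, the swap is a no-op, so return before step 1 zeroes it. */
void xor_swap(int *x, int *y)
{
    assert(x != NULL && y != NULL);
    if (x == y)
        return;

    *x = *x ^ *y;   /* *x = A^B            */
    *y = *x ^ *y;   /* *y = (A^B)^B = A    */
    *x = *x ^ *y;   /* *x = (A^B)^A = B    */
}
```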

Main Points

It’s All About Bits & Bytes
- Code
- Data

Different Machines Follow Different Conventions
- Word size
- Byte ordering
- Representations

Boolean Algebra is Mathematical Basis
- Basic form encodes “false” as 0, “true” as 1
- General form like bit-level operations in C
  - Good for representing & manipulating sets
  - Used by optimizing compilers for flow analysis