Lecture 10: Representing Information Digitally

Digitizing Discrete Information
• Digitize
  – Represent info with digits (symbols)
  – Digits: { 0, 1, 2, …, 9 }
  – Or digits: { A, B, C, …, Z }
  – Or any set of distinct symbols

Symbols, Briefly
• Prefer short names for symbols
  – One, two, …
  – Instead of “asterisk”, “closing parenthesis”, etc.
• Aside: we shorten many names in IT
  – exclamation point => bang
  – asterisk => star
  – open parenthesis => open paren
  – open curly brace => open brace

Ordering Symbols
• Want an order for the digits/symbols
  – 0 through 9 has an obvious order
  – But what about { !, @, #, …, ) }?
• Define a collating sequence
• Digitize
  – Represent info with symbols
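
A collating sequence is simply an agreed-upon list that says which symbol comes first, second, and so on. The minimal sketch below assumes a hypothetical sequence for a few punctuation symbols (the `COLLATING_ORDER` string is made up for illustration, not a standard) and sorts by each symbol's position in that list.

```python
# Hypothetical collating sequence: a symbol's position in this string is its rank.
COLLATING_ORDER = "!@#$%^&*()"

def collate_key(symbol: str) -> int:
    """Return the symbol's rank in the collating sequence."""
    return COLLATING_ORDER.index(symbol)

def sort_symbols(symbols: list[str]) -> list[str]:
    """Sort symbols by their rank in COLLATING_ORDER, not by any built-in order."""
    return sorted(symbols, key=collate_key)

print(sort_symbols([")", "#", "!", "@"]))  # ['!', '@', '#', ')']
```

Changing the string changes the order; nothing else in the code needs to know what the symbols mean.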

Fundamental Information Representation
• Given digital info, how to store it?
  – Use physical phenomena
    • Light
    • Current
    • Magnetism

Fundamental Information Representation
• In the digital world
  – Don’t care how much, just presence
• In the logical world (basis of computing)
  – True and false

Fundamental Information Representation
• Physical world can implement logical world
  – Presence => “true”
  – Absence => “false”

The PandA Representation
• We will use “PandA” for the presence and absence representation
  – Only two states
  – Could use false for absent, true for present
  – Or 0 for absent, and 1 for present

The PandA Representation
• Such a formulation is said to be discrete
• Discrete means “distinct” or “separable”
  – Opposite of continuous
  – No “shades of gray”

Analog vs. Digital
• Analog is continuous data/information
  – Sound waves

Analog vs. Digital
• Digital is discrete info
  – Obtained by sampling
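
Sampling can be sketched in a few lines: measure a continuous signal at fixed intervals, then round each measurement to one of a limited set of whole-number levels. This is a minimal illustration under arbitrary assumptions: a 1 Hz sine wave stands in for a sound wave, the rate is 8 samples per second, and there are 15 levels (-7 to 7). Real audio uses far higher sampling rates.

```python
import math

def sample(signal, duration_s: float, rate_hz: int) -> list[float]:
    """Measure the continuous signal at rate_hz evenly spaced instants per second."""
    count = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(count)]

def quantize(samples: list[float], max_level: int) -> list[int]:
    """Round each sample in [-1, 1] to a whole number in [-max_level, max_level]."""
    return [round(s * max_level) for s in samples]

def wave(t: float) -> float:
    """A 1 Hz sine wave standing in for an analog sound wave."""
    return math.sin(2 * math.pi * t)

digital = quantize(sample(wave, duration_s=1.0, rate_hz=8), max_level=7)
print(digital)  # [0, 5, 7, 5, 0, -5, -7, -5]
```

The smooth curve becomes eight discrete numbers; anything the wave did between two sampling instants is lost.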

A Binary System
• PandA encoding is binary

Bits Form Symbols
• A PandA unit is a binary digit (bit)
• Bit sequences form binary numbers

Encoding Bits on a CD-ROM
• PandA bit values are pits and lands

Bits in Computer Memory
• Memory is a long sequence of bits
• Sidewalk analogy

Sidewalk Memory
• Imagine a clean sidewalk consisting of squares
  – Presence of a stone on a square => 1
  – Absence of a stone => 0
• Sidewalk: a sequence of bits

Sidewalk Memory (figure: a row of squares holding the bits 0 1 0)

Sidewalk Memory
• Writing info
  – Put stone on square (1)
  – Remove stone from square (0)
• Reading info
  – Check each square for a stone: present => 1, absent => 0
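
The sidewalk analogy maps directly onto a tiny data structure: a row of squares, each either holding a stone or not. The class below is a hypothetical model of the analogy (the name `SidewalkMemory` is made up here), not how real memory hardware works.

```python
class SidewalkMemory:
    """Hypothetical model of the sidewalk analogy: each square holds a stone or not."""

    def __init__(self, squares: int) -> None:
        # A clean sidewalk: no stones anywhere, so every bit starts as 0.
        self.stones = [False] * squares

    def write(self, position: int, bit: int) -> None:
        # Put a stone down for 1, remove it for 0.
        self.stones[position] = bit == 1

    def read(self, position: int) -> int:
        # Presence of a stone reads as 1, absence as 0.
        return 1 if self.stones[position] else 0

memory = SidewalkMemory(3)
memory.write(1, 1)
print([memory.read(i) for i in range(3)])  # [0, 1, 0]
```

Writing the middle square reproduces the 0 1 0 pattern from the sidewalk figure.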

Alternative PandA Encodings
• Other ways to encode two states
  – Color of stone
  – Number of stones
  – Another?

Combining Bit Patterns
• One bit with two states isn’t enough
• So we combine them
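
Each bit added to a combination doubles the number of distinct patterns. One way to see this is to list them all; `bit_patterns` is a hypothetical helper built on the standard library's `itertools.product`.

```python
from itertools import product

def bit_patterns(n: int) -> list[str]:
    """List every distinct pattern of n bits, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(bit_patterns(2))                              # ['00', '01', '10', '11']
print([len(bit_patterns(n)) for n in range(1, 5)])  # [2, 4, 8, 16]
```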

Hex Explained
• Hex numbers are base-16
• A bit sequence may be 111110011000101010
  – Long bit strings are error prone to read and copy
  – Instead use hex

The 16 Hex Digits
• Hex digits
  – { 0, 1, 2, …, 9, A, B, C, D, E, F }
  – Can represent 4-bit sequences
    • 0000 = 0 hex
    • 0001 = 1 hex
    • …
    • 1001 = 9 hex
    • 1010 = A hex
    • …
    • 1111 = F hex
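
The full correspondence can be generated rather than memorized: each hex digit names the 4-bit pattern of its own value. A minimal sketch; `hex_table` is a hypothetical helper name.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_table() -> dict[str, str]:
    """Map each 4-bit pattern to the hex digit with the same value."""
    return {format(value, "04b"): digit for value, digit in enumerate(HEX_DIGITS)}

# Prints all 16 rows, from "0000 = 0 hex" to "1111 = F hex".
for bits, digit in hex_table().items():
    print(f"{bits} = {digit} hex")
```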

Hex to Bits and Back Again
• Each hex digit corresponds to 4 bits
  – 0010 1011 1010 1101 = 2 B A D
  – F A B 4 = 1111 1010 1011 0100
  – 1 9 C 6 = ?
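
Because each hex digit stands for exactly 4 bits, conversion in either direction is a digit-by-digit lookup. A minimal sketch with hypothetical helper names, assuming the bit string's length is a multiple of 4 (spaces are ignored); 19C6 is left as the exercise.

```python
def hex_to_bits(hex_string: str) -> str:
    """Replace each hex digit by its 4-bit pattern, grouped for readability."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_string)

def bits_to_hex(bit_string: str) -> str:
    """Split the bits into groups of 4 and turn each group into one hex digit."""
    bits = bit_string.replace(" ", "")
    return "".join(format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4))

print(bits_to_hex("0010 1011 1010 1101"))  # 2BAD
print(hex_to_bits("FAB4"))                 # 1111 1010 1011 0100
```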

Digitizing Numbers in Binary
• Need binary representations for
  – Numbers
  – Characters
• But also
  – Images
  – Video
  – Sound

Counting in Binary
• Binary numbers (base 2) use the digits 0 and 1
• Decimal numbers (base 10) use 0 through 9
• Counting to ten
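
Counting to ten side by side in both bases shows the pattern: binary "rolls over" to a new place every time a power of 2 is reached, just as decimal does at powers of 10. A minimal sketch using Python's built-in base-2 formatting; `binary_counts` is a hypothetical helper.

```python
def binary_counts(limit: int) -> list[str]:
    """Binary numerals for 0 through limit."""
    return [format(n, "b") for n in range(limit + 1)]

# Decimal on the left, binary on the right; ten is 1010.
for n, bits in enumerate(binary_counts(10)):
    print(f"{n:>2} = {bits:>4}")
```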

Counting in Binary
• Place value representation

Place Value in a Decimal Number
• Example: 1010 (base 10) is (1 × 1000) + (0 × 100) + (1 × 10) + (0 × 1)

Place Value in a Binary Number
• Binary is base 2, so powers of 2 are used

Place Value in a Binary Number
• 1010 in binary
  – (1 × 8) + (0 × 4) + (1 × 2) + (0 × 1) = 10 in decimal
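
The same place-value rule works for any base: multiply each digit by the base raised to its position (counting from 0 at the right) and add the results. A minimal sketch; `place_value` is a hypothetical helper and only handles the digits 0 through 9, so bases up to 10.

```python
def place_value(digits: str, base: int) -> int:
    """Sum each digit times its place value (a power of the base), right to left."""
    total = 0
    for power, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** power
    return total

print(place_value("1010", 10))  # 1010
print(place_value("1010", 2))   # 10
```

Python's built-in `int("1010", 2)` gives the same answer for binary; the loop just makes the place values visible.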

Digitizing Text
• # of bits determines # of symbols that can be represented
  – n bits => 2^n symbols

Digitizing Text
• To digitize English text
  – Roman letters
  – Arabic numbers
  – Punctuation
  – Arithmetic symbols

Assigning Symbols
• So we need to represent
  – 26 uppercase letters
  – 26 lowercase letters
  – 10 numerals
  – 20 punctuation characters
  – 10 arithmetic characters
  – 3 other characters (new line, tab, and backspace)
  – 95 symbols in total… enough for English

Assigning Symbols
• To represent 95 distinct symbols, how many bits do we need?
• Need to represent control characters too
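
Since n bits give 2^n patterns, the number of bits needed is the smallest n whose 2^n reaches the symbol count. A minimal sketch; `bits_needed` is a hypothetical helper that uses integer arithmetic to avoid floating-point logarithm rounding.

```python
def bits_needed(symbol_count: int) -> int:
    """Smallest n with 2**n >= symbol_count."""
    n = 0
    while 2 ** n < symbol_count:
        n += 1
    return n

print(bits_needed(95))   # 7: 2**6 = 64 is too few, 2**7 = 128 is enough
print(bits_needed(128))  # 7
```

The 33 spare patterns left over from 128 are what make room for the control characters.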

Assigning Symbols
• ASCII stands for American Standard Code for Information Interchange
  – Widely used 7-bit code
• Advantages of a “standard”
  – Interoperability of hardware
  – Communication among programs

Extended ASCII: An 8-Bit Code
• For other languages, 7 bits aren’t enough
• IBM developed an 8-bit ASCII
  – Uses 1 byte
  – Uses 0 in the leftmost bit, followed by the 7-bit ASCII code
  – Allows 128 more codes that start with 1
  – Can handle most Western languages
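
Padding a 7-bit code out to a full byte always puts a 0 in the leftmost position, which leaves the 128 byte values starting with 1 free for extra characters. This sketch only shows the bit arithmetic; it does not model any particular 8-bit code page, and `byte_pattern` is a hypothetical helper.

```python
def byte_pattern(ch: str) -> str:
    """The character's 7-bit ASCII code written as a full 8-bit byte."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code, "08b")

for ch in ("A", "z", "~"):
    print(ch, byte_pattern(ch))
# A 01000001, z 01111010, ~ 01111110: the leftmost bit is always 0

# Byte values whose leftmost bit is 1 (128 through 255) remain for extra characters.
extra_codes = [code for code in range(256) if format(code, "08b")[0] == "1"]
print(len(extra_codes))  # 128
```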

ASCII Character Set (Decimal)
Decimal code = row + column (e.g., 64 + 1 = 65 = A)

Dec   +0   +1   +2   +3   +4   +5   +6   +7
  0   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL
  8   BS   HT   NL   VT   NP   CR   SO   SI
 16   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB
 24   CAN  EM   SUB  ESC  FS   GS   RS   US
 32   SP   !    "    #    $    %    &    '
 40   (    )    *    +    ,    -    .    /
 48   0    1    2    3    4    5    6    7
 56   8    9    :    ;    <    =    >    ?
 64   @    A    B    C    D    E    F    G
 72   H    I    J    K    L    M    N    O
 80   P    Q    R    S    T    U    V    W
 88   X    Y    Z    [    \    ]    ^    _
 96   `    a    b    c    d    e    f    g
104   h    i    j    k    l    m    n    o
112   p    q    r    s    t    u    v    w
120   x    y    z    {    |    }    ~    DEL

ASCII Character Set (Hexadecimal)
Hex code = row + column (e.g., 40 + 1 = 41 = A)

Hex   +0   +1   +2   +3   +4   +5   +6   +7
00    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL
08    BS   HT   NL   VT   NP   CR   SO   SI
10    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB
18    CAN  EM   SUB  ESC  FS   GS   RS   US
20    SP   !    "    #    $    %    &    '
28    (    )    *    +    ,    -    .    /
30    0    1    2    3    4    5    6    7
38    8    9    :    ;    <    =    >    ?
40    @    A    B    C    D    E    F    G
48    H    I    J    K    L    M    N    O
50    P    Q    R    S    T    U    V    W
58    X    Y    Z    [    \    ]    ^    _
60    `    a    b    c    d    e    f    g
68    h    i    j    k    l    m    n    o
70    p    q    r    s    t    u    v    w
78    x    y    z    {    |    }    ~    DEL

Beyond ASCII
• Unicode
  – Uses up to 4 bytes to handle how many characters?
  – Allows all modern scripts (Kanji, Arabic, Cyrillic, Hebrew, etc.)
  – Contains 7-bit ASCII as its first 128 characters, for compatibility
  – Allows ancient scripts like Egyptian hieroglyphics
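
UTF-8 is one common way to store Unicode characters (UTF-16 and UTF-32 are others). It uses 1 to 4 bytes per character, and ASCII characters keep their 1-byte ASCII codes. The sample characters below are arbitrary picks from the scripts the slide mentions; `utf8_length` is a hypothetical helper.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to store one character."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "Latin (ASCII)",
    "Ж": "Cyrillic",
    "ש": "Hebrew",
    "日": "Kanji",
    "\U00013000": "Egyptian hieroglyph",
}
for ch, script in samples.items():
    print(f"U+{ord(ch):04X} {script}: {utf8_length(ch)} byte(s)")
```

Here "A" takes 1 byte, the Cyrillic and Hebrew letters 2, the Kanji 3, and the hieroglyph 4.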

ASCII Coding of Phone Numbers
• How to encode 888 555 1212 in ASCII?
• Encode each digit with its ASCII byte
  – 8 => 00111000, 5 => 00110101, etc.
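
Encoding the phone number as text means one byte per character, so the digit 8 is stored as the byte for the character '8', not as the number 8 in binary. A minimal sketch; `ascii_bytes` is a hypothetical helper, and it encodes the spaces too (as 00100000), so drop them first if they should not be stored.

```python
def ascii_bytes(text: str) -> list[str]:
    """Encode each character with its 8-bit ASCII code, written in binary."""
    return [format(byte, "08b") for byte in text.encode("ascii")]

print(ascii_bytes("85"))  # ['00111000', '00110101']
print(" ".join(ascii_bytes("888 555 1212")))
```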

Another ASCII Example
• From Lab 1: “CSCI ftw!” takes ? bytes to store
• Representation in ASCII (hex)?
  – 43 53 43 49 20 66 74 77 21 0A
• In binary?
  – 0100 0011 0101 0011 0100 0011 0100 1001 … 0010 0001 0000 1010
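
The hex and binary rows can be checked mechanically. This sketch assumes, as the trailing 0A in the hex row suggests, that the stored line ends with a newline character.

```python
text = "CSCI ftw!\n"  # assumption: the lab file ends this line with a newline (0A)
data = text.encode("ascii")

print(len(data), "bytes")                      # 10 bytes
print(" ".join(f"{b:02X}" for b in data))      # 43 53 43 49 20 66 74 77 21 0A
print(" ".join(f"{b:08b}" for b in data[:4]))  # 01000011 01010011 01000011 01001001
```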

Advantages of Long Encodings
• Short encodings save memory
• Examples of longer encodings
  – NATO Broadcast Alphabet
  – Bar Codes

NATO Broadcast Alphabet
• NATO alphabet
  – Used for radio communication
  – Purposely inefficient
  – Distinctive amid noise (‘m’ versus ‘n’)
• Letters represented with word “symbols”
  – a => alpha, b => bravo, c => charlie
• Digits mostly keep their usual names
  – e.g., 9 => niner
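
The NATO alphabet is a deliberately long encoding: every letter becomes a whole word. The sketch below uses common English spellings (ICAO officially writes two of them as "Alfa" and "Juliett") and, like the slide, renames only the digit 9; `to_nato` is a hypothetical helper that skips anything that is not a letter or digit.

```python
NATO_WORDS = [
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliet", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform",
    "victor", "whiskey", "x-ray", "yankee", "zulu",
]
LETTERS = dict(zip("abcdefghijklmnopqrstuvwxyz", NATO_WORDS))
# Only 9 is renamed here, following the slide's simplified rule.
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "niner"]

def to_nato(text: str) -> list[str]:
    """Spell out letters and digits with their broadcast words; skip everything else."""
    words = []
    for ch in text.lower():
        if ch in LETTERS:
            words.append(LETTERS[ch])
        elif ch in "0123456789":
            words.append(DIGIT_NAMES[int(ch)])
    return words

print(to_nato("CD9"))  # ['charlie', 'delta', 'niner']
```

"m" and "n" sound alike over a noisy channel, but "mike" and "november" do not; the extra length buys that distinctiveness.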

Bar Codes
• Universal Product Codes (UPC) use more bits than necessary
• UPC-A encoding uses 7 bits to encode the digits 0 through 9 (4 bits would suffice for 10 digits)

Bar Codes
• Encodes manufacturer (left side) and product (right side)
  – Different bit combinations are used for each side
  – One side is the complement of the other
  – Bit patterns were chosen to appear as different as possible

Bar Codes
• Encodings for each side make it possible to recognize whether the code is upside down
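
One way the two sides can be told apart is parity: every left-side UPC-A digit pattern contains an odd number of 1s, so its complement on the right side contains an even number. Reading bars backwards reverses a pattern but keeps its count of 1s, so the first digit read reveals which end the scan started from. This sketch uses the standard UPC-A left-side patterns but ignores the guard bars and check digit; the helper names are hypothetical.

```python
# UPC-A left-side digit patterns (odd parity); right-side patterns are complements.
LEFT_CODES = {
    "0": "0001101", "1": "0011001", "2": "0010011", "3": "0111101",
    "4": "0100011", "5": "0110001", "6": "0101111", "7": "0111011",
    "8": "0110111", "9": "0001011",
}

def right_code(digit: str) -> str:
    """Right-side pattern: flip every bit of the left-side pattern."""
    return "".join("1" if bit == "0" else "0" for bit in LEFT_CODES[digit])

def side_of(pattern: str) -> str:
    """Odd number of 1s means a left-side pattern, even means right-side."""
    return "left" if pattern.count("1") % 2 == 1 else "right"

def scanned_upside_down(first_pattern: str) -> bool:
    # Reversal preserves the count of 1s, so parity survives a backward scan.
    # A right-side pattern read first means the scan began at the right end.
    return side_of(first_pattern) == "right"

print(right_code("5"))                             # 1001110
print(scanned_upside_down(LEFT_CODES["0"]))        # False
print(scanned_upside_down(right_code("7")[::-1]))  # True
```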

Metadata and the OED
• To represent info
  – Need to convert it to binary
  – Need to describe its properties
• Characteristics of the content also need to be encoded
  – How is the content structured?
  – What other content is it related to?
  – Where was it collected?
  – When was it created or captured?
  – What units is it given in?
  – How should it be displayed?
  – And so on…

Metadata and the OED
• Metadata: info describing info
  – Often specified with tags (as in HTML)

Properties of Data
• ASCII encodes characters
• Metadata gives properties of data
  – Font style
  – Color
  – Justification
  – Margins
  – etc.

Properties of Data
• Content and metadata example

Using Tags for Metadata
• Oxford English Dictionary (OED)
  – Definitive reference for every English word’s meaning, etymology, and usage
  – Printed version is 20 volumes, weighs 150 pounds, and fills 4 feet of shelf space

Structure Tags
• Digital OED uses tags to indicate structure
  – <hw> for a headword (word defined)
  – <pr> for pronunciation
  – <ph> for phonetic notations
  – <ps> for part of speech
  – <hm> for homonym numbers
  – <e> for entire entry
  – <hg> for head group (all info at start of definition)

Structure Tags
• Algorithms utilize tags
  – Search
  – Formatting
  – …
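
Tags let a program search and format by structure instead of by raw text: a search for a headword looks only inside <hw>, so it will not match the same word used elsewhere in an entry. The two entries below are a hypothetical, heavily simplified sample that uses a few of the OED-style tags listed earlier; the <dictionary> root element is an assumption added so the sample is valid XML for Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical miniature dictionary; <dictionary> is an assumed wrapper element.
ENTRIES = """
<dictionary>
  <e><hg><hw>bit</hw><ps>n.</ps></hg></e>
  <e><hg><hw>byte</hw><ps>n.</ps></hg></e>
</dictionary>
""".strip()

def find_entry(root: ET.Element, headword: str) -> Optional[ET.Element]:
    """Search by structure: compare only the text inside each entry's <hw> tag."""
    for entry in root.iter("e"):
        if entry.findtext("hg/hw") == headword:
            return entry
    return None

def format_entry(entry: ET.Element) -> str:
    """Format by structure: headword first, part of speech in parentheses."""
    return f"{entry.findtext('hg/hw')} ({entry.findtext('hg/ps')})"

root = ET.fromstring(ENTRIES)
print(format_entry(find_entry(root, "byte")))  # byte (n.)
```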

Quiz
• What’s the first step in debugging?
  – check for obvious
  – isolate the problem
  – reproduce the problem
  – pinpoint
• Fix the error in this CSS
  – body { color; red }

Quiz
• Like all engineers, programmers begin with a _______
  – a precise description of the input, how the system should behave, and how the output should be produced.

Summary
• Digitizing info
• Storing info using PandA
  – Bits, bytes, hex
• ASCII
• Metadata