Chapter 8 & 11: Representing Information Digitally
TECH 1010-02, Prof. Jeff Cheng

Representing Information with Symbols
• Digitize: represent information with digits (normally the base-10 numerals 0 through 9)
• Limitation of digits
  – Alternative representation: any set of symbols could represent phone number digits, as long as the keypad is labeled accordingly
• Symbols, briefly
  – Digits have the advantage of having short names (easy to say)

Ordering Symbols
• An advantage of digits for encoding information is that items can be listed in numerical order
• To use other symbols, we need an ordering system (a collating sequence)
  – An agreed order from smallest to largest value
• In choosing symbols for encoding, consider how the symbols interact with the things being encoded
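
A collating sequence can be sketched in a few lines of Python. The symbol set and its order below are hypothetical, chosen only for illustration; any agreed order would work the same way.

```python
# Hypothetical symbols and order (smallest to largest), for illustration only.
COLLATING_SEQUENCE = "*#@&%"

def sort_by_collating_sequence(items, sequence=COLLATING_SEQUENCE):
    """Sort strings by the agreed symbol order rather than numeric order."""
    # Map each symbol to its position in the collating sequence.
    rank = {symbol: position for position, symbol in enumerate(sequence)}
    # Compare items symbol by symbol, using each symbol's rank.
    return sorted(items, key=lambda item: [rank[symbol] for symbol in item])

print(sort_by_collating_sequence(["@*", "*#", "#&", "**"]))
```

Once every symbol has a rank, sorting symbol strings works just like sorting numbers written with digits.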

The Fundamental Representation of Information
• The most fundamental form of information is the presence or absence of a physical phenomenon
• PandA is the mnemonic for "presence and absence"
• It is discrete (distinct or separable): the phenomenon is present or it is not (true or false; 1 or 0). There is no continuous gradation in between.

Analog vs. Digital
• Analog is continuous data/information
  – An intermediate value can always be found between any two values (e.g., a sound wave)
• Digital is discrete data/information
  – Many distinct samples of data
  – All data in a computer is represented discretely in binary (0's and 1's)

The sound wave represented by the sequence 0, 1.5, 2.0, 3.0, 4.0, 3.0, 0 after sampling
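
Sampling can be illustrated with a minimal Python sketch. The wave function, sampling rate, and rounding step here are assumptions made for the example; they are not the values behind the sequence in the slide.

```python
import math

def sample(signal, duration, rate):
    """Read the signal at evenly spaced times 0, 1/rate, 2/rate, ..."""
    count = int(duration * rate)
    return [signal(i / rate) for i in range(count)]

def quantize(values, step):
    """Round each sample to the nearest multiple of step (discrete levels)."""
    return [round(value / step) * step for value in values]

def wave(t):
    """Hypothetical analog signal: half a sine cycle with peak 4.0."""
    return 4.0 * math.sin(math.pi * t)

samples = quantize(sample(wave, duration=1.0, rate=8), step=0.5)
print(samples)
```

The continuous wave has a value at every instant, while the sampled list keeps only a finite number of values, each rounded to a fixed level.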

A Binary System
• Two patterns make a binary system
  – Base 2 (0 or 1)
• The basic binary unit is known as a "bit" (short for binary digit)
• 8 bits are grouped together to form a byte, the basic unit used to represent storage capacity
• We can give any names to the two patterns of one bit, as long as we are consistent
  – PandA: presence and absence can represent 1 and 0, respectively

Combining Bit Patterns
• Each bit can represent only two patterns, so we combine bits into sequences to create enough bit patterns to encode the necessary information
• Binary (PandA) has 2 patterns; arranging them into sequences of length $n$ creates $2^n$ symbols
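
The $2^n$ count can be checked by listing every pattern. This is a small sketch; the function name `bit_patterns` is made up for the example.

```python
from itertools import product

def bit_patterns(n):
    """Return every sequence of n bits as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in range(1, 5):
    # The number of patterns doubles with each added bit.
    print(n, len(bit_patterns(n)))

print(bit_patterns(2))
```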

Decoding Binary Number 100101
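
One way to decode 100101 is to add up the power of 2 for each position that holds a 1. The sketch below assumes the bits arrive as a string; `decode_binary` is an illustrative name.

```python
def decode_binary(bits):
    """Add the place value 2**position for every 1 bit."""
    total = 0
    # Position 0 is the rightmost bit, so walk the string in reverse.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total

# 100101 = 32 + 4 + 1
print(decode_binary("100101"))
```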

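Decimals => Binary
To convert a decimal number $d$ to a binary number is to find the binary digits $b_k, b_{k-1}, \dots, b_1, b_0$ such that

$$d = b_k \times 2^k + b_{k-1} \times 2^{k-1} + \dots + b_1 \times 2^1 + b_0 \times 2^0$$

These digits can be found by successively dividing $d$ by 2 until the quotient is 0. The remainders are $b_0, b_1, \dots, b_k$. For example, the decimal number 123 is 1111011 in binary. The conversion is conducted as follows:

123 ÷ 2 = 61, remainder 1 ($b_0$)
 61 ÷ 2 = 30, remainder 1 ($b_1$)
 30 ÷ 2 = 15, remainder 0 ($b_2$)
 15 ÷ 2 =  7, remainder 1 ($b_3$)
  7 ÷ 2 =  3, remainder 1 ($b_4$)
  3 ÷ 2 =  1, remainder 1 ($b_5$)
  1 ÷ 2 =  0, remainder 1 ($b_6$)

Reading the remainders from last to first gives 1111011.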
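
The repeated-division method can be sketched in Python as follows. The function name `to_binary` is an assumption for the example, and the sketch handles non-negative whole numbers only.

```python
def to_binary(d):
    """Convert a non-negative integer to a binary string by repeated division."""
    if d == 0:
        return "0"
    remainders = []
    while d > 0:
        d, remainder = divmod(d, 2)  # quotient and remainder in one step
        remainders.append(str(remainder))
    # The first remainder is the rightmost bit, so reverse the list.
    return "".join(reversed(remainders))

print(to_binary(123))
```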

Bits in Computer Memory
• Memory is arranged inside a computer as a very long sequence of bits (places where a physical phenomenon can be set and detected)
• Analogy: sidewalk memory
  – Each sidewalk square represents a memory slot (bit), and a stone's presence or absence represents the bit's value
  – If a stone is on the square, the value is 1; if not, the value is 0
• In real memory, the information stored in each bit can be represented as a positive (+) or negative (-) electrostatic charge

The 16 Hex Digits
• 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
  – A = 10, B = 11, ..., F = 15
• Sixteen values can be represented exactly by 4-bit sequences ($2^4 = 16$)
• Changing hex digits to bits and back again:
  – Given a sequence of bits, group them in 4's and write the corresponding hex digit
    • 0101 1100 → 5C
  – Given hex, write the associated group of 4 bits for each digit
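
Both directions of the hex conversion can be sketched as below. The function names are illustrative, and padding short inputs with leading zeros is an assumption of this sketch.

```python
HEX_DIGITS = "0123456789ABCDEF"

def bits_to_hex(bits):
    """Group bits in 4's and write the hex digit for each group."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    bits = bits.zfill(-(-len(bits) // 4) * 4)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)

def hex_to_bits(hex_string):
    """Write the 4-bit group for each hex digit, separated by spaces."""
    return " ".join(format(HEX_DIGITS.index(digit), "04b")
                    for digit in hex_string.upper())

print(bits_to_hex("0101 1100"))
print(hex_to_bits("5C"))
```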

Decimal   Hex (0-9, A-F)   Binary
0         0                0000
1         1                0001
2         2                0010
3         3                0011
4         4                0100
5         5                0101
6         6                0110
7         7                0111
8         8                1000
9         9                1001
10        A                1010
11        B                1011
12        C                1100
13        D                1101
14        E                1110
15        F                1111

Hex Explained
• Recall that in Chapter 4 we specified custom colors in HTML using hex digits
  – e.g., <font color="#FF8E2A">
  – Hex is short for hexadecimal, base 16
• Why use hex? Writing out the full sequence of bits is long, tedious, and error-prone
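
The following sketch shows why hex is convenient, using the color from the slide. It assumes the usual reading of an HTML color as three two-digit hex values (red, green, blue); the function names are made up for the example.

```python
def hex_color_to_rgb(color):
    """Split '#RRGGBB' into three decimal values from 0 to 255."""
    digits = color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

def hex_color_to_bits(color):
    """Write the same color as the 24 bits that the 6 hex digits stand for."""
    return "".join(format(value, "08b") for value in hex_color_to_rgb(color))

print(hex_color_to_rgb("#FF8E2A"))
print(hex_color_to_bits("#FF8E2A"))
```

Six hex digits carry the same information as 24 bits, which are much harder to read and copy correctly.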

Digitizing Information
• Early binary representations (1 and 0) encoded numbers and keyboard characters
• Now representations for sound, video, and other types of information are also important

Digitizing Sound

Representing Images
• Bitmap techniques
  – Pixel: short for "picture element"
  – A spectrum of colors can be represented by mixing the primary colors (red, green, blue) in different proportions (e.g., purple can be represented as 100 parts red, 0 parts green, and 100 parts blue)

Digitizing Text
• For encoding text, what symbols should be included?
  – We want to keep the list small enough to use fewer bits, but we don't want to leave out critical characters

Assigning Symbols
• 26 uppercase and 26 lowercase Roman letters, 10 Arabic numerals, 10 arithmetic characters, 20 punctuation characters (including space), and 3 nonprintable characters (new line, tab, backspace) = 95 characters, enough to represent English
• For 95 symbols, we need 7-bit sequences
  – $2^6 = 64$ is too few; $2^7 = 128$ is enough
• A standard 7-bit code is ASCII (American Standard Code for Information Interchange)
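
The reasoning behind "95 symbols need 7 bits" can be sketched as a search for the smallest $n$ with $2^n$ at least the number of symbols. The name `bits_needed` is an assumption for the example.

```python
def bits_needed(symbol_count):
    """Smallest number of bits whose patterns can cover symbol_count symbols."""
    bits = 0
    while 2 ** bits < symbol_count:
        bits += 1
    return bits

# Character counts taken from the slide.
total = 26 + 26 + 10 + 10 + 20 + 3
print(total, bits_needed(total))
```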

ASCII Character Set

Extended ASCII: An 8-bit Code
• By the mid-1960s, it became clear that 7-bit ASCII was not enough to represent text from languages other than English
• IBM extended ASCII to 8 bits (256 symbols)
• Called "Extended ASCII," the first half is the original ASCII with a 0 added at the beginning of each group of bits
• Handles most Western languages and additional useful symbols
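
The "0 added at the beginning" rule can be sketched as below. The function names are illustrative, and the sketch covers only the first half of the code (the original ASCII characters).

```python
def ascii_7bit(ch):
    """Return the 7-bit ASCII pattern for a character."""
    code = ord(ch)
    if code >= 128:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "07b")

def extended_8bit(ch):
    """Extended ASCII, first half: the 7-bit pattern with a leading 0."""
    return "0" + ascii_7bit(ch)

print(ascii_7bit("A"), extended_8bit("A"))
```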

Hexadecimal Extended ASCII Character Set

ASCII Coding of Phone Numbers
• How would a computer represent the phone number 888 555 1212 in its memory?
• Encode each digit with its ASCII byte
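
A minimal sketch of this encoding follows. It assumes the spaces are dropped and only the ten digits are stored; `ascii_bytes` is an illustrative name.

```python
def ascii_bytes(text):
    """Return the 8-bit pattern of each character's ASCII code."""
    return [format(byte, "08b") for byte in text.encode("ascii")]

# Assumption: the spaces in "888 555 1212" are not stored.
phone_number = "8885551212"
for digit, bits in zip(phone_number, ascii_bytes(phone_number)):
    print(digit, bits)
```

Each digit takes a full byte here, because the computer stores the character "8", not the number 8.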

Unicode
• Several languages around the world have more than 256 individual characters
• Unicode was originally designed as a 16-bit code; $2^{16} = 65536$ characters
  – The first 128 codes (7 bits) are the ASCII characters
  – Codes beyond the first 128 represent different characters for different locales
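
Python's built-in `ord` returns a character's Unicode code, which makes the overlap with ASCII easy to see. The sample characters below are chosen only for illustration.

```python
def code_points(text):
    """Return the Unicode code of each character in the text."""
    return [ord(ch) for ch in text]

def is_ascii_character(ch):
    """True when the character falls in the first 128 codes."""
    return ord(ch) < 128

# "A" is ASCII; "\u00e9" (e with acute accent) and "\u4f60" (a Chinese
# character) lie beyond the first 128 codes.
sample_text = "A\u00e9\u4f60"
print(code_points(sample_text))
print([is_ascii_character(ch) for ch in sample_text])
```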

Beyond Text
• Extended ASCII encodes letters and characters well, but most documents contain more than just text
  – Formatting information such as font, font size, and justification

Document Description Using Metadata
• Metadata (data describing data), which includes but is not limited to formatting information, is represented using tags, as in HTML

Structure Tags
• A special set of tags was developed to specify the structure of the Oxford English Dictionary
  – <hw> means headword, the word being defined
  – Other tags label pronunciation <pr>, phonetic notation <ph>, and parts of speech <ps>
• The tags do not print. They are there only to specify structure, so the computer knows what part of the dictionary it is looking at
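
As a rough sketch of how a program might use such tags, the code below pulls the labeled parts out of one entry. The entry text and the closing-tag form (`</hw>` and so on) are assumptions made for the example, not the dictionary's actual markup.

```python
import re

# Hypothetical dictionary entry, invented for illustration.
ENTRY = "<hw>byte</hw> <pr>bait</pr> <ps>n.</ps> A group of eight bits."

def tagged_parts(entry):
    """Map each tag name to the text it encloses."""
    # \1 requires the closing tag to match the opening tag's name.
    return {tag: text for tag, text in re.findall(r"<(\w+)>(.*?)</\1>", entry)}

def printable_text(entry):
    """Drop the tags, since they describe structure and do not print."""
    return re.sub(r"</?\w+>", "", entry)

print(tagged_parts(ENTRY))
print(printable_text(ENTRY))
```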

Searching HTML Documents
Suppose we are searching for HTML documents focused on Thai cuisine. How can we make the search more specific, so that the result does not simply return any document that contains the phrase "Thai cuisine"?

Encoding Bits on a CD-ROM