Data Representation Art 311 Dr J R Parker

Key Concept #4 - Archive
Which is to say, memory. Humans have a memory of their own experiences, but have come to rule the planet by extending that memory (via teaching) to the past and to other people’s experiences. How is this stored, and how is it accessed?

Key Concept #4 - Archive
We now have unparalleled access to public information. We are now personally responsible for our own information and our family’s. We have immediate access to most human knowledge. We can not only read but also publish, both truth and lies. What is real?

Key Concept #4 - Archive
In new media, the archive is partly what is being manipulated by the interfaces:
- Flickr and YouTube manipulate images and video.
- Blogs, e-books, web sites.
- But also museums and libraries.

Data Representation
The basic question today is: “Given that a computer only manipulates numbers, how can we represent interesting things like images, sounds, graphics, text, video, and so on?” The answer differs depending on the type of data.

Data Representation
This subject is basic to creating new things on a computer. We need to become familiar with the standard methods of representing data. We also need to acquire the skill of inventing new representations for things that we ourselves invent. So, how would you represent music (i.e., notes)? More later…

Text
So, we’ve already seen how text is represented, at least briefly. Remember ASCII? There are 128 characters altogether: 95 printable characters (counting the space) and 33 control characters.
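One quick way to check these counts is to classify all 128 codes. This minimal Python sketch relies on `str.isprintable()`, which treats the space as printable and the control codes (0 to 31, plus 127, DEL) as non-printable.

```python
# Split the 128 ASCII codes into control and printable characters.
control = [code for code in range(128) if not chr(code).isprintable()]
printable = [code for code in range(128) if chr(code).isprintable()]

print(len(control), len(printable))  # 33 95
print(control[-1], printable[0])     # 127 32  (DEL is a control code; space is the first printable)
```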

Text
What do we need to consider when building a text representation?
- Upper case/lower case.
- The space character has to come before all the others to make sorting easy.
- Non-alphanumeric characters were positioned to correspond to their shifted positions on typewriters.
- The first two columns (32 positions) were reserved for control characters.
- The digits 0 to 9 were placed so that they correspond to their values in binary prefixed with 011, making conversion to and from binary-coded decimal straightforward.
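Two of these design decisions are easy to see in code. In this Python sketch (the function names `digit_value` and `toggle_case` are illustrative, not from any library), masking off the high bits of a digit's code leaves its binary-coded decimal value, and upper- and lower-case letters differ by a single bit (value 32), so flipping that bit changes case.

```python
def digit_value(ch: str) -> int:
    """Value of an ASCII digit: codes 0x30-0x39 are 011 followed by the 4-bit BCD value."""
    code = ord(ch)
    if not 0x30 <= code <= 0x39:
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return code & 0x0F  # keep only the low 4 bits

def toggle_case(letter: str) -> str:
    """Flip an ASCII letter between upper and lower case by toggling bit 5 (value 32)."""
    if not ("A" <= letter <= "Z" or "a" <= letter <= "z"):
        raise ValueError(f"{letter!r} is not an ASCII letter")
    return chr(ord(letter) ^ 0x20)

print(format(ord("7"), "07b"), digit_value("7"))  # 0110111 7
print(toggle_case("A"), toggle_case("q"))         # a Q
```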

Text
This is a typical ASCII table. You do not need to memorize it. An irritating detail is that the character codes in such tables are given in a base-16 number system called HEXADECIMAL, or just HEX. Why? Allow me to explain.
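Part of the reason hex suits these tables is that the two hex digits of a code give a row and a column directly. This Python sketch (the function name `ascii_table` is just for illustration) prints the printable part of the table in that layout, showing DEL (0x7F) as a dot; ‘A’ lands in row 4, column 1, i.e. code 0x41.

```python
def ascii_table() -> list[str]:
    """Printable ASCII laid out by hex digits: row = high digit, column = low digit."""
    lines = ["    " + " ".join(f"{low:X}" for low in range(16))]
    for high in range(2, 8):  # rows 0 and 1 hold the control characters
        cells = []
        for low in range(16):
            ch = chr(high * 16 + low)
            cells.append(ch if ch.isprintable() else ".")  # DEL (0x7F) shown as "."
        lines.append(f"{high:X}_  " + " ".join(cells))
    return lines

for line in ascii_table():
    print(line)
```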

Hexadecimal
My feeling, after being involved with computers since 1971, is that computer guys are lazy. This is not a bad thing, but it motivates much of what they do. We are so lazy that we will spend days writing a program to do simple things that we’re bored with doing. Much of the history of computing can be explained by the need to avoid tedious, repetitive work by using a computer.

Hexadecimal
So, HEX: All numbers in a computer are binary, or base 2:
0001 is one
0010 is two
0100 is four
1000 is eight
and so on. Binary digits stand for powers of 2, just as decimal digits stand for powers of ten. Hexadecimal numbers use base 16.
Why is this convenient? I’m getting there.

Hexadecimal
Base 16 is a problem, as we would need 16 distinct characters as digits. We use the letters A, B, C, D, E, F in conjunction with our regular digits. So 1 is still one … and 9 is still nine. But:
A is ten
B is eleven
C is twelve
D is thirteen
E is fourteen
and F is fifteen.
Why is this convenient? I’m getting there.

Hexadecimal
Positional number systems use powers of the base:
$16^0 = 1$
$16^1 = 16$
$16^2 = 256$
$16^3 = 4096$
…
Why is this convenient? I’m getting there.

Hexadecimal
Counting in base 16: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, 1D, 1E, 1F, 20, … so $20_{16} = 32_{10}$.
Why is this convenient? I’m getting there.

Hexadecimal
Reminder: why are we doing this? So we can read computer science tables and documents, like the ASCII table. (I’m teaching you to read!)
Converting:
$12_{16} = 1 \times 16 + 2 = 18_{10}$
$2A1_{16} = 2 \times 256 + 10 \times 16 + 1 = 673_{10}$
Why is this convenient? I’m getting there.
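The conversion rule above (multiply each digit by the matching power of 16 and add) can be sketched in Python. The name `hex_to_decimal` is just for illustration; Python's built-in `int(text, 16)` does the same job. The loop multiplies the running total by 16 before adding each new digit, which produces the same sum of digits times powers of 16, evaluated from the left (Horner's method).

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(text: str) -> int:
    """Value of a hex string: each step shifts the total one place (x 16) and adds a digit."""
    value = 0
    for ch in text.upper():
        digit = HEX_DIGITS.index(ch)  # raises ValueError for a character that is not a hex digit
        value = value * 16 + digit
    return value

print(hex_to_decimal("12"))   # 18
print(hex_to_decimal("2A1"))  # 673
print(int("2A1", 16))         # 673, using the built-in conversion
```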

Hexadecimal
Now, 16 is an exact power of 2 (it is $2^4$), so each hex digit takes exactly 4 binary digits (BITS) to represent. Converting from hex to binary and back is therefore trivially simple.
Converting hex to binary: replace each hex digit with its binary equivalent.
$2A1_{16}$ = 0010 1010 0001 = $001010100001_2$ (= $673_{10}$)
Why is this convenient? I’m getting there.
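The digit-by-digit replacement can be written as a one-line Python function. This is a minimal sketch; `hex_to_binary` is an illustrative name, and the groups are separated by spaces only to make them easy to read.

```python
def hex_to_binary(text: str) -> str:
    """Replace each hex digit with its 4-bit binary equivalent (leading zeros kept)."""
    return " ".join(format(int(ch, 16), "04b") for ch in text)

bits = hex_to_binary("2A1")
print(bits)                           # 0010 1010 0001
print(int(bits.replace(" ", ""), 2))  # 673
```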

Hexadecimal
Converting binary to hex: group the binary number into sets of 4 digits (bits), starting from the right, and convert each group into a hex digit. So 011101010010101 becomes
011 1010 1001 0101
3   A    9    5
so $011101010010101_2 = 3A95_{16}$.
That’s why this is convenient.
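The grouping step can be sketched in Python as well. Padding on the left with zeros is the code equivalent of "group from the right": the leftmost group may be short, and extra leading zeros do not change the value. The function name `binary_to_hex` is illustrative.

```python
def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the right and convert each group to one hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)  # pad on the left to a multiple of 4
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(g, 2), "X") for g in groups)

print(binary_to_hex("011101010010101"))  # 3A95
print(binary_to_hex("001010100001"))     # 2A1
```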

Hexadecimal
Yup, that’s it. Conversion between hex and binary is easy, and hex uses far fewer digits (one hex digit for every four bits), so we can list binary numbers in a lot less space. That’s why this is convenient. Let’s move on …

Text
So characters are binary numbers when stored in memory, and they are often coded using ASCII. A string is a sequence of characters. In a file we can indicate a string using quotes: “This is a string”. In memory, its characters are placed in consecutive locations. There are two ways to do this:
1. Start with an indication of how many characters there are.
2. Terminate the string with a special character.

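Text
The first string begins with a count. The second ends with a character whose code is 0; this is the NUL character, and the string is referred to as a null-terminated (nul-terminated) string. For “This is a string” (16 characters), the codes are:
Count first:      16 84 104 105 115 32 105 115 32 97 32 115 116 114 105 110 103
NUL terminated:   84 104 105 115 32 105 115 32 97 32 115 116 114 105 110 103 0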
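Both layouts can be sketched in Python using `bytes` as a stand-in for consecutive memory locations. The function names are hypothetical, and the one-byte count is an assumption of this sketch (it limits such strings to 255 characters).

```python
def length_prefixed(text: str) -> bytes:
    """Count first: one length byte (so at most 255 characters here), then the codes."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte count cannot describe more than 255 characters")
    return bytes([len(data)]) + data

def null_terminated(text: str) -> bytes:
    """The codes, followed by a NUL byte (code 0) marking the end."""
    return text.encode("ascii") + b"\x00"

def read_length_prefixed(memory: bytes, start: int = 0) -> str:
    count = memory[start]                      # the first byte says how many codes follow
    return memory[start + 1:start + 1 + count].decode("ascii")

def read_null_terminated(memory: bytes, start: int = 0) -> str:
    end = memory.index(0, start)               # scan forward until the NUL byte
    return memory[start:end].decode("ascii")

s = "This is a string"
print(list(length_prefixed(s)))  # [16, 84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114, 105, 110, 103]
print(list(null_terminated(s)))  # [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114, 105, 110, 103, 0]
```

The trade-off shows up in the readers: with a count, the length is known immediately; with a terminator, the reader has to scan for the NUL, and a NUL can never appear inside the string itself.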

Dates
Of all text strings, dates are the hardest to deal with. There are many, many ways to display them, and many things we want to do with them. The same date, for example:
ISO                  1987-10-12
IBM USA              10/12/1987
IBM Europe           12.10.1987
Unformatted Julian   1987285
Julian               87/285
MDY                  10/12/87
YMD                  87/10/12
DMY                  12/10/87
                     October 12, 1987
                     12 Oct 87
                     Oct 12, 1987
Etc., etc.
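If the date is stored once, each of these displays is just a formatting choice. This sketch uses Python's standard `datetime` module; the labels in the dictionary are informal, and `%j` is the day of the year (285 for October 12, 1987).

```python
from datetime import date

d = date(1987, 10, 12)

# Each value is produced from the same stored date.
formats = {
    "ISO": d.isoformat(),                      # 1987-10-12
    "USA": d.strftime("%m/%d/%Y"),             # 10/12/1987
    "Europe": d.strftime("%d.%m.%Y"),          # 12.10.1987
    "Unformatted Julian": d.strftime("%Y%j"),  # 1987285
    "Julian": d.strftime("%y/%j"),             # 87/285
    "MDY": d.strftime("%m/%d/%y"),             # 10/12/87
    "YMD": d.strftime("%y/%m/%d"),             # 87/10/12
    "DMY": d.strftime("%d/%m/%y"),             # 12/10/87
    "Short": d.strftime("%d %b %y"),           # 12 Oct 87 (month names depend on the locale)
}

for label, text in formats.items():
    print(f"{label:20} {text}")
```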

Dates
Questions: why do we use date information? Shouldn’t the representation make answering the common questions simple?
- Has X passed?
- Print X in a particular way.
- How many days since X?
- How long until X?
- Input X from the console.

Dates
To make those questions easy to answer:
- Store the year as 4 digits (this avoids the Y2K problem).
- Do not store the month as a string; it is hard to use that way. Store it as a number.
- Store the day as a number.
E.g., 2012 01 12
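A minimal Python sketch of why these choices help, assuming a hypothetical representation of a date as a `(year, month, day)` tuple of numbers, largest unit first. Python compares tuples element by element, so "Has X passed?" becomes a single comparison; the last two lines show what goes wrong with month names and two-digit years.

```python
# Hypothetical representation: a date as three numbers, largest unit first.
Date = tuple[int, int, int]  # (year, month, day)

def has_passed(x: Date, today: Date) -> bool:
    """Tuples compare element by element: year first, then month, then day."""
    return x < today

print(has_passed((2012, 1, 12), (2012, 3, 10)))  # True

# Why not store the month as a string? Names sort alphabetically, not by calendar:
print(sorted(["Mar", "Jan", "Feb"]))             # ['Feb', 'Jan', 'Mar']

# Why 4-digit years? With 2 digits, 2000 (stored as 0) looks earlier than 1999 (99).
print(has_passed((0, 1, 1), (99, 12, 31)))       # True, which is wrong for 2000 vs 1999
```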

Dates
BUT: each month has a different number of days. This makes differences (How many days since X? How long until X?) hard to calculate. How many days are there between Mar 10 and May 12?

Dates
Days between Mar 10 and May 12 = 63 (not counting the last day): Mar 10 to Apr 10 is 31 days, Apr 10 to May 10 is 30 days, plus 2 more.
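One way to handle the uneven months is to convert each date into a single running count of days and then subtract. This Python sketch does that with a month-length table (`day_of_year` is a hypothetical helper that ignores leap years) and checks the result against the standard `datetime` module.

```python
from datetime import date

# Month lengths for a non-leap year; February would need 29 in a leap year.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def day_of_year(month: int, day: int) -> int:
    """Days in all earlier months plus the day within this month (ignores leap years)."""
    return sum(DAYS_IN_MONTH[:month - 1]) + day

print(day_of_year(5, 12) - day_of_year(3, 10))       # 63
print((date(2012, 5, 12) - date(2012, 3, 10)).days)  # 63, using the standard library
```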

Dates
The international standard ISO 8601 describes a string representation for dates and times. Two simple examples of this format are:
2007-03-04 20:32:17
20070304T203217

Dates
Both stand for the 4th of March 2007, a bit after half past eight in the evening. (So far we had forgotten about the time of day!)
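Both ISO 8601 forms can be read with Python's `datetime.strptime`, given a format string for each. The sketch also shows one payoff of the largest-unit-first layout: as long as all the strings use the same format (and the same time zone), sorting them as plain text also sorts them in time order.

```python
from datetime import datetime

extended = datetime.strptime("2007-03-04 20:32:17", "%Y-%m-%d %H:%M:%S")
basic = datetime.strptime("20070304T203217", "%Y%m%dT%H%M%S")
print(extended == basic)   # True
print(basic.isoformat())   # 2007-03-04T20:32:17

# Fields run from largest to smallest and are zero-padded, so text order = time order.
stamps = ["2007-03-04 20:32:17", "1987-10-12 09:00:00", "2007-03-04 08:15:00"]
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))
print(sorted(stamps) == by_time)  # True
```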

Dates
Unix time: the number of seconds elapsed since the beginning of the year 1970 (midnight UTC, 1 January 1970). For example:
1172960204.226908
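Because Unix time is a single number, "How many days since X?" reduces to a subtraction and a division by 86,400 (Unix time ignores leap seconds, so every day counts as exactly 86,400 seconds). This Python sketch uses the standard `datetime` module; the two moments compared at the end are arbitrary examples.

```python
from datetime import datetime, timezone

# Unix time 0 is midnight UTC at the start of 1 January 1970.
print(datetime.fromtimestamp(0, tz=timezone.utc).isoformat())  # 1970-01-01T00:00:00+00:00

# Convert the slide's value back to a calendar date (in UTC).
print(datetime.fromtimestamp(1172960204.226908, tz=timezone.utc).date())

# With plain seconds, "how long since X?" is a single subtraction.
x = datetime(2007, 3, 4, 20, 32, 17, tzinfo=timezone.utc).timestamp()
now = datetime(2012, 1, 12, tzinfo=timezone.utc).timestamp()
print(round((now - x) / 86400, 2))  # days elapsed
```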

Dates
This discussion was started just to show that some simple things can become complicated. We use dates all the time, but the millennia have made them complex rather than simple.

Text
Printing: give the address of a string to the printer. It converts the numbers (characters) into electronic signals that print the characters (or draw character images onto a page). Each character has an image that represents it, called a glyph.

Glyphs
A glyph is a simple graphic. The letter ‘B’ is drawn as a pattern of dots: the paper is white, and the drawn glyph consists of black ‘spots’ drawn by the printer on a 2D mesh or grid. This is a simple image; more on images later.

Glyphs
The point is that, for any particular size (indicated by how many dots are on each side of the glyph image), a character glyph contains a certain percentage of black. That percentage can be thought of as how black the glyph is. This allows us to create images with characters.

Glyphs
For the ‘B’ on the right, there are 11 rows and 10 columns, or 110 squares. Of those, $8+5+4+4+3+6+4+4+4+5+8 = 55$ are black, and $55/110 = 50\%$. So any spot on an image to be created that is 50% black can be drawn as a ‘B’.
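The same calculation in Python, as a small sketch. `coverage` takes the black-square count for each row (the slide's numbers for ‘B’), and `bitmap_coverage` takes a glyph drawn as rows of text. Both names and the 5 x 5 ‘T’ are hypothetical examples, not part of any standard.

```python
def coverage(black_per_row: list[int], width: int) -> float:
    """Fraction of a glyph's grid squares that are black."""
    return sum(black_per_row) / (len(black_per_row) * width)

def bitmap_coverage(bitmap: list[str]) -> float:
    """Same measure for a glyph drawn as text rows, with '#' for a black square."""
    cells = "".join(bitmap)
    return cells.count("#") / len(cells)

b_rows = [8, 5, 4, 4, 3, 6, 4, 4, 4, 5, 8]  # black squares per row of the slide's 'B'
print(coverage(b_rows, width=10))          # 0.5

tiny_t = ["#####",   # a hypothetical 5 x 5 'T', not from the slide
          "..#..",
          "..#..",
          "..#..",
          "..#.."]
print(bitmap_coverage(tiny_t))             # 0.36
```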

Glyphs
Characters ordered from darker to lighter, for white characters on a black background:
.'`,^:";~-_+<>i!lI?/|()1{}[]rcvunxzjftLCJUYXZO0Qoahkbdpqwm*WMB8&%$#@
Reverse the order for printing on paper.
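Turning a grey-level image into characters then comes down to picking, for each spot, the character whose darkness is closest. This Python sketch assumes a short ramp chosen for the example (not the longer one above) and assumes its characters are roughly evenly spaced in darkness, so a grey level can be mapped to a position in the ramp directly. For white characters on a black background, pass the reversed ramp.

```python
# A short ramp ordered from least ink to most ink (black characters on white paper).
# It is an illustrative choice, not the longer ramp above.
RAMP = " .:-=+*#%@"

def to_ascii(grey: list[list[float]], ramp: str = RAMP) -> list[str]:
    """Map grey levels in [0, 1] (0 = white, 1 = black) to ramp characters.

    Assumes the ramp's characters are roughly evenly spaced in darkness.
    """
    last = len(ramp) - 1
    return ["".join(ramp[round(min(max(v, 0.0), 1.0) * last)] for v in row) for row in grey]

# A left-to-right gradient from white to black, three rows tall.
gradient = [[x / 19 for x in range(20)] for _ in range(3)]
for line in to_ascii(gradient):
    print(line)
```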

ASCII Images
[Figures: several pictures drawn entirely with ASCII characters.]

ASCII Line Images
[Figures: line drawings made with ASCII characters.]

Pictures
The pictures we have seen are rows and columns of ASCII characters. Computer images are stored in that same way, as rows and columns, but the elements are not ASCII. We have a 2D grid of elements (let’s say boxes), each having a distinct colour or grey level, like a TV image.

Pictures
[Figure: a grid with columns numbered 1 to 10 and rows numbered 1 to 6; the pixel at row 4, column 5 is labelled 4, 5.]
Picture elements (pixels) are identified by [row, column].
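In a program, such a grid is often a list of rows, so a pixel is reached by row first and column second. This minimal Python sketch assumes a 6 x 10 grid like the figure's; `set_pixel` is a hypothetical helper that converts the slide's 1-based numbering to Python's 0-based indexing.

```python
ROWS, COLS = 6, 10

# One number per pixel, stored row by row; every pixel starts at 0.
# (Writing [[0] * COLS] * ROWS would make all rows the same list object.)
image = [[0] * COLS for _ in range(ROWS)]

def set_pixel(img: list[list[int]], row: int, col: int, value: int) -> None:
    """Rows and columns are numbered from 1, as on the slide; Python lists start at 0."""
    img[row - 1][col - 1] = value

set_pixel(image, 4, 5, 1)
print(image[3][4])  # 1
```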

Pictures
Picture elements are numbers that indicate a colour or a grey level. E.g., let 0 be black and 1 be white; the letter ‘T’ is then stored, row after row, as:
00000000000000000000111111111111111111100000000000000000001111100000000000000000001111100000000000000000001111100000000000000000000
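A bit string like the one above only turns back into a picture if we know how many pixels make up a row. This Python sketch uses a small hypothetical ‘T’ (not the slide's bitmap) to show the round trip; the helper names are illustrative. Cutting the same string with the wrong width scrambles the image, which is why image files record the width and height along with the pixels.

```python
# A hypothetical 7 x 5 letter 'T' (not the slide's bitmap): 0 = black, 1 = white.
T = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]

def flatten(image: list[list[int]]) -> str:
    """Write the pixels row after row as one string of 0s and 1s."""
    return "".join(str(p) for row in image for p in row)

def unflatten(bits: str, width: int) -> list[list[int]]:
    """Cut the string back into rows; this only works if the width is known."""
    return [[int(b) for b in bits[i:i + width]] for i in range(0, len(bits), width)]

flat = flatten(T)
print(flat)
for row in unflatten(flat, width=7):
    print("".join(str(p) for p in row))
```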

Graphics (line drawings)
Lines are drawn on a canvas or background of some kind, which has a size. Lines can be defined by specifying their end points, and these can be specified as pixels. So (10, 10) (20, 20) is a line (segment) between those two pixels. Entire objects can be drawn using these segments alone.
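To display a segment, something has to decide which pixels between the two end points to colour. The slide does not name a method; one standard choice is Bresenham's line algorithm, which uses only integer arithmetic. This Python sketch implements it and uses a hypothetical `draw` helper to render a triangle made of three segments onto a text canvas.

```python
def line_pixels(x0: int, y0: int, x1: int, y1: int) -> list[tuple[int, int]]:
    """Pixels on the segment from (x0, y0) to (x1, y1), using Bresenham's algorithm."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return points

def draw(segments, width: int, height: int) -> list[str]:
    """Rasterize a list of ((x0, y0), (x1, y1)) segments onto a blank text canvas."""
    canvas = [["."] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in segments:
        for x, y in line_pixels(x0, y0, x1, y1):
            canvas[y][x] = "#"
    return ["".join(row) for row in canvas]

print(line_pixels(10, 10, 20, 20)[:3])  # [(10, 10), (11, 11), (12, 12)]

# A triangle made of three segments.
triangle = [((1, 6), (5, 1)), ((5, 1), (9, 6)), ((9, 6), (1, 6))]
for row in draw(triangle, width=11, height=8):
    print(row)
```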

Sound
Computer sound is a sequence of loudness (amplitude) measurements, taken at regular intervals, recorded as electronic levels or voltages, converted into binary, and stored (in order) in a file.
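Here is that process in miniature, as a Python sketch using only the standard library: measure a sine wave at regular intervals, round each measurement to a 16-bit integer, and store the numbers in order in a WAV file (held in memory here). The sample rate, tone frequency, and function names are assumptions chosen for the example.

```python
import io
import math
import struct
import wave

RATE = 8000   # samples per second (an assumed, low rate to keep the example small)
FREQ = 440.0  # tone frequency in Hz, an arbitrary choice

def sine_samples(seconds: float) -> list[int]:
    """Measure a sine wave's level at regular intervals and round each to a 16-bit integer."""
    n = round(RATE * seconds)
    return [round(32767 * math.sin(2 * math.pi * FREQ * i / RATE)) for i in range(n)]

def to_wav_bytes(samples: list[int]) -> bytes:
    """Store the samples, in order, as a mono 16-bit WAV file held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(RATE)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))  # little-endian integers
    return buf.getvalue()

samples = sine_samples(0.01)
print(len(samples), samples[:5])
wav = to_wav_bytes(samples)
print(wav[:4], len(wav))
```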

Sound
On a CD, data is read by bouncing a low-powered laser beam off the reflective coating in the disc. Light hitting a land (a flat area) is reflected back and picked up by a photosensitive detector. Light hitting a pit is reflected back with far less intensity. These changes in intensity are read as 1s and 0s (strictly, a transition between pit and land is read as a 1, and no transition as a 0).

Video
Video is a sequence of pictures (frames), sampled at a known rate. North American TV is nearly 30 frames per second (29.97). 35 mm film is 24 frames per second. We can use any rate we like.

What other things are there?
ANYTHING that a computer manipulates is stored as numbers, and the scheme used to convert something into numbers is called a coding scheme. A codec (short for coder/decoder) is software that implements a coding scheme.

Code? What do you mean, ‘code’?
Video is a series of images that, when displayed rapidly one after the other, give the illusion of motion, like a ‘flip book’. However, a TV image is (just about) 512 x 512 pixels = 262 K bytes, at one byte per pixel.
1 second = 30 x 262 K = 7.86 Mbyte
1 minute = 471 Mbyte
1 hour = 28 Gbyte
We need to compress the images, and that’s where code/decode comes in.
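The storage figures above can be checked with a few lines of Python. The one-byte-per-pixel figure is the assumption implied by the slide's numbers (one grey level per pixel); full colour at three bytes per pixel would triple every line. The megabyte and gigabyte figures on the slide use decimal units (10^6 and 10^9 bytes).

```python
BYTES_PER_PIXEL = 1        # assumption implied by the slide's numbers: one grey level per byte
FRAMES_PER_SECOND = 30

frame = 512 * 512 * BYTES_PER_PIXEL      # bytes in one image
per_second = FRAMES_PER_SECOND * frame
per_minute = 60 * per_second
per_hour = 60 * per_minute

for label, size in [("frame", frame), ("second", per_second),
                    ("minute", per_minute), ("hour", per_hour)]:
    print(f"1 {label:<6} = {size:>14,} bytes")
```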

Code? What do you mean, ‘code’?
Each image can be compressed. JPEG compression can reduce the size by a factor of 15 before artifacts become clearly visible.

Code? What do you mean, ‘code’?
In a video, we can also compress across consecutive images, since neighbouring frames are usually very similar.
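The simplest way to exploit that similarity is to store the first frame and then only the per-pixel differences. This Python sketch shows that idea (plain frame differencing, simpler than the motion prediction MPEG uses) on two hypothetical one-row frames; the differences are mostly zeros, which a later compression step can store very cheaply.

```python
def frame_difference(prev: list[int], cur: list[int]) -> list[int]:
    """Store only how each pixel changed since the previous frame."""
    return [c - p for p, c in zip(prev, cur)]

def apply_difference(prev: list[int], diff: list[int]) -> list[int]:
    """Rebuild the current frame from the previous one plus the differences."""
    return [p + d for p, d in zip(prev, diff)]

# Two hypothetical one-row frames: a bright object moves one pixel to the right.
frame1 = [0, 0, 9, 9, 9, 0, 0, 0]
frame2 = [0, 0, 0, 9, 9, 9, 0, 0]

diff = frame_difference(frame1, frame2)
print(diff)                                      # [0, 0, -9, 0, 0, 9, 0, 0]
print(apply_difference(frame1, diff) == frame2)  # True
```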

Code? What do you mean, ‘code’?
MPEG tries to predict motion based on previous and following frames. For example, an I-frame (coded on its own, like a still image) shows a triangle on a white background. A following P-frame (predicted from an earlier frame) shows the same triangle, but at another position.

Code? What do you mean, ‘code’?
Reconstruction of inter-coded frames proceeds in two steps:
1. Apply the motion vector to the referenced frame.
2. Add the prediction error compensation to the result.
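The two steps can be sketched in Python on tiny frames. This is a simplification made for illustration: real MPEG applies motion vectors per macroblock (16 x 16 pixels), each with its own vector, while this sketch moves the whole frame by one vector. The functions `shift` and `reconstruct` and the frame contents are hypothetical.

```python
Frame = list[list[int]]

def shift(ref: Frame, dy: int, dx: int, fill: int = 0) -> Frame:
    """Step 1: move the reference frame's content by the motion vector (dy, dx)."""
    h, w = len(ref), len(ref[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # where this pixel comes from in the reference
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = ref[sy][sx]
    return out

def reconstruct(ref: Frame, motion: tuple[int, int], residual: Frame) -> Frame:
    """Step 2: add the prediction error (residual) to the shifted reference."""
    predicted = shift(ref, *motion)
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(predicted, residual)]

# Hypothetical tiny frames: a small "triangle" of 9s on a background of 0s.
i_frame = [[9, 9, 0, 0, 0],
           [9, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
no_error = [[0] * 5 for _ in range(4)]

p_frame = reconstruct(i_frame, (1, 2), no_error)  # moved down 1 row, right 2 columns
for row in p_frame:
    print(row)
```

When the motion vector predicts the new frame exactly, the residual is all zeros, and only the vector itself needs to be sent.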

Code? What do you mean, ‘code’?
As an example, the frame sequence above is transferred in the following order: I P B B B. The decoder’s only extra task is to reorder the reconstructed frames. To support this, each frame comes with an ascending frame number (modulo 1024).
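A sketch of how a decoder might use those frame numbers, in Python. Because the numbers wrap around at 1024, a plain sort fails near the wrap point; this sketch instead measures each number relative to the first frame received. That relies on an assumption true for the slide's I P B B B example: the first frame received is also the first to display. The function name is hypothetical.

```python
MODULUS = 1024  # frame numbers wrap around (the slide's "modulo 1024")

def display_order(received: list[tuple[int, str]]) -> list[str]:
    """Sort frames by number, counting from the first received frame so wrap-around works.

    Assumes the first frame received (the I-frame here) is also the first to display.
    """
    base = received[0][0]
    return [name for num, name in sorted(received, key=lambda f: (f[0] - base) % MODULUS)]

# Transmitted order I P B B B; the numbers give the display position.
print(display_order([(0, "I"), (4, "P"), (1, "B1"), (2, "B2"), (3, "B3")]))
# ['I', 'B1', 'B2', 'B3', 'P']

# The same group near the wrap-around point: 1022, 1023, 0, 1, 2.
print(display_order([(1022, "I"), (2, "P"), (1023, "B1"), (0, "B2"), (1, "B3")]))
# ['I', 'B1', 'B2', 'B3', 'P']
```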

Compression
At this point we quit; it’s too much detail. I find compression dull, but some folks spend their lives working on these things. DVD video is also compressed, using a kind of MPEG (MPEG-2). AVI is a file format within which various kinds of compression can be used. MOV (QuickTime) is also a multimedia container format that can support many audio and video formats (AIFF, WAV, DV, MP3, and MPEG-1).

What else?
Questions on how to store any other kinds of data?