Introduction to Multimedia
MSEC 20-791
Mike Christel, Alex Hauptmann
ARCHIVE: http://www.cs.cmu.edu/~christel/MM2002/syllabus.htm

Contact Information
• Mike Christel: christel@cs.cmu.edu, http://www.cs.cmu.edu/~christel, (412) 268-7799, Wean Hall 5212
• Alex Hauptmann: alex@cs.cmu.edu, http://www.cs.cmu.edu/~alex, (412) 268-1448, Wean Hall 5124
• Office Hours by Appointment
© Copyright 2002 Michael G. Christel and Alexander G. Hauptmann, Carnegie Mellon

Teaching Assistant
• Rong Yan: yanrong@cs.cmu.edu, http://www.cs.cmu.edu/~yanrong, (412) 268-9515, Newell Simon Hall 4533
• Office Hours by Appointment

Carnegie Mellon Campus Map
[Map marking Wean Hall and Newell Simon Hall]

Course Outline
• Oct. 24: Introduction to Multimedia
• Oct. 29: Images as Multimedia Interface Components; Intro to Macromedia Flash 5
• Oct. 31: Digital Audio; Speech Recognition
• Nov. 5: Image Processing and Computer Vision
• Nov. 7: Speech Synthesis and Speech Dialogue Applications
• Nov. 12: Digital Video
• Nov. 14: Multimedia via Cell Phones and PDAs

Course Outline, Continued
• Nov. 19: Web Specifications, MM Synchronization
• Nov. 21: Digital Music and Music Processing
• Nov. 26: MM Projects: Project LISTEN, Informedia
• Dec. 3: Multimedia Information Retrieval, TREC Interactive Video Track
• Dec. 5: Multimedia and Entertainment: Carnegie Mellon’s Entertainment Technology Center
• Dec. 10: MM Content Analysis: Digital Human Memory; Informedia Interface Evaluation
• Dec. 12: (MM Experiences from the Field planned…)

Grading
• No midterm, no final
• Textbook plus recommended links/readings
• Grading based on homeworks (90%), class presence and participation (10%)
• Homeworks MUST be published to your web site; by next class, email me (christel@cs.cmu.edu) your base URL, from which an “MSEC 20-791” link will exist
• Homework deadlines are strictly enforced: loss of 10% per day late for each assignment
• Flash homework is worth twice as much as each other homework
• The 10% for class time is meant to encourage you to show up mentally and physically for class

Definition of Multimedia
• Multi (Latin multus - numerous)
• Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)
• Multiple types of information captured, stored, manipulated, transmitted, and presented. Specifically: Images, Video, Audio (+Speech) and Text
• Related terms: hypermedia, hypertext
• Problem: “hypertext”, “hypermedia”, “multimedia” so overused/generalized they now convey little meaning

A Few Items in a Multimedia Timeline
• Pre-Digital Age: suggestions? See “Multimedia: From Wagner to Virtual Reality”, http://www.artmuseum.net/w2vr/timeline.html
• 1906 – Color photography made practicable, http://www.niepce.com/pagus-inv.html
• 1945 – Vannevar Bush, memex, “As We May Think”, http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
• 1960s – Ted Nelson, Xanadu, “a universal instantaneous hypertext publishing network”
• 1967 – Nicholas Negroponte formed the MIT Architecture Machine Group (the MIT Media Lab opened later, in 1985)
• 1987 – RCA’s David Sarnoff Labs announced Digital Video Interactive
• 1988 – Apple “Knowledge Navigator” vision

Multimedia Timeline, Continued
• 1989 – Tim Berners-Lee proposed the World Wide Web to CERN
• 1991 – Moving Picture Experts Group
• 1993 – NCSA Mosaic
• 1994 – Netscape; creation of World Wide Web Consortium (W3C)
• 1995 – Java for platform-independent application development
• 1996 – PNG (Portable Network Graphics)
• 1997 – HTML 4.0
• 1998 – XML 1.0
• 1999 – XSLT 1.0 and XPath 1.0
• 2001 – MPEG-7, JPEG 2000, SVG
• 2002 – intellectual property and JPEG 2000 (www.jpeg.org/newsrel1.html)
Help with alphabet soup: http://www.w3c.org, other on-line multimedia course glossaries, e.g., http://www.cs.cornell.edu/courses/cs631/1999sp/

Top Ten Misconceptions about Multimedia Computing
Ramesh Jain, founding chairman of Virage and CTO of Praja, www.praja.com, presented the following “top ten” MISCONCEPTIONS list as part of his keynote speech at the ACM Multimedia Conference, Ottawa, Canada, October 2, 2001:
10. Video = Multimedia.
9. Multimedia = multi X separate medium.
8. All information is ONLY in the images or video.
7. Editing of media is almost always off-line.
6. Query by example is best access method.

Top Ten Misconceptions about Multimedia Computing, Continued
5. All users have Ph.D.s in multimedia computing.
4. Users have no memory or context.
3. Computers are for computing.
2. Medium is the message.
1. We work for computers.
Ramesh Jain concluded his keynote talk with the observation: Information Builds Experience, Experience is Life.

Multimedia
[Diagram relating multimedia to: Audio, Networking, Natural Language Processing, Psychology, Storage Systems, Video, Information Retrieval, Images, Data Compression, HCI, CPU Power]

Multimedia Physics
• Sound is a waveform
• Imagery is a waveform
  • light is electromagnetic radiation with different intensity in spatial coordinates
  • color corresponds to wavelength (red is the longest wavelength visible to people)
• Introductory treatment of “light behaves as both particle and wave” at http://www.howstuffworks.com/light1.htm
• “Distributed Multimedia” by Palmer Agnew and Anne Kellerman, published by Atomic Dog Publishing, http://www.atomicdogpublishing.com

A Quick Introduction to Light Waves
• Derived from: http://www.pbs.org/deepspace/classroom/activity2.html
• Waves are characterized by wavelength and frequency
• Light is a type of electromagnetic radiation in a range to which our eyes are sensitive
• Sound is not electromagnetic radiation, but sound is a wave as well. Higher pitches are caused by higher frequencies of vibrating molecules that reach your eardrum. Lower pitches are likewise caused by lower frequencies.

Wavelength/Frequency Spectrum
[Figure: electromagnetic spectrum running through long radio waves, TV and FM, microwaves, infrared, visible light, ultraviolet, X-rays, and gamma rays. Wavelength marks in the visible band: 700 nm, 600 nm, 400 nm. Frequency marks: 4.5 x 10^14 Hz, 5 x 10^14 Hz, 6 x 10^14 Hz, 7 x 10^14 Hz.]
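
The wavelength and frequency marks on the spectrum are tied together by the standard relation frequency = c / wavelength. The sketch below is not from the slides; the function name is hypothetical and it assumes light travelling in a vacuum. It converts the three wavelength marks to frequencies for comparison with the rounded values on the figure.

```python
# Speed of light in a vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def wavelength_nm_to_frequency_hz(wavelength_nm: float) -> float:
    """Convert a vacuum wavelength in nanometres to a frequency in hertz."""
    wavelength_m = wavelength_nm * 1e-9
    return SPEED_OF_LIGHT_M_PER_S / wavelength_m


if __name__ == "__main__":
    # Wavelength marks taken from the spectrum slide.
    for nm in (700, 600, 400):
        print(f"{nm} nm -> {wavelength_nm_to_frequency_hz(nm):.2e} Hz")
```

Red light near 700 nm comes out a little above 4 x 10^14 Hz and violet light near 400 nm a little above 7 x 10^14 Hz, which is consistent with the approximate labels on the figure.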

Migration from Analog to Digital Representation
• Analog signals to sensors
  • E.g. vinyl records
  • Fidelity is faithfulness to the original
• Digital representation (1960s)
  • Sampling
  • Quantizing
  • Coding
• Limiting factors in move to digital:
  • Storage limits
  • CPU speeds
  • I/O speeds
  • Network bandwidth

Loss of Fidelity Due to Sampling
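
One way to see fidelity lost to sampling is aliasing: a tone sampled too slowly becomes indistinguishable from a lower tone. The sketch below is an illustration only, not the figure from the slide; the function name and the 9 Hz / 1 Hz / 10 Hz values are assumptions chosen to make the effect easy to check.

```python
import math


def sample_sine(freq_hz: float, sample_rate_hz: float, n_samples: int) -> list[float]:
    """Return n_samples of a unit-amplitude sine of freq_hz, taken at sample_rate_hz."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    # A 9 Hz tone sampled at only 10 samples per second...
    undersampled = sample_sine(9.0, 10.0, 10)
    # ...produces the same sample values as an inverted 1 Hz tone.
    low_tone = sample_sine(1.0, 10.0, 10)
    for u, low in zip(undersampled, low_tone):
        print(f"{u:+.3f}  {-low:+.3f}")
```

Once the samples are stored, nothing in them says whether the original was the 9 Hz tone or the 1 Hz tone, so the higher-frequency detail is gone.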

Loss of Fidelity Due to Quantizing
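
Quantizing rounds each sample to one of a fixed number of levels, so the stored value can differ from the true one. The following is a minimal sketch under assumed conventions (samples in the range -1.0 to 1.0, evenly spaced levels, a hypothetical `quantize` helper), not the scheme shown on the slide.

```python
def quantize(value: float, bits: int) -> float:
    """Round a value in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)          # distance between neighbouring levels
    index = round((value + 1.0) / step)  # nearest level, counted from -1.0
    return -1.0 + index * step


if __name__ == "__main__":
    sample = 0.3
    for bits in (3, 8, 16):
        q = quantize(sample, bits)
        print(f"{bits:2d} bits: {q:+.6f} (error {abs(q - sample):.6f})")
```

The error is at most half a step, so each extra bit roughly halves the worst-case error at the cost of more storage per sample.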

Overview of Compression Strategies
• Lossless Compression
  • Huffman Encoding
  • Adaptive Huffman Encoding
  • Lempel-Ziv-Welch (LZW)
    • used in GIF
  • JPEG-LS
• Lossy Compression
  • JPEG
  • H.261, MPEG-2
• Lossless and Lossy Together
  • JPEG 2000

Huffman Encoding Procedure
1. Initialization: Put all items in a list L, sorted by frequency.
2. Repeat until L has only one node left:
   (a) From L pick the two nodes having the lowest frequency, and create a parent node for them.
   (b) Assign the sum of the children's frequencies to the parent node and insert it into L (kept in sorted order).
   (c) Assign codes 0 and 1 to the two branches of the tree, and delete the children from L.
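
The procedure above can be sketched in Python. This is an illustrative version rather than the course's own code: it swaps the sorted list L for a priority queue (`heapq`), which gives the same "pick the two lowest" behaviour, and the function name and tie-breaking counter are assumptions.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table (symbol -> bit string) for the characters in text."""
    tiebreak = count()  # unique counter so heap entries with equal frequency stay comparable
    # Heap entries are (frequency, tiebreak, tree); a tree is a symbol or a (left, right) pair.
    heap = [(freq, next(tiebreak), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit
    while len(heap) > 1:
        # Step (a): remove the two lowest-frequency nodes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Step (b): the parent carries the summed frequency and goes back in the queue.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        # Step (c): label the two branches 0 and 1 on the way down to each leaf.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


if __name__ == "__main__":
    message = "ALOHA HAWAII"
    table = huffman_codes(message)
    print(table)
    print(sum(len(table[ch]) for ch in message), "bits")
```

The individual codes depend on how ties between equal frequencies are broken, so they may differ from a hand-drawn tree, but the total encoded length for a given input is the same for any valid Huffman tree.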

Huffman Coding Example
• Input: “ALOHA HAWAII”
• Frequency: 4 A, 2 H, 2 I, 1 L, 1 O, 1 space, 1 W
• 96 bits (8 bits * 12 characters) to 32 bits
[Figure: Huffman tree with branches labeled 0 and 1 and leaves A, I, H, L, space, W, O]
A=0, I=100, H=101, L=1100, space=1101, etc.
RECOMMENDED: Java applet example at http://www.cs.sfu.ca/CC/365/li/squeeze/index.html
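
The 32-bit figure can be checked by encoding the input with the code table from this example. In the sketch below, the codes for A, I, H, L, and space are the ones listed; the codes for W and O are covered only by "etc." in the slide, so `1110` and `1111` are assumptions (the two remaining 4-bit codes, in an arbitrary order). The helper names are hypothetical.

```python
# Codes for A, I, H, L, and space come from the slide.
# W and O are ASSUMED to take the two remaining 4-bit codes.
CODES = {
    "A": "0",
    "I": "100",
    "H": "101",
    "L": "1100",
    " ": "1101",
    "W": "1110",  # assumption
    "O": "1111",  # assumption
}


def encode(text: str, codes: dict[str, str]) -> str:
    """Concatenate the code of every character in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Decode a bit string; works because no code is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ""
    return "".join(decoded)


if __name__ == "__main__":
    bits = encode("ALOHA HAWAII", CODES)
    print(bits, len(bits))
    print(decode(bits, CODES))
```

Swapping the assumed W and O codes changes the bit string but not its length, since both letters occur once and both codes are 4 bits long.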

Why Digital?
• Universal storage, transmission format
  • CD, Internet
• Precision (range of values, number of bits, floating point)
• Lossless transmission/storage
BUT:
• Sampling rate distorts information
• Size requirements may be huge compared to analog, e.g., 4.2 million pixels for a single 35 mm photograph!
  • results in lots of work on perception-based lossy digital compression strategies
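
To put the 4.2 million pixel figure in storage terms, the arithmetic is pixels times bits per pixel. The sketch below assumes uncompressed 24-bit color (8 bits each for red, green, and blue), which the slide does not state; the function name is hypothetical.

```python
def uncompressed_image_bytes(pixels: int, bits_per_pixel: int = 24) -> int:
    """Bytes needed to store an image with no compression (24-bit color ASSUMED by default)."""
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    size = uncompressed_image_bytes(4_200_000)
    print(f"{size:,} bytes, about {size / 1e6:.1f} MB")
```

Under that assumption a single scanned photograph takes roughly 12.6 million bytes, which helps explain the interest in lossy compression tuned to human perception.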

Why Perception Matters
http://www.libertarian.on.ca/images/Florida%20Recount.jpg

Audio
• Sounds
  • Hear 15 Hz to 20 kHz
  • Speech is 50 Hz to 10 kHz
• Speech Recognition
  • It is hard to wreck a nice beach / It is hard to recognize speech
  • Ice cream / I scream
• Synthesis
  • Speech
  • Music
    • MIDI for 127 instruments, 47 percussion sounds
    • Notes, timing

Speech Recognition Issues
• Continuous vs. discrete
• Vocabulary size
• Channel (microphone)
• Environment (location of microphone and speaker)
• Speaker dependent/speaker independent
• Context (language model)
• Interactivity (dialog model)

Speech Recognition Knowledge Sources
• Acoustic Modeling: describes the sounds that make up speech
• Lexicon: describes which sequences of speech sounds make up valid words
• Language Model: describes the likelihood of various sequences of words being spoken

Speech Variations
• Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
• Voice Quality: breathy, creaky, whispery, tense, lax, modal
• Context: sport, professional, interview, free conversation, man-machine dialogue
• Speaking Rate: normal, slow, fast, very fast
• Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load

Video
• Video is made up of frames
  • Frame rate determines the delay between successive frames
  • Minimal change between frames
  • Sequencing creates the illusion of movement
• 16 frames per second (fps) is “smooth”
• Standards: NTSC 29.97 fps, PAL 25 fps, HDTV 60 fps
• Interlacing
• Display scan rate is different
  • Monitor refresh rate, e.g., 60-70 Hz (roughly 1/60 to 1/70 second per refresh)
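
Frame rate and the delay between successive frames are reciprocals: a rate in frames per second gives a delay of one over that rate in seconds. A small sketch, using a hypothetical helper name and the frame rates mentioned on this slide:

```python
def frame_interval_ms(fps: float) -> float:
    """Delay between successive frames, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # 16 fps is the "smooth" threshold; 29.97 fps is NTSC; 60 fps is the HDTV figure.
    for label, fps in (("smooth threshold", 16.0), ("NTSC", 29.97), ("HDTV", 60.0)):
        print(f"{label}: {fps} fps -> {frame_interval_ms(fps):.2f} ms per frame")
```

At 16 fps each frame stays on screen for 62.5 ms, while at NTSC's 29.97 fps the interval drops to about 33.4 ms.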

Captured vs. Synthetic
• Animation vs. Video
• Vector Graphics vs. Bitmap/Raster Pictures
• Synthesizer vs. Recording
• Storage? Manipulation? Processor Requirements?
• Fidelity to real world
• Hybrids are possible

Why is Multimedia Important?
• Our society captures its experience, records its accomplishments, portrays its past, and informs its masses ... in pictures, audio and video
• For many, CNN has become the “publication of record”
• Multimedia learning leverages “multiple intelligences”
• Multimedia Digital Libraries are an essential component of
  • formal, informal, and professional learning
  • distance education, telemedicine

Technology Push vs. Market Pull
• Home Entertainment
• Catalog Ordering
• Multimedia Training, Education
• Videoconferencing
• Professional Video Services
• Videomail
• Speech Recognition

Hype vs. Reality
• What is feasible, under what circumstances?
• What is possible?
• What is impossible?
• What is unlikely?

A Multimedia Vision for the Home Market
FX Palo Alto Laboratory
John J. Doherty, Lynn Wilcox, and Andreas Girgensohn
“A Night at the Opera”
Video to appear as part of the ACM Multimedia Conference, 2002 (7:11)

Upcoming Homework
• Register: send email to christel@cs.cmu.edu with the URL where your homeworks will be located (we will use that URL plus your sending email address for future correspondence) – before Oct. 28
• Homework 1: Multimedia lookup via the web – Oct. 28
• Homework 2: Scanning and image search – Oct. 30/Nov. 4
• Homework 3: Animation via Macromedia Flash – Nov. 24
• Homeworks 4, 5, 6, 7 for later in the term
• Homework 8: Multimedia web site – Dec. 12
• See syllabus for details