Carnegie Mellon

Multimedia
Michael Christel, Alex Hauptmann, Rong Jin (TA)
http://www.cs.cmu.edu/~alex/mmCourse

How to get in touch with us
• Mike Christel
  – christel@cs.cmu.edu
  – http://www.cs.cmu.edu/~christel
  – (412) 268-7799 or x8-7799
  – WeH 5212
• Alex Hauptmann
  – alex@cs.cmu.edu
  – http://www.cs.cmu.edu/~alex
  – (412) 268-1448 or x8-1448
  – WeH 5124
• Office hours by appointment

Teaching Assistant
• Rong Jin
• jin+@andrew.cmu.edu
• Office: WeH 5316
• Office hours by appointment
• (412) 268-4050 or x8-4050

Course Outline, Part 1 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
October 22     Intro to Multimedia
October 25     Enabling Technologies, Macromedia Flash Intro and Demo
October 29     Sound Processing, Speech Recognition
November 1     Digital Video Creation and Transmission
November 5     Speech Synthesis

Course Outline, Part 2 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
November 8     Image Processing
November 12    Digital Music and Music Processing
November 15    Multimedia Internet Protocols, SMIL
November 19    Synthetic Interviews: A Multimedia Company (Experiences from the Field)
November 22    Programming for Interactive Multimedia (CGI Scripts/ASP)

Course Outline, Part 3 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
November 29    Content Analysis and Coding of Digital Audio and Video, Multimedia Storage and Retrieval Management
December 3     Video Retrieval Evaluation and Testing Multimedia Interface Design, Digital Libraries
December 6     Visual Design, Multimedia Interface Design Guidelines, Multimedia use in the future (Experience on Demand)
December 10    Multimedia as Entertainment Technology, Virtual Reality

Homeworks
• See http://www.cs.cmu.edu/~alex/mmCourse
• 9 homeworks planned, 10 points each
• One hard homework will be worth 20 points
• No final, no midterm
• Publish homeworks on your web page and email us the URL
• Space?

Today: Intro to Multimedia
Apple Knowledge Navigator Vision 1988

Multimedia
• Audio
• Networking
• Natural Language Processing
• Psychology
• Storage Systems
• Video
• Information Retrieval
• Images
• Data Compression
• HCI
• CPU Power

Definition of Multimedia
• Multi (Latin multus: numerous)
• Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)
• Multiple types of information captured, stored, manipulated, transmitted, and presented
• Specifically: images, video, audio (+ speech), and text

Definition of Multimodal
• Multi (Latin multus: numerous)
• Modal (Latin modus: manner)
• Traditionally refers to input/output formats:
  – Input: sounds, speech (mike); gestures (camera, tablet); eye-gaze (camera); mouse; keyboard
  – Output: sounds, speech; video; pictures; animations; text

Perceived Information
• Physical variables
  – Sound is a waveform
  – An image is a waveform
    – Light is electromagnetic radiation with different intensity in spatial coordinates
    – Color corresponds to wavelength

History of Multimedia I
• Analog signals to sensors
  – E.g. vinyl records
  – Fidelity is faithfulness to the original
• Digital representation ('60s)
  – Sampling
  – Quantizing
  – Coding
  – Codec, modem (A/D and D/A)
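The three digitization steps named on this slide (sampling, quantizing, coding) can be sketched in a few lines of Python. This is only an illustrative sketch, not material from the course: the function names, the 1 Hz test tone, the 8 Hz sample rate, and the 3-bit word size are all assumptions chosen to keep the output small.

```python
import math

def sample(signal, sample_rate_hz, duration_s):
    """Sampling: read the continuous-time signal at evenly spaced instants."""
    count = int(round(sample_rate_hz * duration_s))
    return [signal(n / sample_rate_hz) for n in range(count)]

def quantize(samples, bits):
    """Quantizing: map each value in [-1.0, 1.0] to one of 2**bits integer levels."""
    top = 2 ** bits - 1
    levels = []
    for x in samples:
        x = max(-1.0, min(1.0, x))          # clip anything outside the range
        levels.append(round((x + 1.0) / 2.0 * top))
    return levels

def encode(levels, bits):
    """Coding: write each level as a fixed-width binary code word."""
    return [format(level, "0{}b".format(bits)) for level in levels]

def tone(t):
    """A 1 Hz sine wave standing in for the analog source (assumed example)."""
    return math.sin(2 * math.pi * 1.0 * t)

samples = sample(tone, sample_rate_hz=8, duration_s=1.0)
levels = quantize(samples, bits=3)
codes = encode(levels, bits=3)
print(codes)
```

Raising `bits` shrinks the rounding step, and raising `sample_rate_hz` captures faster changes in the signal; both increase the amount of data produced.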

Hardware Advances
• CPU
• Bus
• Network I/O
• Keyboard, Mouse
• Disk
• Mike + A/D Board
• Camera + A/D Board
• Speakers (+ D/A Board)
• Display

History of Multimedia II
• Analog controls only
• Special hardware (Displays, Scanners, FFTs)
• Integrated hardware components
• Further integration
• Other devices

History of Multimedia III
Limiting factors:
• Storage limits
• CPU speeds
• I/O speeds
• Network bandwidth

Why Digital?
• Universal storage, transmission format
  – CD, internet
• Precision (range of values, number of bits, floating point)
• Lossless transmission/storage
BUT:
• Sampling rate distorts information
• Size requirements may be ‘large’ compared to analog
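To make the size concern concrete, the storage needed for uncompressed digital audio is simply sample rate × bits per sample × channels × duration. The sketch below assumes standard CD audio parameters (44.1 kHz, 16-bit, stereo); those numbers and the helper name `pcm_bytes` are not from the slide.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Uncompressed PCM size in bytes: rate x sample width x channels x duration."""
    total_bits = sample_rate_hz * bits_per_sample * channels * seconds
    return total_bits // 8

# CD audio parameters (assumed here, not stated on the slide).
one_minute = pcm_bytes(44_100, 16, 2, 60)
print(one_minute)   # 10584000 bytes, roughly 10 MB per minute
```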

Digitization Process
• Sampling from an analog signal
• Sampling errors relate to signal frequencies
• Quantization errors
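One way to see why sampling errors depend on signal frequency is aliasing: a tone above half the sample rate produces exactly the same samples as a lower tone. The sketch below is an illustration under assumed values (7 Hz and 3 Hz cosines sampled at 10 Hz); the function names are hypothetical.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit cosine at the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency in [0, rate/2] that the sampled tone cannot be told apart from."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# A 7 Hz tone sampled at 10 Hz lies above the 5 Hz limit (half the sample rate).
high = sample_cosine(7, 10, 20)
low = sample_cosine(3, 10, 20)

print(alias_frequency(7, 10))
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))
```

Once the samples are taken, nothing distinguishes the two tones, which is why the sample rate has to exceed twice the highest frequency present in the signal.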

Text
• ASCII, Unicode
• Formatted Text, Rich Text
• Document Formats:
  – Structured: TeX, HTML
  – Page Descriptions: PostScript, PDF
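The practical difference between ASCII and Unicode shows up as soon as text contains a character outside the 128 ASCII codes. The short sketch below uses Python's built-in `str.encode`; the sample word and the helper `ascii_ok` are assumptions for illustration.

```python
def ascii_ok(text):
    """Return True if every character in `text` fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

text = "na\u00efve"            # "naive" written with a diaeresis on the i
utf8 = text.encode("utf-8")    # Unicode text serialized as UTF-8 bytes

print(len(text), len(utf8))    # 5 characters, 6 bytes
print(ascii_ok("plain"), ascii_ok(text))
```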

Graphics
• Objects
  – circles, splines, rectangles, lines
• Editable
  – resize, reshape, move, colorize
• Synthetic

Images (Pictures)
• Fixed digitized representation
  – bitmap, colors per pixel
• Editable in limited ways
  – retouch, cut and paste, remap colors, filter [Photoshop tools]
  – no ‘model’ of the thing
• Captured
  – not just from real life: clip art, screen dump
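The contrast between graphics (an editable object model) and images (a fixed grid of pixels) can be sketched as two small data structures. Everything here is an assumed illustration: the `Circle` class, the tiny 3x3 bitmap, and the color indices are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Circle:
    """Graphics: a shape stored as parameters, so it can be edited exactly."""
    cx: float
    cy: float
    radius: float

    def resized(self, factor):
        """Return a scaled copy; no pixels are involved, so nothing is lost."""
        return Circle(self.cx, self.cy, self.radius * factor)

def remap_colors(bitmap, mapping):
    """Images: one of the few edits a bitmap allows, swapping pixel colors."""
    return [[mapping.get(pixel, pixel) for pixel in row] for row in bitmap]

# A 3x3 bitmap of color indices; it has no notion of what shape it depicts.
bitmap = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

print(Circle(0.0, 0.0, 2.0).resized(1.5))
print(remap_colors(bitmap, {1: 7}))
```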

Audio
• Sounds
  – Humans hear 15 Hz to 20 kHz
  – Speech is 50 Hz to 10 kHz
• Speech Recognition
  – "It is hard to wreck a nice beach" (sounds like "It is hard to recognize speech")
  – "Ice cream" vs. "I scream"
• Synthesis
  – Speech
  – Music: MIDI for 128 instruments, 47 percussion sounds; notes, timing
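MIDI stores music as note numbers and timing rather than as sampled sound. As a rough sketch of what a synthesizer does with that data, the code below converts a note number to a pitch using the standard equal-temperament convention (note 69 = A4 = 440 Hz) and converts beats to seconds at a given tempo. The function names and the example tempo are assumptions, not part of the slide.

```python
def midi_note_to_hz(note, a4_hz=440.0):
    """Equal-temperament pitch for a MIDI note number (69 is A4)."""
    return a4_hz * 2.0 ** ((note - 69) / 12)

def beats_to_seconds(beats, tempo_bpm):
    """Timing: convert a duration in beats to seconds at the given tempo."""
    return beats * 60.0 / tempo_bpm

print(midi_note_to_hz(69))                 # 440.0
print(round(midi_note_to_hz(60), 2))       # middle C
print(beats_to_seconds(2, tempo_bpm=120))
```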

Speech Recognition Issues
• Continuous vs Discrete
• Vocabulary Size
• Channel (Microphone)
• Environment (Location of mike and Speaker)
• Speaker Dependent/Speaker Independent
• Context (Language Model)
• Interactivity (Dialog Model)

Speech Recognition Knowledge Sources
• Acoustic Modeling: describes the sounds that make up speech
• Lexicon: describes which sequences of speech sounds make up valid words
• Language Model: describes the likelihood of various sequences of words being spoken
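A toy sketch of how these three knowledge sources might combine when choosing between competing transcriptions: the lexicon rules out non-words, and the remaining candidates are ranked by acoustic score plus language-model score. All scores below are invented for illustration, and the structure is a simplification rather than a description of any particular recognizer.

```python
# Hypothetical lexicon: the words the recognizer knows.
LEXICON = {"recognize", "speech", "wreck", "a", "nice", "beach"}

# Hypothetical log-scores, invented for this example only.
ACOUSTIC_LOGPROB = {          # how well each candidate matches the sound
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
    "rekonize speech": -11.0,
}
LANGUAGE_LOGPROB = {          # how likely each word sequence is to be said
    "recognize speech": -4.0,
    "wreck a nice beach": -9.0,
    "rekonize speech": -4.0,
}

def in_lexicon(candidate):
    """Lexicon check: every word must be a known word."""
    return all(word in LEXICON for word in candidate.split())

def decode(candidates):
    """Pick the valid candidate with the best combined score."""
    valid = [c for c in candidates if in_lexicon(c)]
    return max(valid, key=lambda c: ACOUSTIC_LOGPROB[c] + LANGUAGE_LOGPROB[c])

print(decode(list(ACOUSTIC_LOGPROB)))
```

In this made-up example the acoustically closest valid candidate is "wreck a nice beach", but the language model makes "recognize speech" the overall winner.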

Speech Variations
Style:          careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
Voice Quality:  breathy, creaky, whispery, tense, lax, modal
Context:        sport, professional, interview, free conversation, man-machine dialogue
Speaking Rate:  normal, slow, fast, very fast
Stress:         in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load

Video
• Frames comprise the video
  – Frame rate = number of frames shown per second; the delay between successive frames is 1 / frame rate
  – Minimal change between frames
• Sequencing creates the illusion of movement
  – More than 16 fps looks “smooth”
  – Standards: 29.97 fps is NTSC, 25 fps is PAL, 60 fps is HDTV
  – Interlacing
• Display scan rate is different
  – Monitor refresh rate: 60-70 Hz (1 Hz = 1/s)
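The relationship between frame rate and the delay between frames, and the resulting data volume, can be worked out directly. The sketch below is illustrative: the function names and the 640x480, 3-bytes-per-pixel frame are assumptions, while the frame rates are the ones listed on the slide.

```python
def frame_interval_ms(fps):
    """Delay between successive frames, in milliseconds."""
    return 1000.0 / fps

def raw_video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate: one full frame per tick of the frame clock."""
    return width * height * bytes_per_pixel * fps

for name, fps in [("NTSC", 29.97), ("PAL", 25), ("HDTV", 60)]:
    print(name, round(frame_interval_ms(fps), 2), "ms per frame")

# Assumed frame size of 640x480 at 3 bytes per pixel, shown at the PAL rate.
print(raw_video_bytes_per_second(640, 480, 3, 25))   # 23040000
```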

Captured vs. Synthetic
• Animation vs Video
• Graphics vs Pictures
• Synthesizer vs Recording
• Storage? Manipulation? Processor Requirements?
• Fidelity to real world
• Hybrids are possible

Why is Multimedia Important?
• Our society captures its experience,
  – records its accomplishments,
  – portrays its past,
  – informs its masses
  ... in pictures, audio and video
• For many, CNN has become the “publication of record”
• Multimedia learning leverages “multiple intelligences” (Gardner, 1993)
• Multimedia digital libraries are an essential component of formal, informal, and professional learning
  – distance education, telemedicine

Technology Push vs Market Pull
– Home Entertainment
– Catalog Ordering
– Multimedia Training, Education
– Videoconferencing
– Professional Video Services
– Videomail
– Speech Recognition

Hype vs. Reality
• What is feasible, under what circumstances?
• What is possible?
• What is impossible?
• What is unlikely?

Multimedia Visions
• DARPA: Dominate the Battle Space
• HP “1995”
• LSI “Flash Point”
• HP “Synergies”

Intro to Multimedia
That’s all for today