COMPUTER AUDIO CGDD 4003

What is Sound?
  • Compressions of air or other media (such as water or metal) from something vibrating
  • Sounds are made up of high-frequency and low-frequency sounds
  • Frequency
      • Don’t confuse pitch (frequency) with volume!
      • Volume is measured in decibels (dB)
      • Frequency is measured in hertz (Hz) = cycles per second
      • Humans only hear from about 20 Hz to 20 kHz
  • Strange fact! The speed of sound depends on the medium (air: 340 m/s; water: about 1,480 m/s; gold: 3,240 m/s)

Spatial Sound
  • 1 channel – “mono”. Can be split to several speakers; still no direction
  • 2 channels – “stereo”. Fades from left to right. Can determine direction
  • 5.1 audio – common for home theaters
  • 3D sound? – Video games (PC). Still has room to develop

The Human Side (20 Hz-20 KHz)

The Equal Loudness Contour (figure; labeled regions: “Ouch!”, “Hurts”, “Killa”)

A Note about decibels
  • A decibel is 1/10th of a bel
  • Abbreviated dB
  • It is a logarithmic measure that tracks perceived loudness: the dB value increases linearly as power increases exponentially
      • Twice the power? 10·log10(2) = 3.01 dB
      • Something that sounds twice as loud is usually put at roughly 10 dB higher
  • In gaming, volume usually ranges from 0.0f to 1.0f
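The decibel arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the 20·log10 rule for amplitude gains (such as a 0.0 to 1.0 game volume) is the usual convention rather than something stated on the slide.

```python
import math

def power_ratio_to_db(ratio):
    """Convert a power ratio to decibels: 10 * log10(ratio)."""
    return 10.0 * math.log10(ratio)

def db_to_linear_gain(db):
    """Convert a dB offset to a linear amplitude gain (20 * log10 convention)."""
    return 10.0 ** (db / 20.0)

print(round(power_ratio_to_db(2.0), 2))   # 3.01
print(round(db_to_linear_gain(-6.0), 3))  # 0.501
```

So a game volume of about 0.5 corresponds to roughly a 6 dB cut in amplitude terms.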

Human Perception (Interaural Time Difference)
  • Sound hits both ears
  • Difference in time
  • Figure: sound from the right hits the right ear first and hasn’t gotten to the left ear yet
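To get a feel for how small the interaural time difference is, here is a rough sketch using a deliberately simple model: the extra path to the far ear is taken as ear spacing times the sine of the source angle. The 0.2 m ear spacing and the function name are assumptions for illustration; the 340 m/s speed of sound is the figure quoted earlier in the slides.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s in air, as quoted earlier in the slides
EAR_SPACING = 0.2       # metres; an assumed, illustrative head width

def interaural_time_difference(azimuth_deg):
    """Seconds by which the sound reaches the right ear before the left.

    azimuth_deg: 0 = straight ahead, 90 = directly to the right.
    A negative result means the left ear hears it first.
    """
    extra_path = EAR_SPACING * math.sin(math.radians(azimuth_deg))
    return extra_path / SPEED_OF_SOUND

print(interaural_time_difference(90.0) * 1000.0, "ms")
```

Under these assumptions the largest difference is well under a millisecond, which is the cue the brain uses to judge direction.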

How Computers Perceive Sound
  • Digitization (DAC and ADC)
  • Computers “listen” to the amplitude a certain number of times per second (sample rate)
      • 44 kHz (44.1 kHz) is CD quality
      • 22 kHz is good
      • 8 kHz is lame
  • Computers have to approximate what they heard and assign it a number
      • 4 bits = 16 levels to approximate to
      • 16 bits = 65,536 levels to approximate to
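The two steps described here, sampling at a fixed rate and rounding each sample to one of 2^bits levels, can be sketched as follows. The function name and the choice of a sine wave as the input are assumptions for illustration, not part of the course material.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, bits, num_samples):
    """Sample a sine wave and round each sample to one of 2**bits levels."""
    levels = 2 ** bits
    out = []
    for i in range(num_samples):
        # "Listen" to the amplitude at this instant (range -1.0 to 1.0).
        amplitude = math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        # Map -1..1 onto 0..levels-1 and round to the nearest level.
        out.append(round((amplitude + 1.0) / 2.0 * (levels - 1)))
    return out

coarse = sample_and_quantize(440.0, 8000, 4, 16)   # 16 possible levels
fine = sample_and_quantize(440.0, 8000, 16, 16)    # 65,536 possible levels
print(coarse)
print(fine)
```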

Original Sound (figure; axes: Amplitude (in dB), Frequency)

Low Sampling Rate (figure; axis: Time)

Low Sampling Rate: what the computer hears (figure; axis: Time)

High Sampling Rate (figure; axis: Time)

2 bits per sample: 4 approximations (figure; axis: Time)

Stair-Step Effect: called “quantization errors” (figure; axis: Time)

3 bits per sample: 8 approximations (figure; axis: Time)

Less Stair-Step (figure; axis: Time)

Signal to Noise Ratio (SNR)
  • Represents the quantization error
  • 8-bit = 128 discrete values (upper half only)
      • Sample is rounded up or down
      • SNR is 256:1
      • 256:1 translates to 48 dB (difference between average noise and max signal)
  • 16-bit = 32K discrete values (upper half)
      • SNR = 65,536:1, or 96 dB
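The 48 dB and 96 dB figures follow from treating the ratio 2^bits : 1 as an amplitude ratio and applying 20·log10. A quick check, with an illustrative function name:

```python
import math

def quantization_snr_db(bits):
    """dB value of the ratio 2**bits : 1, using 20 * log10 for amplitudes."""
    return 20.0 * math.log10(2 ** bits)

for bits in (8, 16):
    print(bits, "bits:", round(quantization_snr_db(bits), 1), "dB")
```

Each extra bit adds about 6 dB, which is why the slide's values round to 48 and 96.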

In General
  • Sampling rate affects the range of frequencies you can capture (Nyquist: up to half the sampling rate)
  • Bits per sample affects noise level as well as volume range
  • What about recording:
      • Rock?
      • Mozart (or anything on NPR for that matter)?
      • Voice/dialog?
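The Nyquist point can be made concrete: a tone above half the sampling rate is not captured at its true pitch but folds back to a lower one (aliasing). The sketch below assumes a single pure tone and uses illustrative function names.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can capture."""
    return sample_rate_hz / 2.0

def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency a pure tone appears to have once sampled.

    Tones above the Nyquist limit fold back toward the nearest
    multiple of the sampling rate.
    """
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)

print(nyquist_limit(44100))              # 22050.0
print(apparent_frequency(3000, 8000))    # 3000
print(apparent_frequency(5000, 8000))    # 3000
```

At 8 kHz sampling, a 5 kHz tone is indistinguishable from a 3 kHz one, which is one reason low rates suit voice better than music.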

Capturing Sounds
  • Usually done with:
      • a microphone (such as voice)
      • Line in
      • CD
      • Hollywood Edge®
  • Computer has sound card
      • Input types (RCA, MIDI, mini, ¼”, XLR)
      • Card has quality (plays 16-bit sound)
  • Need some kind of software
      • Sound Forge/Audacity
      • Windows Sound Recorder (gag)

Typical Pipeline (stages, in order)
  • Permanent storage
  • Decoding (from mp3, ogg, etc.)
  • Individual channel memory buffer
  • Sound channel processing (2D/3D effects)
  • Hardware mixing and DAC

Sample Playback
  • Loaded entirely into memory (called “sample” as well)
  • Streamed (pre-buffer data using a circular buffer)
  • Channel properties
      • Pan – left/right
      • Pitch – frequency
      • Volume
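For streamed playback, the circular buffer mentioned above lets a decoder keep writing ahead while the mixer reads behind it, reusing the same fixed block of memory. The class below is a minimal single-threaded sketch; the name `RingBuffer` and its methods are hypothetical, and a real audio engine would add locking or lock-free indices.

```python
class RingBuffer:
    """Fixed-size circular buffer of audio samples."""

    def __init__(self, capacity):
        self.data = [0] * capacity
        self.capacity = capacity
        self.read_pos = 0
        self.write_pos = 0
        self.count = 0  # samples currently stored

    def write(self, samples):
        """Store as many samples as fit; return how many were accepted."""
        written = 0
        for s in samples:
            if self.count == self.capacity:
                break  # buffer full: the producer must wait
            self.data[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity  # wrap around
            self.count += 1
            written += 1
        return written

    def read(self, n):
        """Remove and return up to n samples in the order they were written."""
        out = []
        while self.count > 0 and len(out) < n:
            out.append(self.data[self.read_pos])
            self.read_pos = (self.read_pos + 1) % self.capacity  # wrap around
            self.count -= 1
        return out
```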

Compressed Audio
  • Requires a codec (compress/decompress)
  • Lossless (e.g. .zip files)
  • Lossy
      • Bit-reduction (ADPCM, reduces bits per sample from 16 to 4)
          • Simple
          • Used on Sony PSP, Wii and Nintendo DS
      • Psychoacoustics (.mp3, .ogg, .wma)
          • Discard sound we don’t normally hear anyway
          • Hard to implement
          • CPU intensive
          • PS3, Xbox 360, PCs
  • Note: mp3 format requires licensing fees to Fraunhofer/Thomson!

ADSR Envelopes
  • Used for defining the volume of a sound
  • Figure: volume over time through four stages: Attack, Decay, Sustain, Release
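The envelope in the figure can be written as a piecewise-linear function of time. This is one simple sketch, assuming linear ramps and a fixed sustain duration; the function and parameter names are illustrative.

```python
def adsr_level(t, attack, decay, sustain_level, sustain_time, release):
    """Volume (0.0 to 1.0) at t seconds after the note starts."""
    if t < 0:
        return 0.0
    if t < attack:                      # Attack: ramp 0 -> 1
        return t / attack
    t -= attack
    if t < decay:                       # Decay: ramp 1 -> sustain level
        return 1.0 - (1.0 - sustain_level) * (t / decay)
    t -= decay
    if t < sustain_time:                # Sustain: hold steady
        return sustain_level
    t -= sustain_time
    if t < release:                     # Release: ramp sustain level -> 0
        return sustain_level * (1.0 - t / release)
    return 0.0

# Example: 0.1 s attack, 0.2 s decay, hold at 0.5 for 1 s, 0.4 s release.
for t in (0.0, 0.05, 0.1, 0.5, 1.5, 2.0):
    print(t, round(adsr_level(t, 0.1, 0.2, 0.5, 1.0, 0.4), 3))
```

Multiplying each output sample by this level shapes the volume of the sound over its lifetime.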

3D Sound
  • Don’t have 5.1?
      • Panning is one option
      • Psycho-acoustic options
      • Head-Related Transfer Function (HRTF)
      • Tweak the frequencies to match your ears
  • Sounds have position and velocity
  • There is a listener component (like a camera)
  • Relationship between the two
      • Attenuation (with distance)
      • Occlusion (low-pass filter)
      • Doppler (relative velocities)
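Two of the source/listener relationships listed above reduce to short formulas. The sketch below assumes an inverse-distance attenuation model (one common choice among several) and the standard Doppler formula for motion along the line between source and listener; all names are illustrative and not taken from any particular audio API.

```python
SPEED_OF_SOUND = 340.0  # m/s in air, as quoted earlier in the slides

def distance_gain(distance, reference_distance=1.0):
    """Inverse-distance attenuation, clamped so gain never exceeds 1.0."""
    return reference_distance / max(distance, reference_distance)

def doppler_frequency(source_hz, source_speed_toward_listener,
                      listener_speed_toward_source=0.0):
    """Frequency heard when source and/or listener move toward each other.

    Speeds are in m/s along the line joining them; negative values
    mean moving apart.
    """
    return (source_hz * (SPEED_OF_SOUND + listener_speed_toward_source)
            / (SPEED_OF_SOUND - source_speed_toward_listener))

print(distance_gain(2.0))                       # 0.5
print(round(doppler_frequency(440.0, 34.0), 1)) # 488.9
```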

3D Sound
  • Environmental effects
      • Reverb (depends on materials in room)
      • Echo (depends on size of room)
      • Occlusion (a wall blocking part of the sound)
      • Obstruction (no direct path to the listener)
  • Competing reverb technologies
      • I3DL2 (Interactive 3D Audio Rendering Guidelines, Level 2)
      • EAX (Creative Labs)
      • Almost identical

MIDI (Musical Instrument Digital Interface)
  • MIDI – a method for representing sounds electronically
  • Became popular in the ’80s
  • Send 16 different channels (tracks) at one time
  • Have a total of 128 possible instruments

The Keyboard
  • The MIDI keyboard
  • No audible sounds
  • Generates a series of 1s and 0s (on/off)
  • Signals represent
      • Note, loudness
      • Length, type of instrument…
  • Signals come out of the keyboard and usually go into a sequencer
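As a concrete example of those 1s and 0s, a MIDI Note On message is three bytes: a status byte (0x90 plus the channel number 0 to 15), a note number (0 to 127), and a velocity, which is the loudness (0 to 127). Note length is not sent directly; it is the time until the matching Note Off. The helper names below are illustrative.

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI Note On message."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; note and velocity 0-127")
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note):
    """Build a 3-byte MIDI Note Off message (release velocity 0)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127):
        raise ValueError("channel must be 0-15; note 0-127")
    return bytes([0x80 | channel, note, 0])

message = note_on(0, 60, 100)  # middle C on the first channel
print(" ".join(format(b, "08b") for b in message))
```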

The Sequencer
  • Can be a PC
  • Responsible for recording individual tracks of music
  • Responsible for playback
  • Receives input from keyboard
  • Sends output to synthesizer

The Synthesizer
  • Receives 1s and 0s from the sequencer
  • Interprets the 1s and 0s to produce audible sounds
      • Piano
      • Drums…
      • Saxophone…
  • Sounds are sent to speakers

Speakers
  • Like you haven’t seen these before…

MIDI (figure: bit stream 01101101000110 0 1 0)

MIDI vs Digital Recording
  • MIDI:
      • Smaller file size (like 10-20 KB)
      • Change keys/tempo/looping on the fly!
      • Song sounds different on every sound card
      • No singing allowed!
      • Also a DLS format (DownLoadable Sounds)
  • Digital Recording:
      • Larger file size (like 5 MB)
      • Sound is a close approximation to the real thing

Sampling
  • There are two main approaches to synthesis:
      • Sampling
      • FM Synthesis
  • Sampling
      • A sample is a recording of an actual instrument/sound
      • Samples are taken at certain intervals
      • Samples are then shifted up or down depending on the note
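Shifting a recorded sample up or down to another note amounts to playing it back faster or slower. A minimal sketch, assuming equal temperament (each semitone multiplies the rate by 2^(1/12)) and linear interpolation between neighbouring samples; the function names are illustrative.

```python
def playback_rate(semitones):
    """Speed multiplier that shifts pitch by the given number of semitones."""
    return 2.0 ** (semitones / 12.0)

def resample(samples, rate):
    """Step through `samples` at `rate` times normal speed.

    Values between stored samples are linearly interpolated.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += rate
    return out

print(playback_rate(12))                 # 2.0 (one octave up)
print(resample([0, 1, 2, 3, 4], 2.0))    # [0.0, 2.0]
print(resample([0, 2, 4], 0.5))          # [0.0, 1.0, 2.0, 3.0]
```

Playing faster raises the pitch but also shortens the sound, which is why samplers typically record an instrument at several notes rather than stretching one recording across the whole keyboard.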

Sampling (figure)

FM Synthesis
  • Basic waves:
      • Sine
      • Square
      • Saw
      • Triangle
      • Noise

FM Synthesis
  • Start with a basic waveform, and have one wave modulate the other
  • Here’s volume modulation
      • 440 Hz sine wave, control 2 Hz
      • 440 Hz sine wave, control 880 Hz
      • 440 Hz sine wave, control 3 kHz
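The volume-modulation examples can be generated with a sketch like the one below, where a control sine wave scales the volume of a 440 Hz carrier. Rescaling the control wave to the 0 to 1 range is an assumption made here so it behaves like a volume knob; the function name is illustrative.

```python
import math

def volume_modulated(carrier_hz, control_hz, sample_rate, num_samples):
    """Sine carrier whose volume is driven by a second (control) sine wave."""
    out = []
    for i in range(num_samples):
        t = i / sample_rate
        carrier = math.sin(2.0 * math.pi * carrier_hz * t)
        # Rescale the control wave from -1..1 to 0..1 so it acts as a volume.
        volume = 0.5 * (1.0 + math.sin(2.0 * math.pi * control_hz * t))
        out.append(volume * carrier)
    return out

tremolo = volume_modulated(440.0, 2.0, 44100, 44100)    # slow control: wobble
buzzy = volume_modulated(440.0, 880.0, 44100, 44100)    # fast control: new tone
```

With a 2 Hz control the ear hears a tremolo; once the control frequency reaches the audible range, the result is heard as a change in timbre instead.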

Interactive Music
  • Adapts based on current state of game
  • Music broken into chunks
      • Called segments (or cues)
      • Can be played back to back
      • Can be smoothly cross-faded
  • Segments are combined into themes
  • fmod’s Sound Designer can do this
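Cross-fading two segments means overlapping the end of one with the start of the next while one fades out and the other fades in. The sketch below uses a linear fade on plain lists of samples; it illustrates the idea only and is not how fmod implements it. The function name is hypothetical.

```python
def crossfade(outgoing, incoming, overlap):
    """Join two segments, blending `overlap` samples with a linear fade."""
    if overlap > min(len(outgoing), len(incoming)):
        raise ValueError("overlap is longer than one of the segments")
    start = len(outgoing) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the incoming segment
        blended.append(outgoing[start + i] * (1.0 - w) + incoming[i] * w)
    return outgoing[:start] + blended + incoming[overlap:]

mixed = crossfade([1.0] * 4, [0.0] * 4, 2)
print(mixed)
```

Setting `overlap` to 0 gives the plain back-to-back case.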

Themes in fmod (figure)

Sound Variations
  • Sounds can be triggered by events
  • There’s no reason to play the same sound the same way
      • Pick a random sample
      • Change pitch
      • Change attenuation
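The three kinds of variation can be combined each time an event fires. In this sketch the sample names and the pitch and volume ranges are made-up values for illustration, and `pick_variation` is a hypothetical helper rather than part of any audio library.

```python
import random

def pick_variation(sample_names, rng=random):
    """Choose a random sample plus small random pitch and volume changes."""
    return {
        "sample": rng.choice(sample_names),
        "pitch": rng.uniform(0.9, 1.1),    # playback-rate multiplier
        "volume": rng.uniform(0.8, 1.0),   # attenuation, 1.0 = full volume
    }

footsteps = ["step_a.wav", "step_b.wav", "step_c.wav"]  # hypothetical files
print(pick_variation(footsteps))
```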

Other technology
  • Lip-synch
      • Use the amplitude of the wave to control mouth
      • Analyze phonemes of sample (language neutral)
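The amplitude-driven approach can be sketched by measuring the loudness of a short window of samples and mapping it to how far the mouth opens. Using RMS as the loudness measure and 0.5 as the "fully open" threshold are assumptions for illustration; the function name is hypothetical.

```python
import math

def mouth_openness(window, full_open_rms=0.5):
    """Map the loudness of a window of samples (-1..1) to openness 0.0-1.0."""
    if not window:
        return 0.0
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return min(1.0, rms / full_open_rms)

print(mouth_openness([0.0] * 10))       # 0.0 (silence: mouth closed)
print(mouth_openness([0.25, -0.25]))    # 0.5
```

A game would typically call this once per animation frame on the samples currently being played.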

Common Audio Technology
  • XAudio (free) – cross-platform
  • OpenAL (free) – cross-platform
  • XACT (free) – Xbox/Windows
  • fmod (commercial) – cross-platform