Digital Music & Music Processing
George Tzanetakis, Postdoctoral Fellow, Computer Science Department, Carnegie Mellon University
gtzan@cs.cmu.edu, http://www.cs.cmu.edu/~gtzan
Copyright Nov. 2002, George Tzanetakis

Overview: Music Information Retrieval (MIR) and Computer Audition; Motivation; Techniques; Applications; Computer Music and Sound Synthesis; Examples, demos

MIR: Music History (timeline markers: 9000 B.C., 1000, 1700, 1877, 1960, 2002)

Music: 4 million recorded CD tracks; 4000 CDs / month; MP3 bandwidth %; Global; Pervasive; Persistent. Why?

The future of MIR: Library of all recorded music. Tasks: organize, search, retrieve, classify, recommend, browse, listen, annotate. Examples:

Audio MIR Pipeline: Hearing (Representation, Signal Processing); Understanding (Analysis, Machine Learning); Reacting (Interaction, Human Computer Interaction)

Traditional Music Representations

Time domain waveform (axes: pressure vs. time); Decompose into building blocks (axes: frequency vs. time)

MIDI (Musical Instrument Digital Interface): Hardware interface; File format; Note events: duration, discrete pitch, “instrument”; Extensions; General MIDI; Notation, OMR, continuous pitch
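A minimal sketch of the note-event idea, assuming equal temperament with A4 = 440 Hz at MIDI note number 69. The tuple layout `(onset, duration, pitch, instrument)` and the name `midi_to_hz` are illustrative assumptions, not the MIDI file format itself.

```python
def midi_to_hz(note, a4_hz=440.0):
    """Convert a MIDI note number to Hz (equal temperament, A4 = note 69)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)

# Hypothetical note events: (onset_seconds, duration_seconds, pitch, instrument)
events = [
    (0.0, 0.5, 60, "piano"),
    (0.5, 0.5, 64, "piano"),
    (1.0, 1.0, 67, "piano"),
]

for onset, duration, pitch, instrument in events:
    end = onset + duration
    print(f"{instrument}: note {pitch} = {midi_to_hz(pitch):.2f} Hz, {onset:.1f}s to {end:.1f}s")
```

The pitch is discrete (one integer per semitone), which is why continuous pitch appears on the slide as something needing an extension.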

Symbolic vs Audio MIR (diagram labels): Audio; Polyphonic Transcription; Computer Audition; Symbolic Representation (MIDI); MIR; Machine Learning Models; MIR

Feature extraction

Timbral Texture: Timbre = what differentiates sounds of the same loudness and pitch; Timbral texture = what differentiates mixtures of sounds; Global, statistical and fuzzy properties

Spectrum (figure labels: M at t, M at t+1)

Fourier Transform: P = 1/f (period P, frequency f)
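The relation P = 1/f can be illustrated with a naive discrete Fourier transform. This is a sketch under stated assumptions: the sample rate, the test tone, and the function names are made up for the example, and the O(N^2) transform is used only because it needs nothing beyond the standard library.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive O(N^2) DFT; returns the magnitudes of bins 0..N/2."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

def dominant_frequency(x, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitudes(x)
    k = max(range(1, len(mags)), key=lambda i: mags[i])  # skip the DC bin
    return k * sample_rate / len(x)

sample_rate = 800  # Hz, assumed for the example
signal = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(200)]

f = dominant_frequency(signal, sample_rate)
period = 1.0 / f
print(f"f = {f} Hz, P = {period} s")
```

The frequency resolution here is `sample_rate / len(signal)` = 4 Hz per bin, so a tone that does not fall on a bin centre would be reported at the nearest bin.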

Short Time Fourier Transform (STFT): Filterbank interpretation (figure labels: Amplitude, Frequency, output, Filters, Oscillators)

Short Time Fourier Transform II (figure labels: M, t+1)
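A rough sketch of the short-time analysis: the signal is cut into windowed frames and each frame gets its own magnitude spectrum, so the spectrum becomes a function of time. The frame size, hop, Hann window, and function names are assumptions chosen for the example.

```python
import cmath
import math

def hann_window(size):
    """Symmetric Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (size - 1)) for i in range(size)]

def stft_magnitudes(signal, frame_size, hop):
    """Return one magnitude spectrum (bins 0..frame_size/2) per windowed frame."""
    window = hann_window(frame_size)
    spectra = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT of the frame; bin k behaves like one band-pass filter.
            total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_size)
                        for t in range(frame_size))
            spectrum.append(abs(total))
        spectra.append(spectrum)
    return spectra

frame_size = 32
low = [math.sin(2 * math.pi * 4 * t / frame_size) for t in range(64)]
high = [math.sin(2 * math.pi * 8 * t / frame_size) for t in range(64)]

spectra = stft_magnitudes(low + high, frame_size, hop=32)
peak_bins = [max(range(len(s)), key=lambda k: s[k]) for s in spectra]
print(peak_bins)  # the peak bin moves when the tone changes halfway through
```

Reading each bin across frames gives the filterbank view: one band-pass output per bin, sampled once per hop.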

Formants (from “Real Sound Synthesis for Interactive Applications”, P. Cook, A K Peters, used by permission)

Linear Prediction Coefficients: Source (impulses @ f0, white noise); Filter (lossless tubes); Speech
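One common way to estimate the filter part of the source-filter model is the autocorrelation method solved with the Levinson-Durbin recursion. The slide does not say which method was used, so this is a sketch under that assumption; the function names are illustrative. The convention is A(z) = 1 + a1 z^-1 + ..., so the prediction is x[n] ≈ -(a1 x[n-1] + a2 x[n-2] + ...).

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    return [sum(x[t] * x[t + lag] for t in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion; returns ([1, a1, ..., a_order], residual energy).

    Assumes the signal is not all zeros (r[0] > 0).
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= (1.0 - k * k)
    return a, error

# Impulse response of a one-pole filter: x[n] = 0.9 * x[n-1]
signal = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(signal, order=2)
print(coeffs, residual)
```

For this test signal the first coefficient comes out close to -0.9 and the second close to 0, which matches the one-pole filter that generated it.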

MPEG Audio Coding (MP3): 32 linearly spaced bands; Perceptual Audio Coding; Analysis Filterbank; Psychoacoustic Model; Available bits; Encoder: slower, complicated; Decoder: faster, simpler

Spectral Shape: Centroid; Rolloff; Flux; RMS; Moments; … (figure label: M at t)
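The spectral shape descriptors named above can be sketched as follows. The exact definitions used in the talk are not given on the slide, so these are common textbook forms stated as assumptions: centroid and rolloff are in bin units, the rolloff fraction defaults to 0.85, and flux compares sum-normalized spectra of successive frames.

```python
import math

def centroid(mags):
    """Magnitude-weighted mean bin index (the spectrum's centre of gravity)."""
    total = sum(mags)
    return sum(k * m for k, m in enumerate(mags)) / total if total else 0.0

def rolloff(mags, fraction=0.85):
    """Smallest bin below which `fraction` of the total magnitude lies."""
    threshold = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k
    return len(mags) - 1

def flux(prev_mags, mags):
    """Squared difference between successive sum-normalized spectra."""
    def normalize(v):
        total = sum(v)
        return [m / total for m in v] if total else list(v)
    a, b = normalize(prev_mags), normalize(mags)
    return sum((y - x) ** 2 for x, y in zip(a, b))

def rms(frame):
    """Root-mean-square level of a time-domain frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

previous = [0.0, 1.0, 0.0, 1.0, 0.0]
current = [0.0, 0.0, 1.0, 1.0, 0.0]
feature_vector = [centroid(current), rolloff(current), flux(previous, current),
                  rms([3.0, -3.0, 3.0, -3.0])]
print(feature_vector)
```

Computing these per frame and collecting them in a list is one way to read "spectral shape to feature vector".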

Summary of Timbral Texture Features: Time-Frequency analysis; Signal Processing (STFT, DWT); Source-filter (LPC); Perceptual (MP3); Spectral Shape to feature vector

Pitch Content: Harmony-melody = pitch concepts; Music theory score = music; Bridge to symbolic MIR; Automatic music transcription; Non-transcriptive arguments

Automatic Pitch Detection: P = 1/f; Time-domain; Frequency-domain; Perceptual; Zero crossings; Autocorrelation analysis = peaks of function correspond to dominant pitches
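A minimal sketch of the autocorrelation idea: the lag with the strongest autocorrelation peak is taken as the period P, and the pitch is f = 1/P. The search range, sample rate, and test signal are assumptions made for the example; zero-crossing and frequency-domain methods are not shown.

```python
import math

def autocorrelation_pitch(signal, sample_rate, min_hz, max_hz):
    """Estimate pitch as sample_rate / (lag of the strongest autocorrelation peak)."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)

    def score(lag):
        # Unnormalized autocorrelation at this lag.
        return sum(signal[t] * signal[t + lag] for t in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=score)
    return sample_rate / best_lag

sample_rate = 8000  # Hz, assumed for the example
# 200 Hz fundamental plus a weaker second harmonic
signal = [math.sin(2 * math.pi * 200 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 400 * t / sample_rate)
          for t in range(400)]

pitch = autocorrelation_pitch(signal, sample_rate, min_hz=100, max_hz=400)
print(pitch)
```

Because the autocorrelation is unnormalized, longer lags sum fewer terms and are slightly penalized, which helps avoid picking a multiple of the true period in this example. For a mixture of notes, several peaks of the function would be kept rather than only the largest.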

Pitch Histograms: Chroma - folded; Height - unfolded (examples: Jazz, Irish)
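The folded/unfolded distinction can be sketched with MIDI note numbers: the unfolded histogram keeps pitch height (one bin per note), while the folded histogram collapses octaves into 12 chroma bins. The input list and the function name are hypothetical; how the detected pitches are obtained and weighted in the talk is not specified here.

```python
def pitch_histograms(midi_notes):
    """Return (unfolded, folded) histograms for a list of MIDI note numbers."""
    unfolded = [0] * 128  # pitch height: one bin per MIDI note
    folded = [0] * 12     # chroma: octaves collapsed, bin 0 = C
    for note in midi_notes:
        unfolded[note] += 1
        folded[note % 12] += 1
    return unfolded, folded

# Hypothetical detected pitches: C3, C4, G4, C5
detected = [48, 60, 67, 72]
unfolded, folded = pitch_histograms(detected)
print(folded)
```

In the folded histogram the three C notes land in the same bin, which is what makes it useful for describing harmonic content independent of register.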

Automatic Music Transcription: Original; Transcribed; Estimate # of voices; Mixture signal; Noise suppression; Predominant Pitch Estimation; Remove detected sound

Rhythm: Movement in time; Origins in poetry (iambic, trochaic); Foot tapping definition; Hierarchical semi-periodic structure at multiple levels of detail; Links to motion, dance; Running vs global

Self similarity (diagram labels): DWT; Autocorrelation; Peak Picking; Envelope Extraction; Beat Histograms
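A simplified sketch of the autocorrelation and peak-picking stages that feed a beat histogram. It assumes an amplitude envelope is already available (the DWT front end and envelope extraction are not implemented here), and the envelope rate, BPM range, and function names are assumptions for the example.

```python
def autocorrelation(x, lag):
    """Unnormalized autocorrelation of x at a single lag."""
    return sum(x[t] * x[t + lag] for t in range(len(x) - lag))

def beat_histogram(envelope, envelope_rate, min_bpm=40, max_bpm=200):
    """Accumulate autocorrelation peak strengths into bins keyed by rounded BPM."""
    min_lag = int(60.0 * envelope_rate / max_bpm)
    max_lag = int(60.0 * envelope_rate / min_bpm)
    histogram = {}
    for lag in range(min_lag, max_lag + 1):
        here = autocorrelation(envelope, lag)
        # Peak picking: keep lags that are strict local maxima.
        if here > autocorrelation(envelope, lag - 1) and here > autocorrelation(envelope, lag + 1):
            bpm = round(60.0 * envelope_rate / lag)
            histogram[bpm] = histogram.get(bpm, 0.0) + here
    return histogram

envelope_rate = 100  # envelope samples per second, assumed
# Synthetic envelope: one pulse every 0.5 s for 6 s
envelope = [1.0 if t % 50 == 0 else 0.0 for t in range(600)]

histogram = beat_histogram(envelope, envelope_rate)
main_beat = max(histogram, key=histogram.get)
print(main_beat, histogram)
```

The secondary peaks at integer divisions of the main tempo reflect the hierarchical, semi-periodic structure of rhythm; a real envelope would produce broader and noisier peaks.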

Beat Histograms

Analysis: Classification; Segmentation; Similarity; Retrieval; Clustering; Thumbnailing; Fingerprinting

Analysis Overview (figure labels): Trajectory; Musical Piece; Point

Query-by-example Content-based Retrieval: Ranked list of k nearest neighbors
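A minimal sketch of query-by-example retrieval as a ranked k-nearest-neighbor search over feature vectors. The track names, the two-dimensional vectors, and the use of Euclidean distance are assumptions for illustration only.

```python
import math

def k_nearest(query, collection, k):
    """Return the names of the k items closest to the query, nearest first."""
    ranked = sorted(collection.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical collection: one feature vector per track
collection = {
    "track_a": [0.10, 0.90],
    "track_b": [0.80, 0.20],
    "track_c": [0.15, 0.85],
}
query_vector = [0.10, 0.88]

matches = k_nearest(query_vector, collection, k=2)
print(matches)
```

In practice each feature dimension would usually be normalized first, so that no single feature dominates the distance.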

QBE examples (Query / Match): Rock: Beatles; Jazz: Bobby Hutcherson; Funk: Mano Negra; World: Tibetan singer; Computer Music: Paul Lansky

Automatic Musical Genre Classification: Categorical music descriptions created by humans; Fuzzy boundaries; Statistical properties: timbral texture, rhythmic structure, harmonic content; Automatic musical genre classification: evaluate musical content features, structure audio collections

Genregram demo: Dynamic real-time visualization for classification of radio signals

Audio segmentation: Detect changes of audio texture

Multifeature automatic segmentation methodology: Time series of feature vector v(t); Detect abrupt changes in trajectory
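One simple reading of "detect abrupt changes in trajectory": measure the distance between successive feature vectors v(t-1) and v(t) and mark local peaks above a threshold as segment boundaries. Euclidean distance, the threshold value, and the toy data are assumptions; the talk may use a different distance or peak criterion.

```python
import math

def segment_boundaries(vectors, threshold):
    """Return frame indices t where the jump from v(t-1) to v(t) is a local peak."""
    distances = [math.dist(vectors[t - 1], vectors[t]) for t in range(1, len(vectors))]
    boundaries = []
    for i, d in enumerate(distances):
        left = distances[i - 1] if i > 0 else 0.0
        right = distances[i + 1] if i + 1 < len(distances) else 0.0
        if d > threshold and d >= left and d >= right:
            boundaries.append(i + 1)  # distances[i] compares frames i and i + 1
    return boundaries

# Toy trajectory with three stable textures of five frames each
trajectory = [[0.0, 0.0]] * 5 + [[5.0, 5.0]] * 5 + [[0.0, 5.0]] * 5

boundaries = segment_boundaries(trajectory, threshold=1.0)
print(boundaries)
```

Using several features at once means a texture change shows up as a jump in the trajectory even when no single feature changes much on its own.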

Context & Content Aware User Interfaces: Automatic results not perfect; Music listening is personal and subjective; Browsing vs retrieval; “Overview, zoom and filter, details on demand” (Shneiderman mantra); Adapt UI to music content and context; Computer Audition; Visualization

Content and Context: Content ~ file: genre, male voice, saxophone; Context ~ file, collection: similarity, slow-fast; Multiple visualizations; Same content, different context

Timbregrams: Content & Context; Similarity + Time Structure; Principal Component Analysis; Map feature vectors to color
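A rough sketch of the mapping from feature vectors to color: project each frame's feature vector onto the first principal component and scale the result to a gray level. Power iteration stands in for a full PCA, only one component is used (a grayscale strip rather than full color), and all names and data are assumptions for the example.

```python
import math

def first_principal_component(vectors, iterations=100):
    """Return (mean, unit direction of largest variance) via power iteration.

    Assumes the data has nonzero variance and that the all-ones start
    vector is not orthogonal to the principal direction.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
    return mean, w

def gray_levels(vectors):
    """Map each feature vector to a gray level 0..255 along the first component."""
    mean, w = first_principal_component(vectors)
    proj = [sum((v[j] - mean[j]) * w[j] for j in range(len(w))) for v in vectors]
    lo, hi = min(proj), max(proj)
    if hi == lo:
        return [0] * len(proj)
    return [round(255 * (p - lo) / (hi - lo)) for p in proj]

# Toy time series of feature vectors, one per frame
frames = [[float(t), 2.0 * t] for t in range(5)]
strip = gray_levels(frames)
print(strip)
```

Frames with similar features get similar shades, so laying the shades out in time order shows both similarity and time structure in one strip.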

Timbrespaces

Islands of Music

Auditory Scene Analysis