Structure Discovery of Pop Music Using HHMM

Structure Discovery of Pop Music Using HHMM
E6820 Project
Jessie Hsu
03/09/05

Problem Description
• Given
  • WAV signal of a pop song
• Discover the structure of the song
  • Intro
  • Verse
  • Chorus
  • Bridge
  • Outro

HMM Framework
• Model the music signal as a series of state transitions
[Figure: sequence of hidden states above the corresponding sequence of observations]
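To make the hidden-state/observation picture concrete, here is a minimal sketch of the forward recursion, which computes the likelihood of an observation sequence under an HMM. It assumes discrete emissions for brevity, whereas the slides model each state with a Gaussian mixture. The function name and all numbers are hypothetical.

```python
def forward_likelihood(prior, trans, emit, observations):
    """Return P(observations) under a discrete-emission HMM.

    prior[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n_states = len(prior)
    # fwd[i] = P(o_1..o_t, state at time t = i)
    fwd = [prior[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        fwd = [
            sum(fwd[i] * trans[i][j] for i in range(n_states)) * emit[j][obs]
            for j in range(n_states)
        ]
    return sum(fwd)


# Toy two-state model with made-up numbers (not taken from the project).
prior = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward_likelihood(prior, trans, emit, [0, 1, 0])
```

Summing the likelihood over every possible observation sequence of a fixed length gives 1, which is a convenient sanity check for an implementation like this.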

HMM Framework: Hierarchical HMM
• Each observation is an audio frame one beat long
[Figure: hidden states at the structure level (Intro, Verse, Outro) above hidden states at the frame level, above the observations]

Representing an HHMM
• HHMM parameters
  • Prior of each state at structure level and frame level: π
  • State transition probabilities at structure level and frame level: α
  • Emission parameters for each state at both levels
• Each state is modeled as a mixture of Gaussians
  • Mean μ and covariance matrix Σ of each Gaussian
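The parameter set listed above could be held in a small container like the one sketched here. This is an illustrative layout, not the project's code: the class and field names are hypothetical, covariances are assumed diagonal, emissions are attached to frame-level states only, and the exit probabilities that a full HHMM needs to leave a frame-level model are omitted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianMixture:
    weights: List[float]          # mixture weights, sum to 1
    means: List[List[float]]      # one mean vector (mu) per component
    variances: List[List[float]]  # diagonal of each Sigma (an assumption)

    def density(self, x):
        """Evaluate the mixture density at feature vector x."""
        total = 0.0
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_p = 0.0
            for xi, m, v in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            total += w * math.exp(log_p)
        return total


@dataclass
class FrameLevelHMM:
    prior: List[float]                # pi at frame level
    transitions: List[List[float]]    # alpha at frame level
    emissions: List[GaussianMixture]  # one mixture per frame-level state


@dataclass
class HHMM:
    prior: List[float]              # pi at structure level
    transitions: List[List[float]]  # alpha at structure level
    children: List[FrameLevelHMM]   # one frame-level model per structure state


def is_distribution(p, tol=1e-9):
    return all(v >= 0 for v in p) and abs(sum(p) - 1.0) < tol


def validate(model):
    """Check that every prior, transition row and weight vector sums to 1."""
    ok = is_distribution(model.prior)
    ok = ok and all(is_distribution(row) for row in model.transitions)
    for child in model.children:
        ok = ok and is_distribution(child.prior)
        ok = ok and all(is_distribution(row) for row in child.transitions)
        ok = ok and all(is_distribution(g.weights) for g in child.emissions)
    return ok


# Toy instance: two structure states sharing one trivial frame-level model.
gmm = GaussianMixture(weights=[1.0], means=[[0.0]], variances=[[1.0]])
child = FrameLevelHMM(prior=[1.0], transitions=[[1.0]], emissions=[gmm])
model = HHMM(prior=[0.5, 0.5],
             transitions=[[0.9, 0.1], [0.2, 0.8]],
             children=[child, child])
```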

Training an HHMM
• EM for HHMM
  • Look for the maximum-likelihood state sequence and model parameters
• E-step: state sequence inference
  • Forward-backward algorithm
  • Viterbi algorithm
• M-step: parameter estimation
  • Priors at both levels: π
  • State transition probabilities: α
  • Emission parameters: Gaussian mixture means μ and covariance matrices Σ
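As an illustration of the state-sequence step, the following is a minimal log-domain Viterbi decoder for a flat HMM. Treating the HHMM as a flat HMM over (structure state, frame state) pairs is an assumption made here for brevity. The function name and the toy numbers are hypothetical.

```python
import math


def viterbi(log_prior, log_trans, log_emit):
    """Return the most likely state path.

    log_prior[j]    : log prior of state j
    log_trans[i][j] : log probability of the transition i -> j
    log_emit[t][j]  : log-likelihood of frame t under state j
    """
    n_states = len(log_prior)
    score = [log_prior[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at time t.
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[t][j])
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example: "sticky" transitions; the first three frames favor state 0,
# the last three favor state 1 (made-up numbers).
log_prior = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_emit = ([[math.log(0.8), math.log(0.2)]] * 3
            + [[math.log(0.2), math.log(0.8)]] * 3)
path = viterbi(log_prior, log_trans, log_emit)
```

Alternating this decoding step with re-estimation of π, α, μ and Σ from the decoded path is the hard-assignment variant of EM. Using forward-backward posteriors instead gives the soft variant.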

Preprocessing
• Beat detection
  • Segment the music into beat-length frames
• Feature extraction
  • Repetition-related feature (chorus/non-chorus): chroma vector
  • Intensity-related feature (vocal/non-vocal): subband-based Log Frequency Power Coefficients
  • Pitch-related features: narrowband spectrogram features (Hann-windowed FFT coefficients)
  • Possibly more, under investigation
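For the chroma vector, a minimal sketch of the usual folding idea follows: each FFT bin is mapped to the nearest equal-tempered pitch and its energy is added to one of 12 pitch classes. The slides do not describe the exact chroma computation, so the function name, the A4 = 440 Hz reference and the nearest-semitone rounding are all assumptions. The magnitude spectrum is taken as given (for example, from a Hann-windowed FFT of one beat-length frame).

```python
import math


def chroma_from_spectrum(magnitudes, sample_rate, n_fft, ref_hz=440.0):
    """Fold a one-sided magnitude spectrum into a 12-bin chroma vector.

    Index 0 is pitch class C and index 9 is A (MIDI note number modulo 12).
    """
    chroma = [0.0] * 12
    for k in range(1, len(magnitudes)):  # skip the DC bin
        freq = k * sample_rate / n_fft
        midi = 69 + 12 * math.log2(freq / ref_hz)  # MIDI 69 = A4
        pitch_class = int(round(midi)) % 12
        chroma[pitch_class] += magnitudes[k] ** 2  # accumulate energy
    total = sum(chroma)
    # Normalize so the vector sums to 1 (left as zeros for a silent frame).
    return [c / total for c in chroma] if total > 0 else chroma


# Toy spectrum: 10 Hz bin spacing, energy only at 440 Hz and 880 Hz.
sample_rate, n_fft = 8000, 800
spectrum = [0.0] * (n_fft // 2 + 1)
spectrum[44] = 1.0  # 440 Hz (A4)
spectrum[88] = 0.5  # 880 Hz (A5)
chroma = chroma_from_spectrum(spectrum, sample_rate, n_fft)
```

Because octaves collapse onto the same pitch class, both peaks land in the A bin, which is what makes chroma useful for spotting repeated harmonic material such as a chorus.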

Tasks
• HHMM on a test song
  • Songs with I-V1-C1-V2-C2-(V3-C3)-B-O structure
  • Manually label structures as ground truth
  • Predefine the number of states at both structure and frame levels
• Preprocessing
• Model fitting
• Evaluation
  • Accuracy of structure identification
  • Accuracy of structure timing
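The two evaluation criteria could be measured as sketched below, working on per-beat labels. The slides do not define the metrics, so these are assumptions: identification accuracy is taken as the fraction of beats with the correct section label, and timing accuracy as the mean distance (in beats) from each ground-truth boundary to the nearest predicted boundary. Mapping unsupervised HHMM states to section names is assumed to have been done already.

```python
def label_accuracy(predicted, truth):
    """Fraction of beats whose predicted section label matches the truth."""
    if len(predicted) != len(truth):
        raise ValueError("label sequences must have the same length")
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


def boundaries(labels):
    """Beat indices at which the section label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def mean_boundary_error(predicted, truth):
    """Mean distance in beats from each true boundary to the nearest
    predicted boundary, or None if either sequence has no boundary."""
    pred_b, true_b = boundaries(predicted), boundaries(truth)
    if not pred_b or not true_b:
        return None
    return sum(min(abs(b - p) for p in pred_b) for b in true_b) / len(true_b)


# Toy example with one letter per beat (I = intro, V = verse,
# C = chorus, O = outro); the sequences are made up.
truth = list("IIVVVVCCCCOO")
predicted = list("IIIVVVCCCCCO")
accuracy = label_accuracy(predicted, truth)
timing_error = mean_boundary_error(predicted, truth)
```

Multiplying the boundary error by the beat duration converts it to seconds when the tempo is known.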
