Pattern Recognition: Expectation Maximization
Alexandros Potamianos, School of ECE, NTUA, Fall 2014-2015
When do we use EM?
- Partially observable data:
  - Missing some features from some samples, e.g., D = {(1, 2), (2, 3), (?, 4)}
  - Missing class labels, e.g., hidden states of HMMs
  - Missing class sub-labels, e.g., mixture labels for mixture-of-Gaussians models
The EM algorithm
- The Expectation-Maximization (EM) algorithm consists of alternating expectation and maximization steps
- During the expectation step, the "best estimates of the missing information" are computed
- During the maximization step, maximum-likelihood training on all the data is performed
EM
Initialization: θ(0)
for i = 1..iterno   // usually iterno = 2 or 3
  E step: Q(θ; θ(i-1)) = E_Dbad{ log p(D; θ) | Dgood, θ(i-1) }
  M step: θ(i) = argmax_θ Q(θ; θ(i-1))
end
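As a concrete illustration of the E/M loop above, here is a minimal sketch (not from the slides) of EM for a two-component 1-D Gaussian mixture, the "missing sub-label" case: the E step computes each component's posterior responsibility for each sample, and the M step re-estimates the parameters by responsibility-weighted maximum likelihood. The function name `em_gmm` and the simple initialization are illustrative choices, not part of the original algorithm statement.

```python
import math

def norm_pdf(x, mu, var):
    # Gaussian density N(x; mu, var)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, iterno=50):
    # Initialization: theta(0) -- two components, unit variances, equal weights
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iterno):
        # E step: posterior responsibility of each component for each sample
        # (the "best estimate of the missing mixture label")
        resp = []
        for x in data:
            p = [w[k] * norm_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M step: responsibility-weighted ML re-estimation of the parameters
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
            w[k] = nk / len(data)
    return w, mu, var
```

On well-separated data the estimated means converge near the two cluster centers, but as the slides note, only a local optimum is guaranteed, so the result depends on the initialization.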
Pseudo-EM
Initialization: θ(0)
for i = 1..iterno   // usually iterno = 2 or 3
  Expectation step: Dbad = E{ Dbad | Dgood, θ(i-1) }
  Maximization step: θ(i) = argmax_θ p(D; θ)
end
Convergence
- EM is guaranteed to converge to a local optimum (NOT the global optimum!)
- Pseudo-EM has no convergence guarantees but is often used in practice
Conclusions
- EM is an iterative algorithm used when training data are missing or only partially observable
- EM is a generalization of ML training
- EM is guaranteed to converge to a local optimum (NOT the global optimum!)