Expectation Maximization (EM)
Northwestern University
EECS 395/495 Special Topics in Machine Learning
Most slides from http://www.autonlab.org/tutorials/

Outline
• Objective
• Simple example
• Complex example

Objective
• Learning with missing/unobservable data
• [Figure: network over nodes B, E, A, J, with a data table of fully observed cases (columns E, B, A, J)]
• Maximum likelihood

Objective
• Learning with missing/unobservable data
• [Figure: network over nodes B, E, A, J]

E B A J
1 1 ? 1
1 0 ? 1
0 0 ? 0
…

• Optimize what?

Simple example

Maximize likelihood

Same Problem with Hidden Information
• Hidden: Grade
• Observable: Score

Same Problem with Hidden Information
• [Figure: nodes G (grade) and S (score)]

EM for our example

EM Convergence

Generalization
• X: observable data (score = {h, c, d})
• z: missing data (grade = {a, b, c, d})
• θ: model parameters to estimate
• E: given θ, compute the expectation of z
• M: use the z obtained in the E step to maximize the likelihood with respect to θ
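
The E and M steps for the grade example can be sketched in a few lines of Python. The slide equations did not survive here, so the grade model below is an assumption made for illustration: P(A) = 1/2, P(B) = mu, P(C) = 2·mu, P(D) = 1/2 − 3·mu, with only h (the number of A or B grades), c and d observed. The function names and the starting value are hypothetical.

```python
import math

def log_likelihood(mu, h, c, d):
    """Observed-data log likelihood (up to a constant) under the assumed model."""
    return h * math.log(0.5 + mu) + c * math.log(2 * mu) + d * math.log(0.5 - 3 * mu)

def em_grades(h, c, d, mu0=0.05, iters=50):
    """EM for the assumed grade model; only h = (#A + #B), c, d are observed."""
    mu = mu0
    b = 0.0
    history = []
    for _ in range(iters):
        # E-step: expected number of B grades among the h students with A or B
        b = h * mu / (0.5 + mu)
        # M-step: maximum-likelihood mu given the completed counts b, c, d
        mu = (b + c) / (6.0 * (b + c + d))
        history.append((mu, log_likelihood(mu, h, c, d)))
    return mu, b, history

mu, b, history = em_grades(h=20, c=10, d=10)
print(f"mu = {mu:.4f}, expected B count = {b:.3f}")
```

With h = 20, c = 10, d = 10 the iteration settles near mu ≈ 0.095 with roughly 3.2 expected B grades, and the recorded log likelihood never decreases from one iteration to the next.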

Gaussian Mixtures

Gaussian Mixtures
• Know
  – Data
• Don’t know
  – Data label
• Objective

The GMM assumption

The data generated
• Label
• Coordinates

Computing the likelihood

EM for GMMs
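
A minimal sketch of EM for a Gaussian mixture, assuming one-dimensional data and a known number of components k. The function name, the quantile-based initialization and the small variance floor are choices made for this illustration, not details taken from the slides.

```python
import numpy as np

def em_gmm_1d(x, k, iters=100):
    """EM for a 1-D Gaussian mixture with k components (illustrative sketch)."""
    n = x.shape[0]
    # Initialization (an assumption of this sketch): spread the means over
    # evenly spaced quantiles, start with the overall variance and equal weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    log_liks = []
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(label_i = j | x_i, current parameters)
        diff = x[:, None] - means                      # shape (n, k)
        dens = np.exp(-0.5 * diff ** 2 / variances) / np.sqrt(2 * np.pi * variances)
        joint = weights * dens
        total = joint.sum(axis=1, keepdims=True)       # p(x_i | current parameters)
        r = joint / total
        log_liks.append(float(np.log(total).sum()))
        # M-step: re-estimate weights, means and variances from the soft labels
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r * x[:, None]).sum(axis=0) / nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)        # guard against collapse
    return weights, means, variances, log_liks

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])
weights, means, variances, log_liks = em_gmm_1d(x, k=2)
print(weights, means, variances)
```

The data labels are never observed; the responsibilities play the role of the expected labels, and each M-step is an ordinary weighted maximum-likelihood fit.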

Generalization
• X: observable data
• z: unobservable data
• θ: model parameters to estimate
• E: given θ, compute the “expectation” of z
• M: use the z obtained in the E step to maximize the likelihood with respect to θ

For distributions in the exponential family
• Exponential family
  – Yes: normal, exponential, beta, Bernoulli, binomial, multinomial, Poisson…
  – No: Cauchy and uniform
• EM using sufficient statistics
  – S1: compute the expectation of the sufficient statistics
  – S2: set the maximum-likelihood estimate from those expected statistics
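
The two steps can be written out for a generic exponential-family model of the complete data. This is the standard textbook form, not a formula recovered from the slide; the symbols h, η, T, A and the iteration index t are notational assumptions.

```latex
\begin{align*}
p(X, z \mid \theta) &= h(X, z)\,\exp\!\big(\eta(\theta)^{\top} T(X, z) - A(\theta)\big) \\
\text{S1:}\quad \bar{T}^{(t)} &= \mathbb{E}\big[\,T(X, z) \mid X, \theta^{(t)}\big] \\
\text{S2:}\quad \theta^{(t+1)} &\ \text{solves}\ \ \mathbb{E}_{\theta}\big[\,T(X, z)\big] = \bar{T}^{(t)}
\end{align*}
```

In words, S1 replaces the unobserved part of the sufficient statistics by its conditional expectation, and S2 is the usual complete-data maximum-likelihood equation with that expectation plugged in.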

What EM really is
• X: observable data
• z: missing data
• Maximize expected log likelihood
• E-step: Determine the expectation
• M-step: Maximize the expectation above with respect to θ
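
In the usual textbook notation (the symbols Q, θ and the iteration index t are assumptions here, since the slide formulas were lost), the expected log likelihood and the two steps read:

```latex
\begin{align*}
\text{E-step:}\quad Q\big(\theta \mid \theta^{(t)}\big)
  &= \mathbb{E}_{z \mid X,\, \theta^{(t)}}\big[\log p(X, z \mid \theta)\big] \\
\text{M-step:}\quad \theta^{(t+1)}
  &= \arg\max_{\theta}\; Q\big(\theta \mid \theta^{(t)}\big)
\end{align*}
```

Each such update leaves the observed-data log likelihood \(\log p(X \mid \theta)\) unchanged or higher.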

Final comments
• Deals with missing data/latent variables
• Maximizes the expected log likelihood
• Can get stuck in local optima (a local maximum of the likelihood rather than the global one)