Machine Learning for Signal Processing: Regression and Prediction

Class 14, 17 Oct 2013. Instructor: Bhiksha Raj. Course 11755/18797.

Matrix Identities
• The derivative of a scalar function w.r.t. a vector is a vector.

Matrix Identities
• The derivative of a scalar function w.r.t. a vector is a vector.
• The derivative w.r.t. a matrix is a matrix.

Matrix Identities
• The derivative of a vector function w.r.t. a vector is a matrix.
  – Note the transposition of order.

Derivatives
• In general: differentiating an M×N function by a U×V argument results in an M×N×U×V tensor derivative.
  – (The examples on the slide involve shapes U×V, N×1, and U×V×N.)

Matrix derivative identities
• Some basic linear and quadratic identities (here X is a matrix, a is a vector, A is a matrix; one solution may also be written as Xᵀ).
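For reference, a few standard identities consistent with those annotations. This is a reconstruction from standard matrix calculus, not a copy of the slide's own equations:

```latex
% Standard matrix-derivative identities consistent with the slide's annotations
% (a, b are vectors; X, A are matrices). Reconstructed, not copied from the slide.
\[
\frac{\partial\, \mathbf{a}^{\top}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{a}, \qquad
\frac{\partial\, \mathbf{a}^{\top}\mathbf{X}\mathbf{b}}{\partial \mathbf{X}} = \mathbf{a}\mathbf{b}^{\top}, \qquad
\frac{\partial\, \mathbf{x}^{\top}\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = (\mathbf{A}+\mathbf{A}^{\top})\mathbf{x}, \qquad
\frac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}^{\top}.
\]
```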

A Common Problem
• Can you spot the glitches?

How to fix this problem?
• "Glitches" in audio:
  – Must be detected. How?
  – Then what?
• Glitches must be "fixed":
  – Delete the glitch. This results in a "hole".
  – Fill in the hole. How?

Interpolation
• "Extend" the curve on the left to "predict" the values in the "blank" region: forward prediction.
• Extend the blue curve on the right leftwards to predict the blank region: backward prediction.
• How? Regression analysis.

Detecting the Glitch
• (Figure: reconstruction error over regions marked OK and NOT OK.)
• Regression-based reconstruction can be done anywhere.
• The reconstructed value will not match the actual value.
• A large reconstruction error identifies glitches.

What is a regression?
• Analyzing the relationship between variables.
• Expressed in many forms. Wikipedia lists: linear regression, simple regression, ordinary least squares, polynomial regression, general linear model, generalized linear model, discrete choice, logistic regression, multinomial logit, mixed logit, probit, multinomial probit, ...
• Generally a tool to predict variables.

Regressions for prediction
• y = f(x; Θ) + e
• Different possibilities:
  – y is a scalar: y is real, or y is categorical (classification).
  – y is a vector.
  – x is a vector: a set of real-valued variables, a set of categorical variables, or a combination of the two.
  – f(·) is a linear or affine function.
  – f(·) is a non-linear function.
  – f(·) is a time-series model.

A linear regression
• Assumption: the relationship between the variables is linear.
  – A linear trend may be found relating x and y.
  – y = dependent variable; x = explanatory variable.
  – Given x, y can be predicted as an affine function of x.

An imaginary regression
• http://pages.cs.wisc.edu/~kovar/hall.html
• "Check this shit out (Fig. 1). That's bonafide, 100%-real data, my friends. I took it myself over the course of two weeks. And this was not a leisurely two weeks, either; I busted my ass day and night in order to provide you with nothing but the best data possible. Now, let's look a bit more closely at this data, remembering that it is absolutely first-rate. Do you see the exponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of my time. Banking on my hopes that whoever grades this will just look at the pictures, I drew an exponential through my noise. I believe the apparent legitimacy is enhanced by the fact that I used a complicated computer program to make the fit. I understand this is the same process by which the top quark was discovered."

Linear Regressions
• y = Ax + b + e, where e is the prediction error.
• Given a "training" set of {x, y} values, estimate A and b:
  – y1 = Ax1 + b + e1
  – y2 = Ax2 + b + e2
  – y3 = Ax3 + b + e3
  – ...
• If A and b are well estimated, the prediction error will be small.

Linear Regression to a scalar
• y1 = aᵀx1 + b + e1
• y2 = aᵀx2 + b + e2
• y3 = aᵀx3 + b + e3
• Define stacked forms of the data and parameters and rewrite the set of equations compactly.

Learning the parameters
• The prediction, assuming no error: ŷ = Ax + b.
• Given training data: several (x, y) pairs.
• Define a "divergence" D(y, ŷ):
  – It measures how much the prediction ŷ differs from y.
  – Ideally, if the model is accurate, this should be small.
• Estimate A and b to minimize D(y, ŷ).

The prediction error as divergence
• y1 = aᵀx1 + b + e1, y2 = aᵀx2 + b + e2, y3 = aᵀx3 + b + e3, ...
• Define the divergence as the sum of the squared errors in predicting y.

Prediction error as divergence
• y = aᵀx + e, where e is the prediction error.
• Find the "slope" a such that the total squared length of the error lines is minimized.

Solving a linear regression
• Minimize the squared error.
• Differentiate w.r.t. A and equate to 0.
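As a minimal sketch of the closed-form least-squares fit this slide derives (the data and variable names below are illustrative, not from the slide):

```python
# Minimal sketch of the least-squares fit for y ≈ aᵀx + b via the normal equations;
# not the course's own code.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # 100 training vectors x (rows)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=100)

Xa = np.hstack([X, np.ones((100, 1))])          # append 1 to each x to absorb the offset b
params, *_ = np.linalg.lstsq(Xa, y, rcond=None) # minimizes the summed squared error
a, b = params[:-1], params[-1]                  # slope vector and offset
print(a, b)
```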

Regression in multiple dimensions
• y1 = Aᵀx1 + b + e1, y2 = Aᵀx2 + b + e2, y3 = Aᵀx3 + b + e3, ...
  – yi is a vector; yij is the jth component of yi; ai is the ith column of A; bj is the jth component of b.
• Also called multiple regression.
• Equivalent to saying:
  – yi1 = a1ᵀxi + b1 + ei1
  – yi2 = a2ᵀxi + b2 + ei2
  – yi3 = a3ᵀxi + b3 + ei3
  – i.e., yi = Aᵀxi + b + ei
• Fundamentally no different from N separate single regressions.
  – But we can use the relationship between the y's to our benefit.

Multiple Regression
• Append a D×1 vector of ones to the data so the offset b folds into the parameter matrix.
• Differentiate and equate to 0.
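A hedged sketch of the multiple-regression solution, with the vector of ones appended so the offset is estimated as part of the parameter matrix. The shapes and names are illustrative assumptions, not taken from the slide:

```python
# Sketch of multiple regression Y ≈ AᵀX + b with the offset folded in via a row of ones;
# mirrors the slide's construction but is not its literal code.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 200))                    # D x N data matrix (columns are x_i)
A_true = rng.normal(size=(5, 3))                 # D x M true parameters
Y = A_true.T @ X + np.array([[0.1], [0.2], [0.3]]) + 0.01 * rng.normal(size=(3, 200))

Xa = np.vstack([X, np.ones((1, X.shape[1]))])    # append a row of ones
# Least-squares solution of Y = W Xa:  W = Y Xaᵀ (Xa Xaᵀ)⁻¹
W = Y @ Xa.T @ np.linalg.inv(Xa @ Xa.T)
A_hat, b_hat = W[:, :-1].T, W[:, -1]             # recover A and the offset b
```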

A Different Perspective
• y is a noisy reading of Aᵀx: y = Aᵀx + e.
• The error e is Gaussian.
• Estimate A from the observed (x, y) pairs.

The Likelihood of the data
• Probability of observing a specific y, given x, for a particular matrix A.
• Probability of the collection: the product over the training pairs.
• Assuming i.i.d. data for convenience (not necessary).

A Maximum Likelihood Estimate
• Maximizing the log probability is identical to minimizing the trace of the summed squared error.
  – Identical to the least-squares solution.

Predicting an output
• From a collection of training data, we have learned A.
• Given x for a new instance, but not y: what is y?
• Simple solution: apply the learned regression, ŷ = Aᵀx.

Applying it to our problem
• Prediction by regression.
• Forward regression: xt = a1·xt-1 + a2·xt-2 + ... + ak·xt-k + et
• Backward regression: xt = b1·xt+1 + b2·xt+2 + ... + bk·xt+k + et
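A small sketch of fitting a forward predictor of this form by least squares over a clean segment. The order k and the function names are illustrative assumptions; the slide does not prescribe them:

```python
# Fit x_t ≈ a1*x_{t-1} + ... + ak*x_{t-k} by least squares over a clean segment.
# A sketch only; order k and names are assumptions.
import numpy as np

def fit_forward_predictor(x, k):
    """Return coefficients a (length k) predicting x[t] from the k previous samples."""
    rows = [x[t - k:t][::-1] for t in range(k, len(x))]   # [x_{t-1}, ..., x_{t-k}]
    H = np.array(rows)
    target = x[k:]
    a, *_ = np.linalg.lstsq(H, target, rcond=None)
    return a

def forward_predict(x, a, t):
    """One-step forward prediction of x[t] from the previous k samples."""
    k = len(a)
    return a @ x[t - k:t][::-1]
```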

Applying it to our problem
• Forward prediction.

Applying it to our problem
• Backward prediction.

Finding the burst
• At each time t:
  – Learn a "forward" predictor at.
  – Predict the next sample: x̂t = Σk at,k xt-k.
  – Compute the forward error: ferrt = |xt − x̂t|².
  – Learn a "backward" predictor and compute the backward error berrt.
• Compute the average prediction error over a window and threshold it.
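A hedged sketch of the detection step, reusing the forward predictor from the earlier sketch. The window length and threshold rule are illustrative choices, not values from the slide:

```python
# Flag glitches where the average squared forward-prediction error over a window
# exceeds a threshold. Window length and threshold are illustrative assumptions.
import numpy as np

def detect_glitches(x, a, win=256, thresh=None):
    k = len(a)
    err = np.zeros(len(x))
    for t in range(k, len(x)):
        pred = a @ x[t - k:t][::-1]           # forward prediction of x[t]
        err[t] = (x[t] - pred) ** 2
    avg = np.convolve(err, np.ones(win) / win, mode="same")   # windowed average error
    if thresh is None:
        thresh = 10.0 * np.median(avg)        # crude adaptive threshold (assumption)
    return avg > thresh                       # boolean mask of suspected glitch regions
```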

Filling the hole
• Learn a "forward" predictor at the left edge of the "hole".
  – For each missing sample, predict it from the preceding samples: x̂t = Σk at,k xt-k.
  – Use estimated samples if real samples are not available.
• Learn a "backward" predictor at the right edge of the "hole".
  – For each missing sample, predict it from the following samples: x̂t = Σk bt,k xt+k.
  – Use estimated samples if real samples are not available.
• Average the forward and backward predictions.
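A sketch of the fill-in step: a forward pass from the left edge, a backward pass from the right edge, then a blend of the two. The linear cross-fade used for the averaging is my assumption, not something the slide specifies:

```python
# Fill a hole x[s:e] by blending a forward prediction (left-to-right) and a
# backward prediction (right-to-left). Predictors a_f, a_b are fit on the clean
# samples on each side; the linear cross-fade weighting is an assumption.
import numpy as np

def fill_hole(x, s, e, a_f, a_b):
    k = len(a_f)
    y = x.copy()
    fwd = y.copy()
    for t in range(s, e):                      # forward pass reuses estimated samples
        fwd[t] = a_f @ fwd[t - k:t][::-1]
    bwd = y.copy()
    for t in range(e - 1, s - 1, -1):          # backward pass from the right edge
        bwd[t] = a_b @ bwd[t + 1:t + 1 + k]
    w = np.linspace(1.0, 0.0, e - s)           # trust forward more near the left edge
    y[s:e] = w * fwd[s:e] + (1.0 - w) * bwd[s:e]
    return y
```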

Reconstruction (zoomed in)
• (Figure: the reconstruction area with the next glitch marked; the distorted signal, the recovered signal, and the interpolation result are compared against the actual data.)

Incrementally learning the regression
• The closed-form solution requires knowledge of all (x, y) pairs.
• Can we learn A incrementally instead, as the data comes in?
• The Widrow-Hoff rule (scalar prediction version).
  – Note the structure of the update: it is driven by the prediction error.
  – Can also be done in batch mode!
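A minimal sketch of the Widrow-Hoff (LMS) update for the scalar case; the learning rate eta is an illustrative assumption:

```python
# Widrow-Hoff / LMS: after each (x, y) pair, nudge the weights in the direction
# that reduces the squared prediction error. Learning rate eta is an assumption.
import numpy as np

def lms_update(a, x, y, eta=0.01):
    """One incremental update of the weight vector a for the model y ≈ aᵀx."""
    err = y - a @ x            # prediction error on this sample
    return a + eta * err * x   # error-times-input structure noted on the slide

# Usage sketch: stream over (x_i, y_i) pairs
# a = np.zeros(D)
# for x_i, y_i in stream: a = lms_update(a, x_i, y_i)
```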

Predicting a value
• What are we doing exactly?
  – For this explanation we assume there is no "b" (X is zero-mean).
  – The explanation generalizes easily even otherwise.
• Whitening: N^(-0.5) C^(-0.5) is the whitening matrix for x.

Predicting a value
• What are we doing exactly? (continued)

Predicting a value
• Given training instances (xi, yi) for i = 1..N, estimate y for a new test instance x with unknown y.
• The estimate is simply a weighted sum of the yi from the training data.
• The weight of any yi is simply the inner product between its corresponding xi and the new x, with due whitening and scaling.

What are we doing: A different perspective
• The solution assumes XXᵀ is invertible.
• What if it is not?
  – The dimensionality of X is greater than the number of observations: the problem is underdetermined.
  – In this case XᵀX will generally be invertible.

High-dimensional regression
• XᵀX is the "Gram matrix".

High-dimensional regression
• Normalize Y by the inverse of the Gram matrix.
• Working our way down...

Linear Regression in High-dimensional Spaces
• Given training instances (xi, yi) for i = 1..N, estimate y for a new test instance x with unknown y.
• The estimate is simply a weighted sum of the normalized yi from the training data.
  – The normalization is done via the Gram matrix.
• The weight of any yi is simply the inner product between its corresponding xi and the new x.
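A hedged sketch of this dual (Gram-matrix) form for the underdetermined case, under the slide's assumptions (zero-mean data, invertible XᵀX); the sizes and names are illustrative:

```python
# Dual-form linear regression: weights on the training outputs come from the inverse
# Gram matrix, and a prediction is an inner-product-weighted sum of the y_i.
import numpy as np

rng = np.random.default_rng(2)
D, N = 500, 40                                  # dimension > number of observations
X = rng.normal(size=(D, N))                     # columns are training vectors x_i
y = rng.normal(size=N)                          # training targets

G = X.T @ X                                     # N x N Gram matrix
alpha = np.linalg.solve(G, y)                   # "normalize Y by the inverse Gram matrix"

x_new = rng.normal(size=D)
y_hat = alpha @ (X.T @ x_new)                   # weights are the inner products xᵢᵀ x_new
```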

Relationships are not always linear
• How do we model these?
• Multiple solutions.

Non-linear regression
• y = Aφ(x) + e, i.e., Y = AΦ(X) + e.
• Replace X with Φ(X) in the earlier equations for the solution.

Problem
• Y = AΦ(X) + e: replace X with Φ(X) in the earlier equations for the solution.
• Φ(X) may be in a very high-dimensional space.
• The high-dimensional space (or the transform Φ(X)) may be unknown...

The regression is in high dimensions
• Linear regression and high-dimensional regression: the same solution, with X replaced by Φ(X).

Doing it with Kernels
• High-dimensional regression with kernels: inner products in the high-dimensional space are computed with a kernel function.
• Regression in a kernel Hilbert space...
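A sketch of the kernelized dual form: the Gram matrix of inner products is replaced by a kernel matrix, here an RBF kernel. The bandwidth and the small ridge term added for numerical stability are my assumptions, not values from the slide:

```python
# Kernel regression in the dual form: the Gram matrix of inner products is replaced
# by a kernel matrix. RBF bandwidth gamma and the small ridge term are assumptions.
import numpy as np

def rbf(A, B, gamma=0.5):
    """RBF kernel matrix between the columns of A (D x Na) and B (D x Nb)."""
    d2 = np.sum(A**2, 0)[:, None] + np.sum(B**2, 0)[None, :] - 2 * A.T @ B
    return np.exp(-gamma * d2)

def kernel_regression_fit(X, y, gamma=0.5, ridge=1e-8):
    K = rbf(X, X, gamma)                          # kernel "Gram" matrix
    alpha = np.linalg.solve(K + ridge * np.eye(len(y)), y)
    return alpha

def kernel_regression_predict(X, alpha, x_new, gamma=0.5):
    return alpha @ rbf(X, x_new[:, None], gamma).ravel()
```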

A different way of finding nonlinear relationships: Locally linear regression
• Previous discussion: the regression parameters are optimized over the entire training set (minimize the total error).
• A single global regression is estimated and applied to all future x.
• Alternative: local regression. Learn a regression that is specific to x.

Being non-committal: Local Regression
• Estimate the regression to be applied to any x using training instances near x.
• The resultant regression is a d()-weighted combination of the nearby training outputs.
  – Note: this regression is specific to x.
• A separate regression must be learned for every x.

Local Regression
• But what is d()?
  – For linear regression, d() is an inner product.
• More generic form: choose d() as a function of the distance between x and xj.
• If d() falls off rapidly with the distance |x − xj|, the "neighbourhood" requirement can be relaxed.

Kernel Regression: d() = K()
• Typical kernel functions: Gaussian, Laplacian, other density functions.
  – Must fall off rapidly with increasing distance between x and xj.
• The regression is local to every x: local regression.
• Actually a non-parametric MAP estimator of y.
  – But first... MAP estimators.
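A minimal sketch of this kind of local kernel regression in its simplest form, a Nadaraya-Watson style weighted average with a Gaussian kernel; the bandwidth h is an illustrative assumption:

```python
# Nadaraya-Watson style kernel regression: the estimate at a query x is a
# kernel-weighted average of the training y_i, so each query gets its own
# "local" regression. Bandwidth h is an assumption; the slide does not fix it.
import numpy as np

def kernel_estimate(X, y, x_query, h=1.0):
    """X: (N, D) training inputs; y: (N,) targets; x_query: (D,) query point."""
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * h**2))        # Gaussian weights, falling off with distance
    return (w @ y) / np.sum(w)            # weighted average of the training outputs
```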

MAP Estimators
• MAP (Maximum A Posteriori): find the "best guess" for y (statistically), given the known x:
  – y = argmax_Y P(Y | x)
• ML (Maximum Likelihood): find the value of y for which the statistical best guess of x would have been the observed x:
  – y = argmax_Y P(x | Y)
• MAP is simpler to visualize.

MAP estimation: Gaussian PDF
• Assume X and Y are jointly Gaussian. (Figure: the joint density of X and Y.)
• The parameters of the Gaussian are learned from training data.

Learning the parameters of the Gaussian
• (Slides show the estimation of the Gaussian's parameters from the training data.)

MAP estimation: Gaussian PDF
• Assume X and Y are jointly Gaussian; the parameters of the Gaussian are learned from training data.

MAP Estimator for a Gaussian RV
• Assume X and Y are jointly Gaussian; the parameters of the Gaussian are learned from training data. (Figure: a level set of the Gaussian.)
• Now we are given an X = x0, but no Y. What is Y?

MAP estimation: the Gaussian at a particular value of X
• (Figures: the joint density and its slice at X = x0.)
• The most likely value of Y is the peak of the Gaussian slice at x0.

MAP Estimation of a Gaussian RV
• Y = argmax_y P(y | x0): the value of y at which the slice at x0 peaks.

So what is this value?
• Clearly a line.
• Equation of the line: the scalar version is given on the slide; the vector version is identical.
• Derivation? Later in the program, a bit.
• Note the similarity to regression.
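As a sketch of the line being referred to, here is the standard conditional-mean result for jointly Gaussian variables, written with generic symbols for the means and covariances (the notation is assumed, not the slide's own):

```latex
% Conditional mean of Y given X = x_0 for jointly Gaussian (X, Y); standard result,
% written in generic notation rather than the slide's own symbols.
\[
\hat{y}(x_0) \;=\; E[\,Y \mid X = x_0\,]
          \;=\; \mu_Y + \Sigma_{YX}\,\Sigma_{XX}^{-1}\,(x_0 - \mu_X),
\]
% i.e. an affine (line-like) function of x_0, with "slope" \Sigma_{YX}\Sigma_{XX}^{-1}
% and offset \mu_Y - \Sigma_{YX}\Sigma_{XX}^{-1}\mu_X, hence the similarity to regression.
```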

This is a multiple regression
• This is the MAP estimate of y: y = argmax_Y P(Y | x).
• What about the ML estimate of y: argmax_Y P(x | Y)?
• Note: neither of these may be the regression line!
  – MAP estimation of y is the regression on Y for Gaussian RVs.
  – But this is not the MAP estimate of the regression parameter.

It's also a minimum mean squared error estimate
• General principle of MMSE estimation:
  – y is unknown, x is known.
  – We must estimate y such that the expected squared error, given x, is minimized.

It's also a minimum mean squared error estimate
• Minimize the error: differentiate and equate to 0.
  – The MMSE estimate is the mean of the distribution, ŷ = E[y | x].

For the Gaussian: MAP = MMSE
• The most likely value is also the mean value.
• This would be true of any symmetric (unimodal) distribution.

MMSE estimates for mixture distributions
• Let P(y | x) be a mixture density: P(y | x) = Σk P(k | x) P(y | x, k).
• The MMSE estimate of y is E[y | x] = Σk P(k | x) E[y | x, k]: just a weighted combination of the MMSE estimates from the component distributions.

MMSE estimates from a Gaussian mixture
• Let P(x, y) be a Gaussian mixture.
• Then P(y | x) is also a Gaussian mixture.

MMSE estimates from a Gaussian mixture
• P(y | x) is a mixture Gaussian density, so E[y | x] is also a mixture.
• It is a mixture of the estimates from the individual Gaussians, weighted by the posterior probabilities of the components.
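A hedged sketch of this estimator, assuming a joint GMM has already been fit on stacked [x; y] vectors with scikit-learn; the function and variable names (gmm, dx) are illustrative, not from the course:

```python
# MMSE estimate of y from a joint GMM over z = [x; y]: a responsibility-weighted
# mixture of the per-component conditional-Gaussian means. A sketch, not course code.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def mmse_from_joint_gmm(gmm, x, dx):
    """gmm: fitted GaussianMixture (full covariances) on [x; y]; x: (dx,) query."""
    resp = np.zeros(gmm.n_components)
    cond_means = []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        # Component responsibility: pi_k * N(x; mu_x, Sxx)
        resp[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mean=mu_x, cov=Sxx)
        # Per-component MMSE estimate: conditional Gaussian mean
        cond_means.append(mu_y + Syx @ np.linalg.solve(Sxx, x - mu_x))
    resp /= resp.sum()
    return sum(r * m for r, m in zip(resp, cond_means))

# Usage sketch: Z = np.hstack([X_train, Y_train])
# gmm = GaussianMixture(8, covariance_type="full").fit(Z)
# y_hat = mmse_from_joint_gmm(gmm, x_new, dx=X_train.shape[1])
```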

Voice Morphing
• Align training recordings from both speakers: cepstral vector sequences.
• Learn a GMM on the joint vectors.
• Given speech from one speaker, find the MMSE estimate of the other.
• Synthesize from the estimated cepstra.

MMSE with GMM: Voice Transformation
• Festvox GMM transformation suite (Toda).
• Speakers: awb, bdl, jmk, slt (audio examples on the slide).

A problem with regressions
• The ML (least-squares) fit is sensitive:
  – The error is squared, so small variations in the data cause large variations in the weights.
  – Outliers affect it adversely.
• It can be unstable:
  – If the dimension of X ≥ the number of instances, (XXᵀ) is not invertible.

MAP estimation of weights
• Model: y = aᵀX + e. (Figure: the roles of a, X, and e.)
• Assume the weights are drawn from a Gaussian: P(a) = N(0, σ²I).
• Compare the maximum likelihood estimate with the maximum a posteriori estimate.

MAP estimation of weights
• P(a) = N(0, σ²I), so log P(a) = C − log σ − 0.5 σ⁻² ||a||².
• Similar to the ML estimate, with an additional term.

MAP estimate of weights
• Equivalent to diagonal loading of the correlation matrix:
  – Improves the condition number of the correlation matrix, so it can be inverted with greater stability.
  – Will not affect the estimation from well-conditioned data.
• Also called Tikhonov regularization.
  – Dual form: ridge regression.
• This is a MAP estimate of the weights, not to be confused with the MAP estimate of Y.
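A sketch of the diagonal-loading (ridge / Tikhonov) estimate described above; the regularization weight lam is an illustrative assumption:

```python
# Ridge / Tikhonov-regularized least squares: add lam*I to the correlation matrix
# before inverting (diagonal loading). lam is an assumption, not a slide value.
import numpy as np

def ridge_fit(X, y, lam=1e-2):
    """X: (N, D) rows are training x; y: (N,) targets. Returns the weight vector a."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)
```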

MAP estimate priors
• Left: a Gaussian prior on the weights W. Right: a Laplacian prior.

MAP estimation of weights with a Laplacian prior
• Assume the weights are drawn from a Laplacian: P(a) = λ⁻¹ exp(−λ⁻¹ |a|₁).
• Maximum a posteriori estimate: no closed-form solution.
  – A quadratic-programming solution is required: non-trivial.

MAP estimation of weights with a Laplacian prior
• Assume the weights are drawn from a Laplacian: P(a) = λ⁻¹ exp(−λ⁻¹ |a|₁).
• The maximum a posteriori estimate is identical to L1-regularized least-squares estimation.

L1-regularized LSE
• No closed-form solution; quadratic-programming solutions are required.
• Dual formulation: least squares subject to a constraint on |a|₁.
• "LASSO": Least Absolute Shrinkage and Selection Operator.

LASSO Algorithms
• Various convex-optimization algorithms:
  – LARS: least angle regression.
  – Pathwise coordinate descent...
• Matlab code is available from the web.
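As a hedged illustration (not the Matlab code the slide points to), scikit-learn's Lasso, which uses coordinate descent internally, solves the same L1-regularized least-squares problem; the regularization strength alpha is an illustrative choice:

```python
# L1-regularized least squares via scikit-learn's coordinate-descent Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 50))
a_true = np.zeros(50)
a_true[:5] = [3.0, -2.0, 1.5, 0.8, -1.2]        # a sparse weight vector
y = X @ a_true + 0.01 * rng.normal(size=100)

model = Lasso(alpha=0.05).fit(X, y)
print(np.count_nonzero(model.coef_))            # most coefficients are driven exactly to zero
```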

Regularized least squares (image credit: Tibshirani)
• Regularization results in the selection of a suboptimal (in the least-squares sense) solution: one of the loci outside the center.
• Tikhonov regularization selects the shortest solution.
• L1 regularization selects the sparsest solution.

LASSO and Compressive Sensing
• Y = Xa: given Y and X, estimate the sparse a.
• LASSO: X = explanatory variable, Y = dependent variable, a = weights of the regression.
• Compressive sensing (CS): X = measurement matrix, Y = measurement, a = data.

An interesting problem: Predicting War!
• Economists measure a number of social indicators for countries weekly:
  – Happiness index
  – Hunger index
  – Freedom index
  – Twitter records
  – ...
• Question: will there be a revolution or war next week?

An interesting problem: Predicting War!
• Issues:
  – Dissatisfaction builds up; it is usually not an instantaneous phenomenon.
  – War / rebellion builds up much faster, often in hours.
• Important to predict:
  – Preparedness for security.
  – Economic impact.

Predicting War
• (Diagram: weekly observations O1...O8 with W/S markers over weeks wk1...wk8.)
• Given:
  – A sequence of economic indicators for each week.
  – A sequence of unrest markers for each week.
• At the end of each week we know whether or not war happened that week.
• Predict the probability of unrest next week.
  – This could be a new unrest or the persistence of a current one.

Predicting Time Series
• This needs time-series models.
• HMMs: later in the course.