Machine Learning for Signal Processing Regression and Prediction

Matrix Identities • The derivative of a scalar function w. r. t. a vector

Matrix Identities • The derivative of a vector function w. r. t. a vector

Derivatives , , Ux. V Nx 1 Ux. Vx. N • In general: Differentiating

Matrix derivative identities X is a matrix, a is a vector. Solution may also

A Common Problem • Can you spot the glitches? 17 Oct 2013 11755/18797 7

How to fix this problem? • “Glitches” in audio – Must be detected –

Interpolation. . • “Extend” the curve on the left to “predict” the values in

Detecting the Glitch OK NOT OK • Regression-based reconstruction can be done anywhere •

What is a regression • Analyzing relationship between variables • Expressed in many forms

Regressions for prediction • y = f(x; Q) + e • Different possibilities –

A linear regression Y X • Assumption: relationship between variables is linear – –

An imaginary regression. . • • http: //pages. cs. wisc. edu/~kovar/hall. html Check this

Linear Regressions • y = Ax + b + e – e = prediction

Linear Regression to a scalar y 1 = a. Tx 1 + b +

Learning the parameters Assuming no error • Given training data: several x, y •

The prediction error as divergence y 1 = a. Tx 1 + b +

Prediction error as divergence • y = a. Tx + e – e =

Solving a linear regression • Minimize squared error • Differentiating w. r. t A

Regression in multiple dimensions y 1 = ATx 1 + b + e 1

Multiple Regression Dx 1 vector of ones • Differentiating and equating to 0 17

A Different Perspective = + • y is a noisy reading of ATx •

The Likelihood of the data • Probability of observing a specific y, given x,

A Maximum Likelihood Estimate • Maximizing the log probability is identical to minimizing the

Predicting an output • From a collection of training data, have learned A •

Applying it to our problem • Prediction by regression • Forward regression • xt

Applying it to our problem • Forward prediction 17 Oct 2013 11755/18797 28

Applying it to our problem • Backward prediction 17 Oct 2013 11755/18797 29

Finding the burst • At each time – Learn a “forward” predictor at –

Filling the hole • Learn “forward” predictor at left edge of “hole” – For

Reconstruction zoom in Reconstruction area Next glitch Distorted signal Recovered signal Interpolation result 17

Incrementally learning the regression Requires knowledge of all (x, y) pairs • Can we

Predicting a value • What are we doing exactly? – For the explanation we

Predicting a value • What are we doing exactly? 17 Oct 2013 11755/18797 35

Predicting a value • Given training instances (xi, yi) for i = 1. .

What are we doing: A different perspective • Assumes XXT is invertible • What

High-dimensional regression • XTX is the “Gram Matrix” 17 Oct 2013 11755/18797 38

High-dimensional regression • Normalize Y by the inverse of the gram matrix • Working

Linear Regression in High-dimensional Spaces • Given training instances (xi, yi) for i =

Relationships are not always linear • How do we model these? • Multiple solutions

Non-linear regression • y = Aj(x)+e n n Y = AF(X)+e Replace X with

Problem n n Y = AF(X)+e Replace X with F(X) in earlier equations for

The regression is in high dimensions • Linear regression: • High-dimensional regression 17 Oct

Doing it with Kernels • High-dimensional regression with Kernels: • Regression in Kernel Hilbert

A different way of finding nonlinear relationships: Locally linear regression • Previous discussion: Regression

Being non-committal: Local Regression • Estimate the regression to be applied to any x

Local Regression • But what is d()? – For linear regression d() is an

Kernel Regression: d() = K() • Typical Kernel functions: Gaussian, Laplacian, other density functions

Map Estimators • MAP (Maximum A Posteriori): Find a “best guess” for y (statistically),

MAP estimation: Gaussian PDF Assume X and Y are jointly Gaussian The parameters of

Learning the parameters of the Gaussian 17 Oct 2013 11755/18797 52

Learning the parameters of the Gaussian 17 Oct 2013 11755/18797 53

MAP Estimator for Gaussian RV Assume X and Y are jointly Gaussian Level set

MAP estimator for Gaussian RV x 0 17 Oct 2013 11755/18797 56

MAP estimation: Gaussian PDF F 1 Y 17 Oct 2013 X 11755/18797 57

MAP estimation: The Gaussian at a particular value of X F 1 17 Oct

MAP estimation: The Gaussian at a particular value of X Most likely value F

MAP Estimation of a Gaussian RV Y = argmaxy P(y| X) ? ? ?

MAP Estimation of a Gaussian RV x 0 17 Oct 2013 11755/18797 61

MAP Estimation of a Gaussian RV Y = argmaxy P(y| X) x 0 17

So what is this value? • Clearly a line • Equation of Line: •

This is a multiple regression • This is the MAP estimate of y –

Its also a minimum-mean-squared error estimate • General principle of MMSE estimation: – y

Its also a minimum-mean-squared error estimate • Minimize error: • Differentiating and equating to

For the Gaussian: MAP = MMSE Most likely value is also The MEAN value

MMSE estimates for mixture distributions n n n Let P(y|x) be a mixture density

MMSE estimates from a Gaussian mixture n Let P(x, y) be a Gaussian Mixture

MMSE estimates from a Gaussian mixture n Let P(y|x) is a Gaussian Mixture 17

MMSE estimates from a Gaussian mixture n P(y|x) is a mixture Gaussian density n

MMSE estimates from a Gaussian mixture n A mixture of estimates from individual Gaussians

Voice Morphing • Align training recordings from both speakers – Cepstral vector sequence •

MMSE with GMM: Voice Transformation - Festvox GMM transformation suite (Toda) awb bdl jmk

A problem with regressions • ML fit is sensitive – Error is squared –

MAP estimation of weights a y=a. TX+e X e • Assume weights drawn from

MAP estimation of weights q q P(a) = N(0, s 2 I) Log P(a)

MAP estimate of weights • Equivalent to diagonal loading of correlation matrix – Improves

MAP estimate priors • Left: Gaussian Prior on W • Right: Laplacian Prior 17

MAP estimation of weights with laplacian prior • Assume weights drawn from a Laplacian

L 1 -regularized LSE • No closed form solution – Quadratic programming solutions required

LASSO Algorithms • Various convex optimization algorithms • LARS: Least angle regression • Pathwise

Regularized least squares Image Credit: Tibshirani • Regularization results in selection of suboptimal (in

LASSO and Compressive Sensing Y = X a • Given Y and X, estimate

An interesting problem: Predicting War! • Economists measure a number of social indicators for

An interesting problem: Predicting War! • Issues: – Dissatisfaction builds up – not an

Predicting War O 1 O 2 O 3 O 4 W W S Given

Predicting Time Series • Need time-series models • HMMs – later in the course

Slides: 89

Download presentation

Machine Learning for Signal Processing Regression and Prediction Class 14. 17 Oct 2012 Instructor: Bhiksha Raj 17 Oct 2013 11755/18797 1

Matrix Identities • The derivative of a scalar function w. r. t. a vector is a vector 17 Oct 2013 11755/18797 2

Matrix Identities • The derivative of a scalar function w. r. t. a vector is a vector • The derivative w. r. t. a matrix is a matrix 17 Oct 2013 11755/18797 3

Matrix Identities • The derivative of a vector function w. r. t. a vector is a matrix – Note transposition of order 17 Oct 2013 11755/18797 4

Derivatives , , Ux. V Nx 1 Ux. Vx. N • In general: Differentiating an Mx. N function by a Ux. V argument results in an Mx. Nx. Ux. V tensor derivative 17 Oct 2013 11755/18797 5

Matrix derivative identities X is a matrix, a is a vector. Solution may also be XT A is a matrix • Some basic linear and quadratic identities 17 Oct 2013 11755/18797 6

A Common Problem • Can you spot the glitches? 17 Oct 2013 11755/18797 7

How to fix this problem? • “Glitches” in audio – Must be detected – How? • Then what? • Glitches must be “fixed” – Delete the glitch • Results in a “hole” – Fill in the hole – How? 17 Oct 2013 11755/18797 8

Interpolation. . • “Extend” the curve on the left to “predict” the values in the “blank” region – Forward prediction • Extend the blue curve on the right leftwards to predict the blank region – Backward prediction • How? – Regression analysis. . 17 Oct 2013 11755/18797 9

Detecting the Glitch OK NOT OK • Regression-based reconstruction can be done anywhere • Reconstructed value will not match actual value • Large error of reconstruction identifies glitches 17 Oct 2013 11755/18797 10

What is a regression • Analyzing relationship between variables • Expressed in many forms • Wikipedia – Linear regression, Simple regression, Ordinary least squares, Polynomial regression, General linear model, Generalized linear model, Discrete choice, Logistic regression, Multinomial logit, Mixed logit, Probit, Multinomial probit, …. • Generally a tool to predict variables 17 Oct 2013 11755/18797 11

Regressions for prediction • y = f(x; Q) + e • Different possibilities – y is a scalar • y is real • y is categorical (classification) – y is a vector – x is a vector • x is a set of real valued variables • x is a set of categorical variables • x is a combination of the two – f(. ) is a linear or affine function – f(. ) is a non-linear function – f(. ) is a time-series model 17 Oct 2013 11755/18797 12

A linear regression Y X • Assumption: relationship between variables is linear – – A linear trend may be found relating x and y y = dependent variable x = explanatory variable Given x, y can be predicted as an affine function of x 17 Oct 2013 11755/18797 13

An imaginary regression. . • • http: //pages. cs. wisc. edu/~kovar/hall. html Check this shit out (Fig. 1). That's bonafide, 100%-real data, my friends. I took it myself over the course of two weeks. And this was not a leisurely two weeks, either; I busted my ass day and night in order to provide you with nothing but the best data possible. Now, let's look a bit more closely at this data, remembering that it is absolutely first-rate. Do you see the exponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of my time. Banking on my hopes that whoever grades this will just look at the pictures, I drew an exponential through my noise. I believe the apparent legitimacy is enhanced by the fact that I used a complicated computer program to make the fit. I understand this is the same process by which the top quark was discovered. 17 Oct 2013 11755/18797 14

Linear Regressions • y = Ax + b + e – e = prediction error • Given a “training” set of {x, y} values: estimate A and b – y 1 = Ax 1 + b + e 1 – y 2 = Ax 2 + b + e 2 – y 3 = Ax 3 + b+ e 3 –… • If A and b are well estimated, prediction error will be small 17 Oct 2013 11755/18797 15

Linear Regression to a scalar y 1 = a. Tx 1 + b + e 1 y 2 = a. Tx 2 + b + e 2 y 3 = a. Tx 3 + b + e 3 n Define: • Rewrite 17 Oct 2013 11755/18797 16

Learning the parameters Assuming no error • Given training data: several x, y • Can define a “divergence”: D(y, ) – Measures how much differs from y – Ideally, if the model is accurate this should be small • Estimate A, b to minimize D(y, ) 17 Oct 2013 11755/18797 17

The prediction error as divergence y 1 = a. Tx 1 + b + e 1 y 2 = a. Tx 2 + b + e 2 y 3 = a. Tx 3 + b + e 3 • Define divergence as sum of the squared error in predicting y 17 Oct 2013 11755/18797 18

Prediction error as divergence • y = a. Tx + e – e = prediction error – Find the “slope” a such that the total squared length of the error lines is minimized 17 Oct 2013 11755/18797 19

Solving a linear regression • Minimize squared error • Differentiating w. r. t A and equating to 0 17 Oct 2013 11755/18797 20

Regression in multiple dimensions y 1 = ATx 1 + b + e 1 y 2 = ATx 2 + b + e 2 y 3 = ATx 3 + b + e 3 yi is a vector yij = jth component of vector yi ai = ith column of A • Also called multiple regression bj = jth component of • Equivalent of saying: yi 1 = a 1 Txi + b 1 + ei 1 yi 2 = a 2 Txi + b 2 + ei 2 yi 3 = a 3 Txi + b 3 + ei 3 yi = ATxi + b + ei • Fundamentally no different from N separate single regressions – But we can use the relationship between ys to our benefit 17 Oct 2013 11755/18797 21 b

Multiple Regression Dx 1 vector of ones • Differentiating and equating to 0 17 Oct 2013 11755/18797 22

A Different Perspective = + • y is a noisy reading of ATx • Error e is Gaussian • Estimate A from 17 Oct 2013 11755/18797 23

The Likelihood of the data • Probability of observing a specific y, given x, for a particular matrix A • Probability of collection: • Assuming IID for convenience (not necessary) 17 Oct 2013 11755/18797 24

A Maximum Likelihood Estimate • Maximizing the log probability is identical to minimizing the trace – Identical to the least squares solution 17 Oct 2013 11755/18797 25

Predicting an output • From a collection of training data, have learned A • Given x for a new instance, but not y, what is y? • Simple solution: 17 Oct 2013 11755/18797 26

Applying it to our problem • Prediction by regression • Forward regression • xt = a 1 xt-1+ a 2 xt-2…akxt-k+et • Backward regression • xt = b 1 xt+1+ b 2 xt+2…bkxt+k +et 17 Oct 2013 11755/18797 27

Applying it to our problem • Forward prediction 17 Oct 2013 11755/18797 28

Applying it to our problem • Backward prediction 17 Oct 2013 11755/18797 29

Finding the burst • At each time – Learn a “forward” predictor at – At each time, predict next sample xtest = Si at, kxt-k – Compute error: ferrt=|xt-xtest |2 – Learn a “backward” predict and compute backward error • berrt – Compute average prediction error over window, threshold 17 Oct 2013 11755/18797 30

Filling the hole • Learn “forward” predictor at left edge of “hole” – For each missing sample – At each time, predict next sample xtest = Si at, kxt-k • Use estimated samples if real samples are not available • Learn “backward” predictor at left edge of “hole” – For each missing sample – At each time, predict next sample xtest = Si bt, kxt+k • Use estimated samples if real samples are not available • Average forward and backward predictions 17 Oct 2013 11755/18797 31

Reconstruction zoom in Reconstruction area Next glitch Distorted signal Recovered signal Interpolation result 17 Oct 2013 Actual data 11755/18797 32

Incrementally learning the regression Requires knowledge of all (x, y) pairs • Can we learn A incrementally instead? – As data comes in? • The Widrow Hoff rule • Note the structure Scalar prediction version error – Can also be done in batch mode! 17 Oct 2013 11755/18797 33

Predicting a value • What are we doing exactly? – For the explanation we are assuming no “b” (X is 0 mean) – Explanation generalizes easily even otherwise n Let and n n 17 Oct 2013 Whitening x N-0. 5 C-0. 5 is the whitening matrix for x 11755/18797 34

Predicting a value • What are we doing exactly? 17 Oct 2013 11755/18797 35

Predicting a value • Given training instances (xi, yi) for i = 1. . N, estimate y for a new test instance of x with unknown y : • y is simply a weighted sum of the yi instances from the training data • The weight of any yi is simply the inner product between its corresponding xi and the new x – With due whitening and scaling. . 17 Oct 2013 11755/18797 36

What are we doing: A different perspective • Assumes XXT is invertible • What if it is not – Dimensionality of X is greater than number of observations? – Underdetermined • In this case XTX will generally be invertible 17 Oct 2013 11755/18797 37

High-dimensional regression • XTX is the “Gram Matrix” 17 Oct 2013 11755/18797 38

High-dimensional regression • Normalize Y by the inverse of the gram matrix • Working our way down. . 17 Oct 2013 11755/18797 39

Linear Regression in High-dimensional Spaces • Given training instances (xi, yi) for i = 1. . N, estimate y for a new test instance of x with unknown y : • y is simply a weighted sum of the normalized yi instances from the training data – The normalization is done via the Gram Matrix • The weight of any yi is simply the inner product between its corresponding xi and the new x 17 Oct 2013 11755/18797 40

Relationships are not always linear • How do we model these? • Multiple solutions 17 Oct 2013 11755/18797 41

Non-linear regression • y = Aj(x)+e n n Y = AF(X)+e Replace X with F(X) in earlier equations for solution 17 Oct 2013 11755/18797 42

Problem n n Y = AF(X)+e Replace X with F(X) in earlier equations for solution F(X) may be in a very high-dimensional space The high-dimensional space (or the transform F(X)) may be unknown. . 17 Oct 2013 11755/18797 43

The regression is in high dimensions • Linear regression: • High-dimensional regression 17 Oct 2013 11755/18797 44

Doing it with Kernels • High-dimensional regression with Kernels: • Regression in Kernel Hilbert Space. . 17 Oct 2013 11755/18797 45

A different way of finding nonlinear relationships: Locally linear regression • Previous discussion: Regression parameters are optimized over the entire training set • Minimize • Single global regression is estimated and applied to all future x • Alternative: Local regression • Learn a regression that is specific to x 17 Oct 2013 11755/18797 46

Being non-committal: Local Regression • Estimate the regression to be applied to any x using training instances near x • The resultant regression has the form – Note : this regression is specific to x • A separate regression must be learned for every x 17 Oct 2013 11755/18797 47

Local Regression • But what is d()? – For linear regression d() is an inner product • More generic form: Choose d() as a function of the distance between x and xj • If d() falls off rapidly with |x and xj| the “neighbhorhood” requirement can be relaxed 17 Oct 2013 11755/18797 48

Kernel Regression: d() = K() • Typical Kernel functions: Gaussian, Laplacian, other density functions – Must fall off rapidly with increasing distance between x and xj • Regression is local to every x : Local regression • Actually a non-parametric MAP estimator of y – But first. . MAP estimators. . 49

Map Estimators • MAP (Maximum A Posteriori): Find a “best guess” for y (statistically), given known x y = argmax Y P(Y|x) • ML (Maximum Likelihood): Find that value of y for which the statistical best guess of x would have been the observed x y = argmax Y P(x|Y) • MAP is simpler to visualize 17 Oct 2013 11755/18797 50

MAP estimation: Gaussian PDF Assume X and Y are jointly Gaussian The parameters of the F 1 Y Gaussian are learned from training data 17 Oct 2013 F 0 X 11755/18797 51

Learning the parameters of the Gaussian 17 Oct 2013 11755/18797 52

Learning the parameters of the Gaussian 17 Oct 2013 11755/18797 53

MAP estimation: Gaussian PDF Assume X and Y are jointly Gaussian The parameters of the F 1 Y Gaussian are learned from training data 17 Oct 2013 F 0 X 11755/18797 54

MAP Estimator for Gaussian RV Assume X and Y are jointly Gaussian Level set of Gaussian The parameters of the Gaussian are learned from training data X 0 Now we are given an X, but no Y What is Y? 11755/18797 55

MAP estimator for Gaussian RV x 0 17 Oct 2013 11755/18797 56

MAP estimation: Gaussian PDF F 1 Y 17 Oct 2013 X 11755/18797 57

MAP estimation: The Gaussian at a particular value of X F 1 17 Oct 2013 x 0 11755/18797 58

MAP estimation: The Gaussian at a particular value of X Most likely value F 1 17 Oct 2013 x 0 11755/18797 59

MAP Estimation of a Gaussian RV Y = argmaxy P(y| X) ? ? ? x 0 17 Oct 2013 11755/18797 60

MAP Estimation of a Gaussian RV x 0 17 Oct 2013 11755/18797 61

MAP Estimation of a Gaussian RV Y = argmaxy P(y| X) x 0 17 Oct 2013 11755/18797 62

So what is this value? • Clearly a line • Equation of Line: • Scalar version given; vector version is identical • Derivation? Later in the program a bit – Note the similarity to regression 17 Oct 2013 11755/18797 63

This is a multiple regression • This is the MAP estimate of y – y = argmax Y P(Y|x) • What about the ML estimate of y – argmax Y P(x|Y) • Note: Neither of these may be the regression line! – MAP estimation of y is the regression on Y for Gaussian RVs – But this is not the MAP estimation of the regression parameter 17 Oct 2013 11755/18797 64

Its also a minimum-mean-squared error estimate • General principle of MMSE estimation: – y is unknown, x is known – Must estimate it such that the expected squared error is minimized – Minimize above term 17 Oct 2013 11755/18797 65

Its also a minimum-mean-squared error estimate • Minimize error: • Differentiating and equating to 0: The MMSE estimate is the mean of the distribution 17 Oct 2013 11755/18797 66

For the Gaussian: MAP = MMSE Most likely value is also The MEAN value § Would be true of any symmetric distribution 17 Oct 2013 11755/18797 67

MMSE estimates for mixture distributions n n n Let P(y|x) be a mixture density The MMSE estimate of y is given by Just a weighted combination of the MMSE estimates from the component distributions 68

MMSE estimates from a Gaussian mixture n Let P(x, y) be a Gaussian Mixture n P(y|x) is also a Gaussian mixture 17 Oct 2013 11755/18797 69

MMSE estimates from a Gaussian mixture n Let P(y|x) is a Gaussian Mixture 17 Oct 2013 11755/18797 70

MMSE estimates from a Gaussian mixture n P(y|x) is a mixture Gaussian density n E[y|x] is also a mixture 17 Oct 2013 11755/18797 71

MMSE estimates from a Gaussian mixture n A mixture of estimates from individual Gaussians 17 Oct 2013 11755/18797 72

Voice Morphing • Align training recordings from both speakers – Cepstral vector sequence • Learn a GMM on joint vectors • Given speech from one speaker, find MMSE estimate of the other • Synthesize from cepstra 17 Oct 2013 11755/18797 73

MMSE with GMM: Voice Transformation - Festvox GMM transformation suite (Toda) awb bdl jmk slt 17 Oct 2013 11755/18797 74

A problem with regressions • ML fit is sensitive – Error is squared – Small variations in data large variations in weights – Outliers affect it adversely • Unstable – If dimension of X >= no. of instances • (XXT) is not invertible 17 Oct 2013 11755/18797 75

MAP estimation of weights a y=a. TX+e X e • Assume weights drawn from a Gaussian – P(a) = N(0, s 2 I) • Max. Likelihood estimate • Maximum a posteriori estimate 17 Oct 2013 11755/18797 76

MAP estimation of weights q q P(a) = N(0, s 2 I) Log P(a) = C – log s – 0. 5 s-2 ||a||2 • Similar to ML estimate with an additional term 17 Oct 2013 11755/18797 77

MAP estimate of weights • Equivalent to diagonal loading of correlation matrix – Improves condition number of correlation matrix • Can be inverted with greater stability – Will not affect the estimation from well-conditioned data – Also called Tikhonov Regularization • Dual form: Ridge regression • MAP estimate of weights – Not to be confused with MAP estimate of Y 17 Oct 2013 11755/18797 78

MAP estimate priors • Left: Gaussian Prior on W • Right: Laplacian Prior 17 Oct 2013 11755/18797 79

MAP estimation of weights with laplacian prior • Assume weights drawn from a Laplacian – P(a) = l-1 exp(-l-1|a|1) • Maximum a posteriori estimate • No closed form solution – Quadratic programming solution required • Non-trivial 17 Oct 2013 11755/18797 80

MAP estimation of weights with laplacian prior • Assume weights drawn from a Laplacian – P(a) = l-1 exp(-l-1|a|1) • Maximum a posteriori estimate –… • Identical to L 1 regularized least-squares estimation 17 Oct 2013 11755/18797 81

L 1 -regularized LSE • No closed form solution – Quadratic programming solutions required • Dual formulation subject to • “LASSO” – Least absolute shrinkage and selection operator 17 Oct 2013 11755/18797 82

LASSO Algorithms • Various convex optimization algorithms • LARS: Least angle regression • Pathwise coordinate descent. . • Matlab code available from web 17 Oct 2013 11755/18797 83

Regularized least squares Image Credit: Tibshirani • Regularization results in selection of suboptimal (in least-squares sense) solution – One of the loci outside center • Tikhonov regularization selects shortest solution • L 1 regularization selects sparsest solution 17 Oct 2013 11755/18797 84

LASSO and Compressive Sensing Y = X a • Given Y and X, estimate sparse W • LASSO: – X = explanatory variable – Y = dependent variable – a = weights of regression • CS: – X = measurement matrix – Y = measurement – a = data 17 Oct 2013 11755/18797 85

An interesting problem: Predicting War! • Economists measure a number of social indicators for countries weekly – Happiness index – Hunger index – Freedom index – Twitter records –… • Question: Will there be a revolution or war next week? 17 Oct 2013 11755/18797 86

An interesting problem: Predicting War! • Issues: – Dissatisfaction builds up – not an instantaneous phenomenon • Usually – War / rebellion build up much faster • Often in hours • Important to predict – Preparedness for security – Economic impact 17 Oct 2013 11755/18797 87

Predicting War O 1 O 2 O 3 O 4 W W S Given S S S O 5 O 6 O 7 O 8 W W S S wk 1 wk 2 wk 3 wk 4 wk 5 wk 6 wk 7 wk 8 – Sequence of economic indicators for each week – Sequence of unrest markers for each week • At the end of each week we know if war happened or not that week • Predict probability of unrest next week – This could be a new unrest or persistence of a current one 17 Oct 2013 11755/18797 88

Predicting Time Series • Need time-series models • HMMs – later in the course 17 Oct 2013 11755/18797 89