Shrinkage Estimation of Vector Autoregressive Models

Pawin Siriprapanukul (pawin@econ.tu.ac.th), 11 January 2010

Introduction (1)
• We want to forecast:
  – The rate of growth of employment,
  – The change in annual inflation,
  – The change in the federal funds rate.
• A standard and simple system approach in economics is the VAR.

Introduction (2)
• OLS provides the efficient estimator for the VAR.
• However, there is a lot of evidence showing that the Bayesian VAR outperforms the unrestricted OLS VAR in out-of-sample forecasting:
  – Litterman (1986), and Robertson and Tallman (1999).

Introduction (3)
• Banbura et al. (2008) also show that it is possible and satisfactory to employ many endogenous variables with long lags in the Bayesian VAR (131 variables, 13 lags).
• We see some studies following this direction.

Introduction (4)
• There is another related literature on forecasting using a large number of predictors in the model.
• A popular method is the “Approximate Factor Model” proposed by Stock and Watson (2002).

Introduction (5)
• In this literature, it has been shown that using a larger number of predictors (independent variables) does not always improve forecasting performance.
• Bai and Ng (2008) show that selecting variables with the LASSO or the elastic net, before applying the approximate factor model methodology, can outperform bigger models.

Introduction (6)
• Even though they interpret their results differently, we see this as evidence of redundancy in models with a large number of predictors.
• Now, considering a VAR with many endogenous variables and long lags, we think that redundancy should be the case as well.

Introduction (7)
• We have not gone into VARs with many endogenous variables yet, but we are working with 13 lags in the VAR.

Bias-Variance Tradeoff (1)
• Suppose the OLS estimate is unbiased.
• Gauss-Markov Theorem:
  – The OLS estimate has the smallest variance among all linear unbiased estimates.
• However, we know that there are some biased estimates that have smaller variances than the OLS estimate.
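The tradeoff can be made concrete in the simplest setting: estimating a mean mu by a shrunken sample mean c * xbar with c in [0, 1]. The mean squared error decomposes exactly into variance plus squared bias, so no simulation is needed. The numbers below (mu, sigma2, n, the shrinkage factor 0.5) are my own illustrative assumptions, not values from the slides:

```python
# MSE of the shrinkage estimator c * xbar for the mean mu of an i.i.d.
# sample of size n with variance sigma2:
#   MSE(c) = c^2 * sigma2 / n  (variance)  +  (1 - c)^2 * mu^2  (squared bias)
def mse(c, mu=0.2, sigma2=1.0, n=10):
    variance = c ** 2 * sigma2 / n
    bias_sq = (1 - c) ** 2 * mu ** 2
    return variance + bias_sq

mse_ols = mse(1.0)      # c = 1: unbiased, MSE = 0.1
mse_shrunk = mse(0.5)   # c = 0.5: biased, but MSE = 0.035
```

When the true coefficient is small relative to the sampling noise, the variance reduction from shrinking dominates the squared bias it introduces, which is exactly the situation the slide describes.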

Bias-Variance Tradeoff (2)
[Figure: the OLS estimate is unbiased but has high variance around the true model; the shrinkage estimate is biased but has small variance.]

VAR (1)
• We consider a VAR relationship.
• Note here that we cannot write the bias-variance tradeoff for the VAR.
  – The OLS estimate is biased in finite samples.
• We still think similar logic applies. However, the direction of shrinkage may be important.

VAR (2)
• With T observations, we can stack the system as Y = XB + U,
• where Y (T x n) stacks the observations, X (T x np) stacks the p lags of the n variables, and B (np x n) stacks the coefficients. We assume the rows of U are i.i.d. normal with mean zero.

VAR (3)
• The unrestricted OLS estimator is: B_OLS = (X'X)^(-1) X'Y.
• This estimator may not be defined if we have too many endogenous variables or too many lags (X'X becomes singular).
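A minimal numpy sketch of the unrestricted OLS estimator. The function name, the simulated bivariate VAR(1), and all numbers are my own illustration, not from the slides:

```python
import numpy as np

def var_ols(data, p):
    """Unrestricted OLS for a VAR(p): regress y_t on (y_{t-1}, ..., y_{t-p})."""
    T, n = data.shape
    Y = data[p:]                                                 # (T-p) x n
    X = np.hstack([data[p - k:T - k] for k in range(1, p + 1)])  # (T-p) x (n*p)
    # lstsq computes the (X'X)^(-1) X'Y solution; when n*p is large relative
    # to T-p, X'X is (near-)singular and the estimator is not well defined.
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B

# Simulate a stable bivariate VAR(1) and check that OLS recovers it.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])
y = np.zeros((300, 2))
for t in range(1, 300):
    y[t] = y[t - 1] @ A + 0.001 * rng.standard_normal(2)
B_hat = var_ols(y, p=1)   # close to A
```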

Bayesian VAR (1)
• This is a shrinkage regression.
• We follow Kadiyala and Karlsson (1997) and Banbura et al. (2008) in using the Normal-(Inverted)-Wishart as our prior distribution.
• We work with stationary and demeaned variables. Hence, we set the mean of the prior distribution at zero.

Bayesian VAR (2)
• We can write the (point) estimator of our Bayesian VAR as: B_BVAR = (X'X + Omega0^(-1))^(-1) X'Y,
• where Omega0 is the prior covariance matrix of the coefficients (so Omega0^(-1) is the prior precision).

Ridge Regression (1)
• Well-known in the statistical literature.
• Can be defined as: b_RR = argmin_b ||y - Xb||^2 + lambda * ||b||^2.
• This is a regression that imposes a penalty on the size of the estimated coefficients.

Ridge Regression (2)
• The solution of the previous problem is: b_RR = (X'X + lambda*I)^(-1) X'y.
• Observe the similarity with the Bayesian VAR point estimator: it has the same form, with the prior precision playing the role of lambda*I.
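To make the closed form concrete, a short numpy sketch: it computes the ridge solution, which has the same algebraic form as a zero-mean Bayesian point estimate with prior precision lambda * I (that prior, the data, and all numbers are illustrative assumptions of mine), and checks that the closed form really minimizes the penalized objective:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, -0.5, 0.0, 0.3]) + 0.1 * rng.standard_normal(50)
lam = 2.0

def objective(b):
    """Penalized least squares: ||y - Xb||^2 + lam * ||b||^2."""
    return np.sum((y - X @ b) ** 2) + lam * np.sum(b ** 2)

# Closed-form ridge solution: (X'X + lam * I)^(-1) X'y -- the same form as a
# zero-mean Bayesian point estimate whose prior precision is lam * I.
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

# Sanity check: the closed form beats every random perturbation of itself,
# as it must, since the objective is strictly convex.
worse = all(objective(b_ridge + 0.1 * rng.standard_normal(4)) > objective(b_ridge)
            for _ in range(100))
```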

BVAR v RR (1)
• Proposition 1:
  – The BVAR estimator can be seen as the solution of an optimization problem that adds a separate ridge-type penalty w_j on each coefficient to the least-squares objective,
  – where w_j is the (j, j)-th element of the prior precision matrix.

BVAR v RR (2)
• Proposition 2:
  – Let …, we have: …
  – where …
• Note: If …, … is just the standardized ….

LASSO (1)
• Least Absolute Shrinkage and Selection Operator.
• The LASSO estimate can be defined as: b_LASSO = argmin_b ||y - Xb||^2 + lambda * sum_j |b_j|.
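The LASSO has no closed form. A short cyclic coordinate-descent sketch (my own illustration; the LARS algorithm discussed later is the efficient implementation) shows its key property, exact zeros. The data, penalty, and function names are assumptions for the example:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize ||y - Xb||^2 + lam * sum_j |b_j| by cyclic coordinate descent."""
    b = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]   # residual excluding variable j
            b[j] = soft_threshold(X[:, j] @ r_j, lam / 2) / col_sq[j]
    return b

# Sparse truth: only the first two of ten coefficients are nonzero.
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))
beta = np.zeros(10)
beta[:2] = [2.0, -1.5]
y = X @ beta + 0.1 * rng.standard_normal(100)
b_hat = lasso_cd(X, y, lam=20.0)   # irrelevant coefficients end up exactly zero
```

Unlike ridge, which only shrinks coefficients toward zero, the soft-threshold step sets weak coefficients to exactly zero, which is the parsimony property the next slide motivates.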

LASSO (2)
• The LASSO was proposed because:
  – Ridge regression is not parsimonious.
  – Ridge regression may generate huge prediction errors when the true (unknown) coefficient vector is sparse.
• The LASSO can outperform RR if:
  – The true (unknown) coefficients contain a lot of zeros.

LASSO (3)
• If there are a lot of irrelevant variables in the model, setting their coefficients to zero every time can reduce variance without disturbing the bias that much.
• We see that a VAR with 13 lags may possess a lot of irrelevant variables.

The Elastic Net (1)
• Zou and Hastie (2005) propose another estimate that can further improve the performance of the LASSO.
• It is called the elastic net, and the naïve version can be defined as: b_EN = argmin_b ||y - Xb||^2 + lambda1 * sum_j |b_j| + lambda2 * sum_j b_j^2.
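The naive elastic net adds an l2 term to the LASSO penalty. In a coordinate-descent sketch (again my own illustration, not the authors' code) the only change from the LASSO update is lambda2 entering the denominator:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def naive_enet_cd(X, y, lam1, lam2, n_iter=200):
    """Minimize ||y - Xb||^2 + lam1 * sum|b_j| + lam2 * sum b_j^2.
    Identical to the LASSO update except for lam2 in the denominator:
    the l1 part still produces exact zeros, while the l2 part adds
    ridge-type shrinkage that stabilizes correlated predictors."""
    b = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r_j, lam1 / 2) / (col_sq[j] + lam2)
    return b

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 10))
beta = np.concatenate([[1.5, -1.0], np.zeros(8)])
y = X @ beta + 0.1 * rng.standard_normal(100)
b_en = naive_enet_cd(X, y, lam1=30.0, lam2=10.0)  # sparsity survives the l2 part
```

Setting lam1 = 0 recovers ridge regression and lam2 = 0 recovers the LASSO, so the elastic net nests both.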

The Elastic Net (2)
• We modify the elastic net to allow treating different lagged variables differently.
• Our modified naïve elastic net is: …

Implementation
• We can use the algorithm called “LARS”, proposed by Efron, Hastie, Johnstone, and Tibshirani (2004), to implement both the LASSO and the elastic net efficiently.
• This can be applied to our modified version as well.

Empirical Study (1)
• I use the US data set from Stock and Watson (2005).
  – Monthly data cover Jan 1959 – Dec 2003.
  – There are 132 variables, but I use only 7.
• I transformed the data as in De Mol, Giannone, and Reichlin (2008) to obtain stationarity.
  – Their replication file can be downloaded.
  – Their transformation makes every variable an annual growth rate or a change in annual growth.

Empirical Study (2)
• Out-of-sample performances.
  – In each month from Jan 1981 to Dec 2003 (276 times), estimate each model using the most recent 120 observations to make one forecast.
  – The performances are measured using the Relative Mean Squared Forecast Error (RMSFE), with OLS as the benchmark regression.
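The rolling-window design above can be sketched as follows. The 120-observation window and the squared-error accounting mirror the slide; the synthetic AR(1) series and the AR-vs-no-change comparison are stand-ins I introduce for illustration, not the paper's models:

```python
import numpy as np

def rolling_rmsfe(y, window=120, horizon=1):
    """Re-estimate on the most recent `window` points, forecast `horizon`
    steps ahead, move one period forward, repeat; return the mean squared
    forecast error of an OLS AR(1) and of a 'no change' benchmark."""
    se_model, se_bench = [], []
    for end in range(window, len(y) - horizon + 1):
        train = y[end - window:end]
        # OLS AR(1): regress y_t on (1, y_{t-1}) over the window.
        Xw = np.column_stack([np.ones(window - 1), train[:-1]])
        a, b = np.linalg.lstsq(Xw, train[1:], rcond=None)[0]
        f = train[-1]
        for _ in range(horizon):                 # iterate the AR(1) forward
            f = a + b * f
        actual = y[end + horizon - 1]
        se_model.append((actual - f) ** 2)
        se_bench.append((actual - train[-1]) ** 2)   # "no change" forecast
    return np.mean(se_model), np.mean(se_bench)

# Synthetic persistent series standing in for a macro variable.
rng = np.random.default_rng(4)
e = rng.standard_normal(600)
y = np.empty(600)
y[0] = 0.0
for t in range(1, 600):
    y[t] = 0.5 * y[t - 1] + e[t]

mse_ar, mse_nc = rolling_rmsfe(y)
relative = mse_ar / mse_nc   # RMSFE-style ratio; below 1 favors the model
```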

Empirical Study (3)
• There are 3 variables that we want to forecast:
  – Employment (EMPL),
  – Annual inflation (INF),
  – The federal funds rate (FFR).
• The order of the VAR is p = 13.
• There are 4 forecast horizons (1, 3, 6, 12) and 3 values of … (0, 1, 2).

Empirical Study (4)
• The most time-consuming part is figuring out suitable parameters for each regression.
• We use grid searches on out-of-sample performances during the test period Jan 1971 – Dec 1980 (120 times).
  – Bayesian VAR: we employ the process in my previous chapter.
  – LASSO: a grid of 90 values.
  – Modified elastic net: a grid of 420 pairs of values.

Empirical Study (5)
• We also employ the combination of the LASSO and the Bayesian VAR.
  – The LASSO discards some variables that tend to correspond to zero true coefficients.
  – The Bayesian VAR is similar to ridge regression, which assigns a better amount of shrinkage to the remaining (nonzero) coefficients.
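The combination can be sketched as a two-step estimator: LASSO screens out variables, then a ridge-style (BVAR-like) shrinkage re-estimates the survivors. This is my own minimal illustration of the idea, with assumed data and penalty values:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for ||y - Xb||^2 + lam * sum|b_j|."""
    b = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r_j = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r_j, lam / 2) / col_sq[j]
    return b

def lasso_then_ridge(X, y, lam_lasso, lam_ridge):
    """Step 1: LASSO discards likely-irrelevant variables (exact zeros).
    Step 2: ridge re-estimates the survivors with its own shrinkage.
    Assumes the LASSO keeps at least one variable."""
    keep = np.flatnonzero(lasso_cd(X, y, lam_lasso) != 0.0)
    Xk = X[:, keep]
    b_keep = np.linalg.solve(Xk.T @ Xk + lam_ridge * np.eye(len(keep)), Xk.T @ y)
    b = np.zeros(X.shape[1])
    b[keep] = b_keep
    return b, keep

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 10))
beta = np.zeros(10)
beta[:2] = [2.0, -1.5]
y = X @ beta + 0.1 * rng.standard_normal(100)
b_two_step, kept = lasso_then_ridge(X, y, lam_lasso=20.0, lam_ridge=1.0)
```

The light ridge penalty in step 2 barely shrinks the surviving coefficients, while the heavy LASSO penalty in step 1 does the variable discarding, which mirrors the division of labor described on the slide.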

Empirical Study (6)
• For the smallest model, we use the 3 variables to forecast themselves.

Empirical Study (7)

Empirical Study (8)

Empirical Study (9)
[Table: comparing different regressions, Pi = 0]

Empirical Study (10)
[Table: comparing different regressions, Pi = 0]

Empirical Study (11)
[Table: when we change to the 7-variable VAR]

Conclusion
• Even though the empirical results are not impressive, we still think this is a promising way to improve the performance of Bayesian VARs.
• When the model becomes bigger, e.g. models with 131 endogenous variables, this should be more relevant.
• We can think of some cautions like Boivin and Ng’s (2006) for the VAR as well.

Thank you very much.
