Shrinkage Estimation of Vector Autoregressive Models
Pawin Siriprapanukul (pawin@econ.tu.ac.th)
11 January 2010
Introduction (1)
• We want to forecast:
  – the rate of growth of employment,
  – the change in annual inflation,
  – the change in the federal funds rate.
• A standard and simple system approach in economics is the VAR.
Introduction (2)
• OLS provides an efficient estimator for the VAR.
• However, there is a lot of evidence showing that the Bayesian VAR outperforms the unrestricted OLS VAR in out-of-sample forecasting:
  – Litterman (1986); Robertson and Tallman (1999).
Introduction (3)
• Banbura et al. (2008) also show that it is feasible and effective to employ many endogenous variables with long lags in a Bayesian VAR (131 variables, 13 lags).
• Several studies have followed this direction.
Introduction (4)
• There is a related literature on forecasting with a large number of predictors in the model.
• A popular method is the "approximate factor model", proposed by Stock and Watson (2002).
Introduction (5)
• In this literature, it has been shown that using a larger number of predictors (independent variables) does not always improve forecasting performance.
• Bai and Ng (2008) show that selecting variables with the LASSO or the elastic net, before applying the approximate factor model, can outperform bigger models.
Introduction (6)
• Although they interpret their results differently, we see this as evidence of redundancy in models with many predictors.
• For a VAR with many endogenous variables and long lags, we think redundancy should be present as well.
Introduction (7)
• We have not yet examined VARs with many endogenous variables, but we are working with 13 lags in the VAR.
Bias-Variance Tradeoff (1)
• Suppose the OLS estimate is unbiased.
• Gauss-Markov theorem:
  – The OLS estimate has the smallest variance among all linear unbiased estimates.
• However, we know that some biased estimates have smaller variance than the OLS estimate.
Bias-Variance Tradeoff (2)
[Figure: the OLS estimate is unbiased but scatters widely around the true model; a shrinkage estimate is biased but has small variance.]
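The tradeoff can be illustrated with a small simulation (a hypothetical linear model, not the data used later): ridge shrinkage biases the estimate toward zero, yet with noisy data its total estimation error can be lower than that of OLS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n observations, p coefficients, noisy linear model.
n, p, lam = 50, 10, 5.0
beta_true = rng.normal(size=p)
X = rng.normal(size=(n, p))

def mean_sq_error(lam, n_sims=500):
    """Average squared estimation error of ridge with penalty lam (lam=0 gives OLS)."""
    errs = []
    for _ in range(n_sims):
        y = X @ beta_true + rng.normal(scale=3.0, size=n)
        b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        errs.append(np.sum((b - beta_true) ** 2))
    return float(np.mean(errs))

mse_ols = mean_sq_error(0.0)    # unbiased, high variance
mse_ridge = mean_sq_error(lam)  # biased toward zero, lower variance
```

With this much noise the biased ridge estimate achieves a smaller total error than the unbiased OLS estimate.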
VAR (1)
• We consider a VAR relationship.
• Note that we cannot write down the bias-variance tradeoff for the VAR:
  – the OLS estimate is biased in finite samples.
• We still think similar logic applies; however, the direction of shrinkage may be important.
VAR (2)
• With T observations, the VAR(p) can be stacked as
  Y = X B + U,
  where Y (T x n) collects the observations, X (T x np) collects the lagged values, and B stacks the coefficient matrices A_1, ..., A_p.
• We assume the errors u_t are i.i.d. Gaussian: u_t ~ N(0, Σ).
VAR (3)
• The unrestricted OLS estimator is
  B_ols = (X'X)^(-1) X'Y.
• This estimator may not be defined (X'X can be singular) if we have too many endogenous variables or too many lags.
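As a sketch (with a made-up bivariate VAR(1), not the paper's specification), the stacked OLS estimator can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a stationary bivariate VAR(1), y_t = A y_{t-1} + u_t (A is made up).
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])
T = 2000
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.normal(size=2)

# Stack the system as Y = X B' + U, where each row of X holds the lagged values.
Y, X = y[1:], y[:-1]
B_ols = np.linalg.solve(X.T @ X, X.T @ Y).T  # OLS, equation by equation
```

With a long sample the OLS estimate recovers the true coefficient matrix closely; the shrinkage methods below matter when T is small relative to the number of coefficients.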
Bayesian VAR (1)
• This is a shrinkage regression.
• We follow Kadiyala and Karlsson (1997) and Banbura et al. (2008) in using the Normal-(Inverted-)Wishart as our prior distribution.
• We work with stationary, demeaned variables; hence, we set the mean of the prior distribution to zero.
Bayesian VAR (2)
• With the zero prior mean, the (point) estimator of our Bayesian VAR can be written as
  B_bvar = (X'X + Ω_0^(-1))^(-1) X'Y,
• where Ω_0 is the prior variance matrix of the coefficients.
Ridge Regression (1)
• Well known in the statistical literature.
• Can be defined as
  b_ridge = argmin_b ||y - Xb||^2 + λ ||b||^2.
• This is a regression that imposes a penalty on the size of the estimated coefficients.
Ridge Regression (2)
• The solution of the previous problem is
  b_ridge = (X'X + λI)^(-1) X'y.
• Observe the similarity with the Bayesian VAR point estimator, (X'X + Ω_0^(-1))^(-1) X'Y.
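A quick numerical check (on random illustrative data) that the closed form solves the penalized problem: the gradient of the ridge objective vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 100, 5, 3.0
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Closed-form ridge solution: (X'X + lam*I)^{-1} X'y.
b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# First-order condition of the penalized problem:
# d/db [ ||y - Xb||^2 + lam*||b||^2 ] = -2 X'(y - Xb) + 2 lam b, zero at the optimum.
grad = -2 * X.T @ (y - X @ b) + 2 * lam * b
```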
BVAR v RR (1) • Proposition 1: – BVAR estimator can be seen as the solution of the optimization problem: – where is the (j, j)-th element of the matrix .
BVAR v RR (2) • Proposition 2: – Let , we have: – Where • Note: If , is just standardized .
LASSO (1)
• Least Absolute Shrinkage and Selection Operator.
• The LASSO estimate can be defined as
  b_lasso = argmin_b ||y - Xb||^2 + λ Σ_j |b_j|.
LASSO (2)
• The LASSO was proposed because:
  – ridge regression is not parsimonious;
  – ridge regression may generate huge prediction errors when the true (unknown) coefficient vector is sparse.
• The LASSO can outperform ridge regression if:
  – the true (unknown) coefficients contain many zeros.
LASSO (3)
• If there are many irrelevant variables in the model, setting their coefficients to zero every time can reduce variance without disturbing the bias much.
• A VAR with 13 lags may contain many irrelevant variables.
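A small illustration of this selection effect, using scikit-learn's Lasso on simulated data with a sparse truth (the design and penalty value here are made up):

```python
import numpy as np
from sklearn.linear_model import Lasso  # assumes scikit-learn is available

rng = np.random.default_rng(3)
n, p = 200, 20
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]  # sparse truth: only 3 of 20 coefficients matter
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=0.5, size=n)

fit = Lasso(alpha=0.1).fit(X, y)
n_zeroed = int(np.sum(fit.coef_ == 0))  # irrelevant coefficients are set exactly to zero
```

Unlike ridge, which only shrinks coefficients toward zero, the L1 penalty zeroes most of the 17 irrelevant coefficients exactly while keeping the relevant ones large.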
The Elastic Net (1)
• Zou and Hastie (2005) propose another estimate that can further improve on the LASSO.
• It is called the elastic net, and the naïve version can be defined as
  b_en = argmin_b ||y - Xb||^2 + λ_1 Σ_j |b_j| + λ_2 Σ_j b_j^2.
The Elastic Net (2)
• We modify the elastic net to allow treating variables at different lags differently.
• Our modified naïve elastic net lets the penalty weights vary with the lag of each regressor.
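For intuition, here is a standard naive elastic net fit via scikit-learn (this does not include the lag-dependent weights of the modified version; data and parameters are made up):

```python
import numpy as np
from sklearn.linear_model import ElasticNet  # assumes scikit-learn is available

rng = np.random.default_rng(4)
n, p = 200, 20
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]  # sparse truth
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=0.5, size=n)

# alpha sets the overall penalty; l1_ratio mixes the L1 (lasso) and L2 (ridge) parts.
fit = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```

The combined penalty keeps the LASSO's exact zeros while the ridge part stabilizes the nonzero coefficients, which is why it can outperform the pure LASSO.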
Implementation
• Both the LASSO and the elastic net can be implemented efficiently with the "LARS" algorithm proposed by Efron, Hastie, Johnstone, and Tibshirani (2004).
• The algorithm can be applied to our modified version as well.
Empirical Study (1)
• I use the US data set from Stock and Watson (2005).
  – Monthly data cover Jan 1959 - Dec 2003.
  – There are 132 variables, but I use only 7.
• I transformed the data as in De Mol, Giannone, and Reichlin (2008) to obtain stationarity.
  – Their replication file can be downloaded.
  – Their transformation makes every variable an annual growth rate or the change in an annual growth rate.
Empirical Study (2)
• Out-of-sample performance:
  – In each month from Jan 1981 to Dec 2003 (276 times), estimate each model on the most recent 120 observations and make one forecast.
  – Performance is measured by the relative mean squared forecast error (RMSFE), with the OLS regression as the benchmark.
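The rolling-window evaluation can be sketched as follows (a univariate AR(1) stand-in for the data, with a scalar ridge fit in place of the actual VAR estimators):

```python
import numpy as np

rng = np.random.default_rng(5)

# A simulated univariate AR(1) series stands in for the VAR data.
T = 500
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.normal()

window = 120  # same window length as the study

def msfe(lam):
    """Mean squared one-step forecast error of a rolling ridge AR(1) fit (lam=0 is OLS)."""
    errs = []
    for start in range(T - window - 1):
        chunk = y[start : start + window + 1]
        X, z = chunk[:-1], chunk[1:]
        b = (X @ z) / (X @ X + lam)  # scalar ridge/OLS slope
        pred = b * y[start + window]
        errs.append((y[start + window + 1] - pred) ** 2)
    return float(np.mean(errs))

rmsfe = msfe(5.0) / msfe(0.0)  # relative MSFE, OLS as the benchmark
```

An RMSFE below one means the shrinkage estimator forecasts better than the OLS benchmark over the evaluation period.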
Empirical Study (3)
• There are 3 variables we want to forecast:
  – employment (EMPL),
  – annual inflation (INF),
  – the federal funds rate (FFR).
• The order of the VAR is p = 13.
• There are 4 forecast horizons (1, 3, 6, 12) and 3 values of (0, 1, 2).
Empirical Study (4)
• The most time-consuming part is finding suitable tuning parameters for each regression.
• We use grid searches over out-of-sample performance during the test period Jan 1971 - Dec 1980 (120 times).
  – Bayesian VAR: we employ the procedure from my previous chapter.
  – LASSO: a grid of 90 values.
  – Modified elastic net: a grid of 420 pairs of values.
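A minimal sketch of such a grid search, tuning a ridge penalty on a held-out validation sample (simulated data standing in for the pre-1981 test period; the actual study searches over the BVAR, LASSO, and elastic-net parameters):

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated training and validation samples.
n, p = 80, 15
beta = rng.normal(scale=0.3, size=p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)
X_val = rng.normal(size=(n, p))
y_val = X_val @ beta + rng.normal(size=n)

def val_mse(lam):
    """Validation-sample MSE of a ridge fit with penalty lam."""
    b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return float(np.mean((y_val - X_val @ b) ** 2))

grid = np.logspace(-2, 3, 30)      # candidate penalties
best_lam = min(grid, key=val_mse)  # keep the penalty with the best validation MSE
```

The selected penalty is then held fixed for the out-of-sample forecasting exercise.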
Empirical Study (5)
• We also employ a combination of the LASSO and the Bayesian VAR.
  – The LASSO discards some variables that tend to correspond to zero true coefficients.
  – The Bayesian VAR is similar to ridge regression, which assigns a better amount of shrinkage to the remaining coefficients.
Empirical Study (6)
• For the smallest model, we use the 3 variables to forecast themselves.
Empirical Study (7)
Empirical Study (8)
Empirical Study (9) Comparing different regressions. Pi = 0
Empirical Study (10) Comparing different regressions. Pi = 0
Empirical Study (11)
Results when we change to the 7-variable VAR.
Conclusion
• Although the empirical results are not impressive, we still think this is a promising way to improve the performance of Bayesian VARs.
• When the model becomes bigger, e.g. models with 131 endogenous variables, this approach should be more relevant.
• Cautions like those of Boivin and Ng (2006) apply to the VAR as well.
Thank you very much.