Big Data for Macroeconomic Nowcasting
ESS Big Data Workshop, Ljubljana, 13 Oct 2016
What is nowcasting
• Forecasting the present
• Similar to “Flash” estimates produced in official statistics
• New terminology in econometrics
New sources have better timeliness, but they raise a number of practical and technical issues:
• Practical – access, cost, business continuity, etc.
• Technical – mixed frequency, irregular data (missing observations), short history, huge number of covariates
Timeliness of Eurostat first releases (target dates, in days after the end of the reference period T)
• HICP (T + 17)
• GDP (T + 45)
• Retail trade (T + 30)
• Unemployment (T + 30)
A further issue is that official data are often revised after the first release, sometimes substantially.
• Big data, on the other hand, are almost never revised.
Objectives of the study
• Advantages and limitations of using big data for nowcasting
• Proposal of appropriate models
• Overall recommendations on how to include big data in the nowcasting process
• Empirical investigation based on Google Trends:
  • Inflation (HICP)
  • Retail trade index
  • Unemployment rate
Linear models – this is the main focus of the study
Types of big data (T = number of time observations, N = number of variables)
• Tall (T >> N): typically very high-frequency data, with a frequency mismatch between explanatory and target variables
• Fat (N >> T): the most common case, because big data collection started only recently compared to official statistics
• Huge: both N and T very large
Big Data and modelling: survey of approaches
• Machine learning
• Heuristic optimization
• Dimensionality reduction
• Forecast combination and model averaging
• Mixed frequency methods
Machine learning
• Methods are often borrowed from machine learning; however:
  • Prediction is usually the key interest in ML
  • ML usually assumes i.i.d. data, so serial correlation is often not accounted for
  • Variances that change over time (stochastic volatility) are often not considered
• So the techniques have to be properly modified for use with macroeconomic data
Penalized regression
• Ridge regression
• LASSO
• Elastic net
Ridge regression
• The penalty is an additive term proportional to the sum of squared coefficients
• Does not force coefficients to 0, so it cannot be used for variable selection and is not easy to interpret
• Similar to Bayesian regression
• Cross-validation is used for parameter tuning
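Written out, the ridge penalty takes the form below. This is a sketch in standard textbook notation, not necessarily the notation of the study: y_t is the target, x_t the vector of N covariates at time t, X and y the stacked data, and λ ≥ 0 the tuning parameter.

```latex
\[
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta} \sum_{t=1}^{T} \left( y_t - x_t'\beta \right)^2
    + \lambda \sum_{j=1}^{N} \beta_j^2
  = \left( X'X + \lambda I_N \right)^{-1} X'y
\]
```

For λ > 0 the matrix X'X + λI_N is invertible even when N > T, which is why the estimator remains usable in the "fat" data case. The coefficients are shrunk towards 0 but are in general not exactly 0.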
LASSO
• Minimizes the residual sum of squares while bounding the sum of absolute values of the coefficients by a constant
• Produces coefficients that are exactly 0, which adds interpretability
• Variations exist, such as the Adaptive LASSO
• From several strongly correlated covariates, typically only one is selected. This is a potential issue for interpretability if a correlated but non-“true” variable is selected instead of a “true” one
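The constrained problem described in the first bullet can be written as follows. The notation is a standard textbook one and is an assumption here: y_t is the target, x_t the vector of N covariates, c the bound on the coefficients, and λ the tuning parameter of the equivalent penalized form.

```latex
\[
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta} \sum_{t=1}^{T} \left( y_t - x_t'\beta \right)^2
  \quad \text{subject to} \quad \sum_{j=1}^{N} |\beta_j| \le c
\]
\[
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta} \sum_{t=1}^{T} \left( y_t - x_t'\beta \right)^2
    + \lambda \sum_{j=1}^{N} |\beta_j|
\]
```

Each bound c corresponds to some λ ≥ 0 in the penalized form; a tighter bound (larger λ) sets more coefficients exactly to 0.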
Elastic Nets
• Both the sum of absolute values of the coefficients and the sum of squared coefficients are restricted, combining the LASSO and ridge penalties
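To make the three penalties concrete, here is a minimal coordinate-descent sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in the study: the function names `soft_threshold` and `elastic_net`, the parameters `l1` and `l2`, and the toy data are all hypothetical. Setting `l2 = 0` gives the LASSO, `l1 = 0` gives ridge, and both positive gives an elastic net.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, l1, l2, n_sweeps=200):
    """Minimise RSS + l1 * sum|b_j| + l2 * sum b_j**2 by coordinate descent.

    X is a list of rows and y a list of targets. No intercept is fitted,
    so both are assumed to be centred, and no column of X may be all zeros.
    """
    n_obs, n_vars = len(X), len(X[0])
    beta = [0.0] * n_vars
    col_ss = [sum(X[i][j] ** 2 for i in range(n_obs)) for j in range(n_vars)]
    resid = list(y)  # residuals when all coefficients are zero
    for _ in range(n_sweeps):
        for j in range(n_vars):
            # inner product of column j with the residual that excludes beta_j
            rho = sum(X[i][j] * resid[i] for i in range(n_obs)) + col_ss[j] * beta[j]
            new_b = soft_threshold(rho, l1 / 2.0) / (col_ss[j] + l2)
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n_obs):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_b
    return beta


# Toy data (made up for illustration): y = 3 * x1 + 0.1 * x2, centred columns.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3.0 * x1 + 0.1 * x2 for x1, x2 in X]

print("LASSO      :", elastic_net(X, y, l1=2.0, l2=0.0))
print("ridge      :", elastic_net(X, y, l1=0.0, l2=10.0))
print("elastic net:", elastic_net(X, y, l1=2.0, l2=10.0))
```

On this toy data the LASSO and elastic-net fits set the weak second coefficient exactly to 0, while ridge only shrinks it, which mirrors the selection versus shrinkage distinction made on the slides.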
Heuristic Optimization
Experimental results – HICP, Retail Trade, Unemployment – DE, UK, IT – for illustrative purposes
• Assess relative performance of Google Trends and standard indicators
• Recursive pseudo out-of-sample evaluation
• Methods tested:
  • Autoregressive models (AR)
  • Dynamic Factor Analysis – state space representation with Kalman filtering (DFA)
  • Partial Least Squares (PLS)
  • Bayesian Regression (BR)
  • LASSO
  • Model averaging
• In total 279 models and combinations
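A recursive pseudo out-of-sample evaluation can be sketched as an expanding-window loop: at each forecast origin the model is re-estimated on the data available up to that point and asked to predict the next observation. The sketch below is illustrative only. The function names, the synthetic series, and the random-walk comparison model are assumptions and are not taken from the study; the AR(1) forecaster plays the role of the benchmark.

```python
import math


def fit_ar1(series):
    """OLS fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx > 0 else 0.0
    return mean_y - phi * mean_x, phi


def ar1_forecast(history):
    """One-step-ahead forecast from an AR(1) re-estimated on history."""
    c, phi = fit_ar1(history)
    return c + phi * history[-1]


def naive_forecast(history):
    """Random-walk forecast: repeat the last observed value."""
    return history[-1]


def recursive_rmse(series, first_origin, forecaster):
    """Expanding-window evaluation of one-step-ahead forecasts.

    For every t >= first_origin the forecaster sees only series[:t]
    and predicts series[t]; the RMSE of these forecasts is returned.
    """
    errors = [series[t] - forecaster(series[:t])
              for t in range(first_origin, len(series))]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Synthetic, noise-free AR(1) series (made up): y_t = 1 + 0.5 * y_{t-1}.
series = [10.0]
for _ in range(11):
    series.append(1.0 + 0.5 * series[-1])

rmse_ar = recursive_rmse(series, 3, ar1_forecast)
rmse_naive = recursive_rmse(series, 3, naive_forecast)
print("AR(1) RMSE:", rmse_ar)
print("naive RMSE:", rmse_naive)
print("relative RMSE (naive / AR):", rmse_naive / rmse_ar if rmse_ar > 0 else float("inf"))
```

The evaluation is "pseudo" out-of-sample because it uses the final data vintage at every origin rather than the data that were actually available in real time.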
Macroeconomic Predictors
• DE – 22 weekly and 83 monthly variables
• IT – 8 weekly and 83 monthly variables
• UK – 31 weekly and 64 monthly variables
Types:
• Equity market price indices
• Balance sheets and flows from MFIs (monetary financial institutions)
• Interest rates
• Labour market variables
• Foreign trade surveys
• …
Google Trends
• Weekly summary index of search volumes (Unemployment):
  • DE: "Arbeitsamt", "Arbeitsagentur", "Arbeitslosenquote", "Personalberatung", "Stepstone", "Jobscout", "Meinestadt", "meine Stadt"
  • IT: "impiego", "offerte lavoro", "curriculum", "infojobs"
  • UK: "jobs", "reed", "part time jobs", "unemployment"
• The choice of keywords is not ideal; e.g. for Germany, "Hartz IV" would be worth including
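The slides do not say how the weekly summary index is built or how it is aligned with the monthly targets. As one possible sketch, assuming equal weights across keywords and simple averaging of the weeks that start in each calendar month, the two steps could look like this. The function names and all index values are made up for illustration; only the two UK keywords come from the list above.

```python
from collections import defaultdict
from datetime import date


def weekly_summary(keyword_series):
    """Average several keyword indices week by week (equal weights assumed)."""
    weeks = sorted(next(iter(keyword_series.values())))
    n_keywords = len(keyword_series)
    return {w: sum(s[w] for s in keyword_series.values()) / n_keywords
            for w in weeks}


def to_monthly(weekly):
    """Aggregate a weekly index to monthly frequency.

    Each week is assigned to the calendar month of its start date and
    the weeks within a month are averaged.
    """
    buckets = defaultdict(list)
    for week_start, value in weekly.items():
        buckets[(week_start.year, week_start.month)].append(value)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


# Hypothetical weekly index values for two UK keywords (not real data).
week_starts = [date(2016, 9, 4), date(2016, 9, 11), date(2016, 9, 18),
               date(2016, 9, 25), date(2016, 10, 2), date(2016, 10, 9)]
keyword_series = {
    "jobs": dict(zip(week_starts, [80, 82, 84, 86, 90, 92])),
    "unemployment": dict(zip(week_starts, [40, 42, 44, 46, 50, 52])),
}

summary = weekly_summary(keyword_series)
monthly = to_monthly(summary)
print(monthly)
```

Averaging is the simplest way to handle the frequency mismatch; the mixed frequency methods listed in the survey of approaches would instead let the weekly observations enter the model directly.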
Results
• DFA, BR, PLS and LASSO outperform AR(1), with or without Google Trends
• In general, Google Trends brings only a slight improvement
• For HICP and Retail Trade, in all 3 countries, the best models chosen for averaging include Google Trends in 65% of cases
• Only final data were used for the evaluation
Recommendations for a structured approach
1. Search for appropriate sources
2. Evaluate cost and continuity
3. Assess aspects related to necessary data preparation
4. Consider strength and stability of relationship with the target
5. Implement one or more modelling methods
6. Conduct in-sample and out-of-sample evaluation