Predicting Endogenous Bank Health from FDIC Statistics on Depository Institutions Using Deep Learning David Jungreis, Noah Capp, Meysam Golmohammadi, and Joseph Picone Neural Engineering Data Consortium Temple University Philadelphia, Pennsylvania, USA
Abstract • The Federal Deposit Insurance Corporation (FDIC) keeps records of banking data in its Statistics on Depository Institutions (SDI), going back to the fourth quarter of 1992. § The data are reported quarterly. § There are approximately 1,050 variables. • We hypothesized that impending failure could be predicted from these data. • We restricted the data set to 60 quantitative variables that had no missing data. § The analysis covered 2000 Q1 through 2017 Q2. • Deep learning approaches based on multilayer perceptrons and convolutional neural networks were evaluated and failed to predict failures more accurately than guessing based on the priors, even in-sample. • The inability to overfit the training data shows the challenge of finding failure-predicting trends that are strictly intrinsic to a bank
Jungreis et al.: Predicting Endogenous Bank Health, July 17, 2019
The Global Financial Crisis (GFC) of 2007-2009 • Driven by subprime mortgages § Loans to people who had no realistic chance of repaying § Loans packaged and sold as collateralized debt obligations (CDOs) • Washington Mutual failed • Lehman Brothers failed • Bear Stearns failed • Citi and Bank of America failed but were "assisted" by the US Government • Failures driven by: § Leverage (financing assets with debt) § Concentration of risk (particularly in mortgages and mortgage derivatives) § Emphasis on short-term goals over long-term risk • Alan Greenspan: the Great Depression was the worst economic crisis, but the GFC was the worst financial crisis
Graph of US Bank Failures, 2000-2017
Previous Work on Bank Failure Prediction • Logistic Regression § Zheng 2016 • Support Vector Machines § Erdogan 2008, 2013 § Boyacioglu 2009 • Dense Neural Networks § Tam 1991, 1992 § Boyacioglu 2009 • Common themes: § Not the cutting edge of vector time series classification Ø Modern neural nets have sophisticated topologies § Limited forecasting ability: strong ties to particular times or to general economic health Ø Ability to forecast is crucial in finance
The Importance of Forecasting • Who wants to bet with me on the following? § France wins the 2018 World Cup § The dinosaurs go extinct § Washington Mutual fails in September 2008 • No! You want to make predictions about the future: § The 2022 World Cup § Bees going extinct § Wells Fargo failing tomorrow • Our goal: predict failures, but consider only data endogenous to a bank
Our Approach • Restrict the data to variables intrinsic to each bank § Assets, liabilities § No DJIA, FTSE 100, or LIBOR § Extrinsic variables introduce shocks that are hard to predict • Consider banks from different eras to mitigate time-based biases § If bank health is endogenous, then the different eras can be compared • Vector time series data § Evocative of speech or EEG data § Deep learning has been successful in those fields § Especially convolutional neural networks (CNNs, ConvNets) § Examine relationships across time and across variables • Particularly with a CNN approach, this becomes much like an image classification problem (e.g., MNIST) § This is the guiding intuition for our machine learning experiments
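One way to realize the image analogy is to treat each bank's window as a single-channel "grayscale image" and stack the banks into one batch. The sketch below assumes a channels-last layout (banks, quarters, variables, 1), which is the default Keras image data format; the helper name `to_image_batch` is hypothetical and not taken from the authors' code.

```python
import numpy as np

N_QUARTERS = 20  # window length used in these slides
N_VARS = 60      # number of "full" SDI variables retained

def to_image_batch(bank_series):
    """Stack per-bank (quarters x variables) matrices into a CNN-ready batch.

    bank_series: list of array-likes, each of shape (N_QUARTERS, N_VARS).
    Returns shape (n_banks, N_QUARTERS, N_VARS, 1): one single-channel
    "image" per bank, channels-last layout (an assumption).
    """
    batch = np.stack([np.asarray(s, dtype=np.float32) for s in bank_series])
    if batch.shape[1:] != (N_QUARTERS, N_VARS):
        raise ValueError(
            f"expected ({N_QUARTERS}, {N_VARS}) per bank, got {batch.shape[1:]}"
        )
    return batch[..., np.newaxis]  # append the channel axis
```

Rows play the role of pixel rows (time) and columns the role of pixel columns (variables), so a 3 x 3 filter sees three adjacent quarters of three adjacent variables at once.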
Data Preparation • The Federal Deposit Insurance Corporation (FDIC) keeps Statistics on Depository Institutions (SDI) § 1,050 variables on every FDIC-insured bank § Database goes back to the fourth quarter of 1992 • Data listed by quarter § Inconvenient format for our supervised learning problem § We make predictions based on the status of each bank, not the status of banking in general § Transformed the data so that each record spans multiple time periods for one bank • Bank "images" cropped to form equal-length time series § Failed banks end in their last quarter § Successful banks had their ending dates selected at random § 20 quarters long
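The cropping rule can be sketched as follows, assuming each bank's history is a NumPy array with one row per quarter and one column per variable. The slides do not say how the random end date was drawn; uniform sampling over the valid end quarters is an assumption, as is skipping banks with fewer than 20 quarters of history. `crop_window` is a hypothetical helper.

```python
import numpy as np

WINDOW = 20  # quarters per example

def crop_window(series, failed, rng):
    """Crop one bank's (n_quarters x n_vars) history to a WINDOW-quarter example.

    Failed banks: the window ends at the last reported quarter.
    Successful banks: the window's end is drawn uniformly at random
    (an assumption; the slides only say "at random").
    Returns None when the history is shorter than WINDOW.
    """
    series = np.asarray(series)
    n = series.shape[0]
    if n < WINDOW:
        return None  # too short to form a full window
    if failed:
        end = n
    else:
        # rng.integers excludes the upper bound, so end ranges over WINDOW..n
        end = int(rng.integers(WINDOW, n + 1))
    return series[end - WINDOW:end]
```

Usage: `crop_window(history, failed=True, rng=np.random.default_rng(0))` returns the final 20 quarters of a failed bank's history.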
Problems in the Data • Missing values § Problematic for CNNs § Some variables not tracked for any bank in a given quarter § Of the 1,050 variables, only 60 were "full variables" Ø No missing data for any bank in any quarter • Bank mergers and acquisitions (M&A) § JPMorgan Chase acquires Washington Mutual in 2008 and suddenly has an extra $300 billion in assets § Exogenous shock § Disruptive to machine learning • Resolution § Missing values: restrict the data set to the 60 full variables Ø Alternatives include linear interpolation or autoencoders § M&A: treat as part of the endogenous nature of our approach Ø A sudden influx of assets could indicate good health Ø Management is optimistic
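Finding the "full variables" amounts to intersecting, across all quarterly files, the sets of columns that have no missing entries; a column absent from some quarter (not tracked that quarter) drops out automatically. A sketch with pandas, assuming one DataFrame per quarter; the helper name and the column names in the usage check are illustrative only.

```python
import pandas as pd

def full_variables(quarter_frames):
    """Return the variables with no missing value for any bank in any quarter.

    quarter_frames: iterable of DataFrames, one per quarter, rows = banks.
    A column missing from a quarter's frame is excluded, because it never
    appears in that quarter's set of complete columns.
    """
    full = None
    for df in quarter_frames:
        complete = {c for c in df.columns if df[c].notna().all()}
        full = complete if full is None else full & complete
    return sorted(full or [])
```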
Data Organization • Inconsistent indexing of columns § In 2000 Q1, the first quantitative variable is in position AK § By 2017 Q2, it has moved over two spots to position AM § Not a showstopper, but a wrinkle to address in the data-wrangling phase
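Looking fields up by header name rather than by spreadsheet position sidesteps the drift. A minimal sketch with Python's standard `csv.DictReader`, assuming each quarterly file has a header row; the helper name and the column names in the check are hypothetical.

```python
import csv
import io

def read_columns(csv_text, wanted):
    """Read the wanted variables by header name, not by column position.

    Column positions drift between quarters (AK in 2000 Q1, AM by 2017 Q2),
    so keying on the header keeps the extraction stable across files.
    Returns one list of string values per data row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []  # reads the header row
    missing = [w for w in wanted if w not in fieldnames]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [[row[w] for w in wanted] for row in reader]
```

The same call works on files from either end of the period, even though the target column sits two positions further right in the later file.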
Data Organization • CSV file format § Commas within the entries § Entries delimited by quote-comma-quote rather than by bare commas § Again, not a showstopper, but something for future researchers to keep in mind during the data-wrangling phase
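Python's standard `csv` module already honors double-quoted fields, so a comma inside an entry is not treated as a delimiter, whereas a naive `str.split(",")` breaks such entries apart. The sample record below is made up for illustration.

```python
import csv
import io

def parse_record(line):
    """Split one CSV record, keeping commas inside quoted fields intact."""
    return next(csv.reader(io.StringIO(line)))

record = '"1","First Example Bank, N.A.","100"'
print(record.split(","))     # naive split: 4 pieces, bank name broken in two
print(parse_record(record))  # ['1', 'First Example Bank, N.A.', '100']
```

The reader also copes with records that mix quoted and bare fields, such as `1,"X, Y",100`.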
Off to Keras We Go! • CNN in Keras with a Theano backend (we have since moved to TensorFlow) • Training set: 400 successful, 400 failed • Testing set: 100 successful, 100 failed • Baseline test using the CNN from our laboratory EEG work § 3 x 3 filtering, 2 x 2 max pooling, ReLU activation function Ø Block repeated Ø Dense layer with 512 neurons • Result failed to outperform random guessing based on the 50/50 prior distribution, even on the training data • Next attempt: take first-order differences of the marginal time series Ø Run the same CNN Ø Same result: no better than guessing, even on the training data • Second-order differences of the marginal time series Ø Run the same CNN Ø Same result: no better than guessing, even on the training data
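Taking k-th order differences of each marginal (per-variable) series can be done with `numpy.diff` along the quarter axis. The sketch assumes arrays shaped (..., quarters, variables) with no trailing channel axis; each order of differencing shortens the series by one quarter. The helper name is hypothetical.

```python
import numpy as np

def marginal_differences(x, order):
    """Difference each variable's time series along the quarter axis.

    x: array of shape (..., n_quarters, n_vars).
    order=1 gives x[t] - x[t-1] for every variable; order=2 differences
    the result again; order=0 returns the input unchanged.
    Output shape: (..., n_quarters - order, n_vars).
    """
    return np.diff(x, n=order, axis=-2)
```

For a batch of 20-quarter windows shaped (banks, 20, 60), first differences give (banks, 19, 60) and second differences give (banks, 18, 60).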
Our CNN
Assessment of CNN Performance • Shockingly poor in-sample performance • Neural nets are parameter-heavy § 400,000 parameters for 800 observations, p >> n • Strong potential for overfitting § Good in-sample performance § Poor out-of-sample performance • High variance is the usual culprit in neural net failures • Our problem was bias: the models did not even perform well on the training data • One way to reduce bias is to increase the parameter count • CNNs are parameter-light compared to multilayer perceptrons • Our CNNs also used dropout, which does not remove parameters but limits effective capacity during training to improve generalization
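The parameter-light claim follows from how layers are counted: a convolution's weights depend only on kernel size and channel counts, not on the input size, while a dense layer needs one weight per input-output pair. The 32-filter count and the flattened 20 x 60 input below are illustrative assumptions, not the configuration used in these experiments.

```python
def conv2d_params(kh, kw, c_in, c_out):
    """Weights plus biases for a 2-D convolution layer (independent of input size)."""
    return kh * kw * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

# Assumed input: 20 quarters x 60 variables, flattened for the dense layer.
print(conv2d_params(3, 3, 1, 32))  # 320
print(dense_params(20 * 60, 512))  # 614912
```

As a sanity check, `dense_params(20 * 60, 1)` gives 1,201, the logistic regression's parameter count in the results table, consistent with one weight per input plus an intercept.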
High-Parameter Models • Increase the parameter count by using fully connected (dense) layers • No dropout until the bias problem is solved • 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20 layers of 512 neurons • 24 million parameters in the deepest network • Still could not outperform random guessing, even on the training set • Every bank got classified the same way § Inconsistent: the shared label toggled randomly between "fail" and "remain solvent"
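A quick diagnostic for this collapse is the share of examples assigned to the most common predicted class; a value of 1.0 means every bank received the same label. On a balanced 400/400 training set, such a constant classifier scores exactly 50% accuracy. The 0.5 decision threshold and the helper name are assumptions.

```python
import numpy as np

def majority_fraction(probs, threshold=0.5):
    """Fraction of examples assigned to the most common predicted class.

    probs: predicted failure probabilities, one per bank.
    A result of 1.0 means the classifier put every bank in the same class.
    """
    labels = (np.asarray(probs) >= threshold).astype(int)
    counts = np.bincount(labels, minlength=2)
    return counts.max() / labels.size
```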
Summary of Performance
Model  Time Series Differences  Parameters  In-Sample Accuracy  Out-of-Sample Accuracy
Logit  0                        1,201       56.00%              51.00%
CNN    0                        249,217     49.38%              50.00%
CNN    1                        249,217     50.87%              50.00%
CNN    2                        249,217     49.38%              50.00%
Dense  0                        24,652,609  50.00% (single value reported)
Dense  1                        23,669,569  50.00% (single value reported)
Dense  2                        22,686,529  50.00% (single value reported)
• Tried convolutional and dense neural nets, and logistic regression • Took 1st- and 2nd-order differences of the marginal time series • Highly parameterized models (p >> n) • Binary cross-entropy loss function • Accuracy still little or no better than the 50/50 prior, even in-sample • Distributions could not be separated
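With a binary cross-entropy loss and balanced classes, a model that simply outputs the 50/50 prior for every bank has a mean loss of ln 2 (about 0.693), so a training loss that stays near this value is a sign that the classes are not being separated. A plain-Python sketch of the loss; the clipping constant `eps` is an assumption, similar in spirit to the clipping deep learning libraries apply to avoid log(0).

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Predicting the 50/50 prior for every bank in a 400/400 training set
labels = [0, 1] * 400
baseline = binary_cross_entropy(labels, [0.5] * len(labels))
print(round(baseline, 4))  # 0.6931, i.e. ln 2
```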
Interpretation of the Inability to Overfit • Groups were not separable • Since all classifiers failed on the training data, we conclude that the FDIC SDI, restricted to its 60 full variables, is inadequate to answer this bank failure question • Ability to forecast comes into question • Regulators stress-test banks • Comprehensive Capital Analysis and Review (CCAR) and Dodd-Frank Act Stress Testing (DFAST) • Legitimacy of these tests? • CCAR and DFAST are responses to the GFC, and we haven't had any terrible economic times since then § Look at the Matterhorn graph: banks seem most prone to fail when the economy turns sour • Stress testing does test a bank's ability to withstand a poor economy, but then we must conjecture about which economic disasters are likely or interesting to examine
Summary and Future Work Summary: § Endogenous bank health is important to forecasting failures § Considerable missing data in the FDIC database § FDIC data transformed into image-like vector time series § Inability to separate the training data, despite more than 20 million model parameters § Difficult to find failure-predicting trends strictly intrinsic to a bank Future Work: § Exogenous factors about the economy in general § Link time-sensitive measurements § But then forecasting ability might be compromised § Techniques to fill in missing data § What to do about mergers and acquisitions
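Linear interpolation, one of the fill-in alternatives named alongside the missing-value problem, could be sketched per variable with `numpy.interp`. This is an illustration under assumptions (equally spaced quarters; gaps at either end filled with the nearest observed value, because `np.interp` clamps outside the observed range); the slides list it only as an alternative and as future work, and the helper name is hypothetical.

```python
import numpy as np

def interpolate_missing(series):
    """Fill NaNs in one variable's quarterly series by linear interpolation.

    Leading or trailing gaps take the nearest observed value, since
    np.interp clamps outside the range of observed points.
    An all-NaN series is returned unchanged (still all NaN).
    """
    y = np.asarray(series, dtype=float)
    t = np.arange(y.size)
    observed = ~np.isnan(y)
    if not observed.any():
        return y.copy()  # nothing to interpolate from
    return np.interp(t, t[observed], y[observed])
```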