CONFIDENTIAL www.jpmorganchaseinstitute.com

The JPMorgan Chase Institute is a global think tank dedicated to delivering data-rich analyses and expert insights for the public good

INSTITUTE RESEARCH THEMES
• HOUSEHOLD INCOME & SPENDING: Research focused on the income and expense dynamics of US consumers.
• HOUSEHOLD DEBT: Research focused on the various forms of household debt, including credit cards and mortgages.
• LABOR MARKETS: Research focused on income from labor including labor market trends, growth of the Online Platform Economy, and the financial impacts of job loss.
• HEALTHCARE: Research focused on out-of-pocket healthcare spending among US households to analyze the relationship between cash flows and healthcare expenditures.
• CITIES & LOCAL COMMUNITIES: Research focused on local commerce, resident spending, and the economic vibrancy of cities and local communities.
• SMALL BUSINESS: Research focused on the financial volatility of small business cash flow management, inflow and net flow, and overall health of US small businesses.
• FINANCIAL MARKETS: Research focused on institutional investor behavior.

INSTITUTE DATA
THE JPMORGAN CHASE INSTITUTE LEVERAGES DE-IDENTIFIED DATA FROM:
• INDIVIDUAL TRANSACTIONS (October 2012 to one month prior to present): Information on amount, day and time, zip code, channel, and counterparty characteristics
• ACCOUNT LEVEL INFORMATION: Accounts held, activity frequency, and monthly balances (including deposit accounts, savings accounts, money market, credit card, mortgage and home equity loans, and auto loans)
• DEMOGRAPHIC CHARACTERISTICS: On an entirely deidentified sample: gender, banded age, and geography
• 44 THOUSAND INSTITUTIONAL INVESTORS: All types of institutional investors across all asset classes and regions globally

JPMC Institute Data Assets

CONSUMERS/HOUSEHOLDS
• Observe checking accounts and credit card accounts of ~70 million households providing a window into:
  o Cash flow dynamics: income and spending categories and account transfers at the daily level
  o Debt payments and balance sheet outcomes: liquid assets and payments and balances for credit cards, auto loans, student loans, and mortgages
Key Sub-samples and Data Assets
• Online Platform Economy: 2.3 million families earning income from 128 different online platforms
• Healthcare Out-of-pocket Spending Panel: healthcare spending of a sample of 4.7 million account holders ages 20-64
• Tax refund recipients and payers: A random sample of one million families and their tax time events
• Unemployment Insurance: ~180,000 UI recipients

MORTGAGE
• Observe over 31 million mortgages serviced by JPMorgan Chase in all 50 states and Washington, DC; many mortgage customers have Chase deposit accounts and/or Chase credit cards
Modifications
• Observe one million Chase Mortgage customers who received a modification
Defaults
• Observe over 11,500 customers with a Chase mortgage and Chase deposit accounts who defaulted on their mortgages
Adjustable-rate Mortgages
• Observe over 4,000 Chase Mortgage customers with a 5/1 ARM and a Chase credit card

SMALL BUSINESS
• Observe 2.5 million small businesses that hold a Chase Business Banking account
• Longitudinal sample of 138,000 active small businesses founded in 2013
Small Business Owner Health Insurance Premiums
• Observe individual health insurance premium payments from owners of 30,000 small businesses

LOCAL COMMERCE
• Observe over 96 billion credit and debit card transactions by Chase customers in all 50 states and Washington, DC
Local Commerce Spending Time Series
• Merchant view with over 22 billion transactions by over 64 million customers at merchants located in the 14 US metro areas we track
• Consumer view with over 4 billion transactions, including online spending, by over 7.7 million customers that reside in the 14 US metro areas we track

FINANCIAL MARKETS
• Observe 395 million transactions executed by 44,000 institutional investors
• Global dataset includes all types of institutional investors across all regions and covers all asset classes

Income Estimation: Background and Motivation
• The JPMorgan Chase Institute aims to publish generalizable insights that are representative of the overall US population
• We require a method to reweight or segment research based on key characteristics, with income foremost among them
• Without full coverage of income information across our portfolio, we sought a method for estimating income
• We could approximate family income with publicly available sources like IRS data, or JPMCI data on deposit account inflows, but neither does a good job of estimating income for the customers for whom we have that information
[Figure: Benchmark Income Measures (MAE=162%, MAE=103%)]
• To fill this gap, we developed the JPMorgan Chase Institute Income Estimate (JPMC IIE) version 1.0, using gradient boosting machines (GBM) to predict gross family income

We used income information from credit card and mortgage data to create our truth set, though both sources have associated challenges

Two sources of ground truth income:
• Verified income from mortgage applications is accurate, but overrepresents high-income families
• Stated income from credit card customers covers a broader range of incomes, but raises accuracy concerns

Final Truth Set Construction:
• To improve accuracy, remove customers with deposit account inflows totaling more than reported income
• To avoid undue influence of extreme observations, remove customers with income in the top or bottom percentile of truth set income
• To improve sample representativeness, stratify modeling data by income quintiles*

[Figure: Final income truth set used for model training. Density plot comparing the truth set before and after stratifying based on ACS income quintiles]
[Figure: ACS quintile accuracy by stratification options]

* Quintile ranges determined via data from the American Community Survey (ACS)
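The three truth set construction steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Institute's implementation: the field names (`reported_income`, `annual_inflows`), the quintile cut points, and the choice to draw an equal number of customers per quintile are all hypothetical, since the slide does not say how the stratification was carried out.

```python
import bisect
import random
from collections import defaultdict

# Hypothetical quintile cut points in dollars; NOT actual ACS figures.
QUINTILE_EDGES = [25_000, 50_000, 80_000, 130_000]


def income_quintile(income, edges=QUINTILE_EDGES):
    """Return 0 (lowest) through 4 (highest) for the given income."""
    return bisect.bisect_right(edges, income)


def build_truth_set(customers, per_quintile, seed=0):
    """customers: list of dicts with 'reported_income' and 'annual_inflows'."""
    # 1. Drop customers whose deposit inflows exceed their reported income.
    kept = [c for c in customers if c["annual_inflows"] <= c["reported_income"]]

    # 2. Trim the top and bottom percentile of reported income (by rank).
    kept.sort(key=lambda c: c["reported_income"])
    k = len(kept) // 100
    kept = kept[k:len(kept) - k]

    # 3. Stratify: draw up to `per_quintile` customers from each quintile
    #    (equal allocation is an assumption made for this sketch).
    by_quintile = defaultdict(list)
    for c in kept:
        by_quintile[income_quintile(c["reported_income"])].append(c)
    rng = random.Random(seed)
    sample = []
    for q in sorted(by_quintile):
        members = by_quintile[q]
        sample.extend(rng.sample(members, min(per_quintile, len(members))))
    return sample
```

Equal allocation across quintiles is one simple way to counter the overrepresentation of high-income families; sampling in proportion to population shares would be another.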

We drew model inputs from Institute data and publicly available sources, cleaning and treating raw features prior to model training

Source   | Feature Categories        | Feature Examples
Internal | Customer Information      | Age, inferred gender, ZIP code
Internal | Checking Acct Attributes* | Inflow amt, inflow channel, balances
Internal | Credit Card Attributes    | Credit limit, number of Chase cards
Internal | Attributes of Other Acct  | Number of loans, loan terms
External | IRS SOI, Census           | ZIP-level income and demographics
External | Zillow                    | ZIP-level rental information

Feature Treatment:
• Handle missing values: Add missing indicator and (for numeric features) impute missing values with the mean
• Transform numeric features: Log-transform, standardize
• Transform categorical features: Convert to series of binary features
• Handle ZIP code: Convert to longitude and latitude of centroid

* Information on outflows and spending excluded from feature set, as future research may use predicted income to segment spending behaviors
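A minimal sketch of the feature treatment steps follows, using only the Python standard library. Several details are assumptions rather than anything the slide specifies: imputation is done before the log transform, `log1p` stands in for the log transform so that zero values are handled, and the ZIP centroid lookup is a hypothetical mapping supplied by the caller.

```python
import math
from statistics import mean, pstdev


def treat_numeric(values):
    """Treat one numeric column; missing values are given as None.

    Returns (standardized_values, missing_flags).
    """
    flags = [1 if v is None else 0 for v in values]
    fill = mean(v for v in values if v is not None)   # mean of observed values
    imputed = [fill if v is None else v for v in values]
    logged = [math.log1p(v) for v in imputed]         # assumes non-negative inputs
    mu, sigma = mean(logged), pstdev(logged)
    standardized = [(x - mu) / sigma if sigma > 0 else 0.0 for x in logged]
    return standardized, flags


def one_hot(values):
    """Convert a categorical column into binary features plus a missing flag."""
    levels = sorted({v for v in values if v is not None})
    rows = []
    for v in values:
        row = {f"is_{lvl}": int(v == lvl) for lvl in levels}
        row["is_missing"] = int(v is None)
        rows.append(row)
    return rows


def zip_to_centroid(zip_code, centroids):
    """Look up (latitude, longitude); `centroids` is a caller-supplied dict."""
    return centroids.get(zip_code, (None, None))
```

For example, `treat_numeric([0.0, None, 10.0, 100.0])` flags the second entry as missing and returns four values with mean zero and unit variance.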

Iterative modeling approach compared performance across several modeling techniques, then optimized on selected Gradient Boosting Machines (GBM)
• We initially considered several modeling techniques, but quickly selected GBM as best suited to our “MVP” project goals
  o No need to manually specify functional form of input relationships
  o Must guard carefully against overspecification (addressed via hyperparameter tuning and use of validation data)
  o Model interpretation can be challenging, but no more so than parametric models trained on complex observational data
• The final model predicts income with mean absolute error (MAE) of 41 percent and consistent accuracy across predicted income quintiles (average 55 percent)
[Figure: Estimated Income vs. Truth Set Income and Corresponding Residuals]
[Figure: ACS Quintile Accuracy by Predicted Quintiles, 2017, version 1.0 of JPMC IIE]
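The slide reports two metrics without defining them. The sketch below shows one plausible reading, offered as an assumption: MAE in percent is taken as the mean of absolute errors relative to truth set income, and quintile accuracy is taken as the share of families in each predicted quintile whose truth set income falls in that same quintile. The cut points passed as `edges` are hypothetical inputs, not ACS values.

```python
import bisect


def mae_percent(truth, predicted):
    """Mean absolute error as a percent of true income (assumed definition)."""
    errors = [abs(p - t) / t for t, p in zip(truth, predicted)]
    return 100.0 * sum(errors) / len(errors)


def quintile_accuracy(truth, predicted, edges):
    """Accuracy within each predicted quintile (assumed definition).

    `edges` holds four ascending cut points. Returns a list of five shares,
    with None for any predicted quintile that contains no families.
    """
    hits, totals = [0] * 5, [0] * 5
    for t, p in zip(truth, predicted):
        q_pred = bisect.bisect_right(edges, p)
        totals[q_pred] += 1
        hits[q_pred] += int(bisect.bisect_right(edges, t) == q_pred)
    return [h / n if n else None for h, n in zip(hits, totals)]
```

Under this reading, the "average 55 percent" figure would be the mean of the five per-quintile shares.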

JPMC IIE version 1.0 is currently in use for Institute research, and performs similarly to truth set income when used for sample reweighting
• We tested JPMC IIE version 1.0 performance in reweighting the sample population for the Institute’s Healthcare Out-of-pocket Spending Panel (HOSP)
• For the families that have truth set income, the average out-of-pocket health spending levels were similar when we weighted the sample using JPMC IIE and age, compared to when we weighted using age and truth set income, the “gold standard” for comparison
[Figure: Out-of-Pocket Health Spending Across Years, by Different Weighting Schemes]
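The slide does not describe the weighting scheme itself. One common approach, shown here purely as an assumed illustration, is post-stratification: each family is assigned to a cell (for example, income quintile by age band) and weighted by the ratio of that cell's population share to its sample share. The cell labels and population shares below are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Weight each unit by (population share of its cell) / (sample share)."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


def weighted_mean(values, weights):
    """Weighted average, e.g. of out-of-pocket health spending."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical example: low-income families are overrepresented in the sample.
cells = [("q1", "20-39")] * 3 + [("q5", "40-64")]
shares = {("q1", "20-39"): 0.5, ("q5", "40-64"): 0.5}
spending = [100.0, 100.0, 100.0, 500.0]
weights = poststratification_weights(cells, shares)
reweighted = weighted_mean(spending, weights)
```

Running the same calculation twice, once with cells built from predicted income and once with cells built from truth set income, gives the kind of side-by-side comparison the slide describes.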

Upcoming efforts will focus on enhancements in three broad areas

Data Expansion
• Expand modeling universe to include credit-only customers
• Incorporate additional input features from administrative banking data

Feature Refinement
• Thorough data treatment can improve performance and interpretability, but IIE v1.0 included minimal feature engineering
• Avenues for further exploration include creation of additional geographic features, trends over time, behavioral segments, etc.

Insight Exploration
• Feature relationship exploration: Deepen understanding of the relationships between input features and predicted income
• Demographic monitoring: Assess whether modeled income reflects demographic biases
• Residual monitoring: Assess the model for systematic weaknesses, segments of poor model performance
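Residual and demographic monitoring could start from something as simple as the sketch below, which summarizes the signed percent residual by segment. The function name, the segment labels, and the use of a signed percent residual are all assumptions made for illustration; the slide names the goal but not a method.

```python
from collections import defaultdict
from statistics import mean


def residual_summary(truth, predicted, segments):
    """Mean signed percent residual and count for each segment.

    A mean far from zero suggests the model systematically over-predicts
    (positive) or under-predicts (negative) income for that segment.
    """
    grouped = defaultdict(list)
    for t, p, s in zip(truth, predicted, segments):
        grouped[s].append(100.0 * (p - t) / t)
    return {
        s: {"n": len(r), "mean_pct_residual": mean(r)}
        for s, r in grouped.items()
    }
```

Segments here could be age bands, geography, or predicted income quintiles, depending on which weakness is being checked.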

www.jpmorganchaseinstitute.com #JPMCInstitute
