Target’s Pregnancy Prediction Problem: The Complete Analytical Process “Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August.” AAFM Module 1 (c) Kaiser Fung 1
The Model AAFM Module 1 (c) Kaiser Fung 2
Model Inputs
Based on Duhigg’s description of Jenny Ward, what kinds of data did Target analysts use in constructing the pregnancy prediction model?
Answer choices: Definitely / Likely / Unlikely or No
• Age
• Gender
• Income
• Past Purchase Indicators
• Past Purchase Intensity
AAFM Module 1 (c) Kaiser Fung 3
DATA: Past Purchases, Age, Gender
OUTPUT: Pregnancy Scores
ACTION: Product Selection in Brochures (related items, target items)
AAFM Module 1 (c) Kaiser Fung 4
Discuss An analyst wants to improve the model by adding more variables to it. Suggest some additional variables. AAFM Module 1 (c) Kaiser Fung 5
Discuss Suggest other actions that can be informed by the predicted pregnancy scores. AAFM Module 1 (c) Kaiser Fung 6
Model
DATA: Past Purchases, Age, Gender
OUTPUT: Pregnancy Scores
ACTION: Product Selection in Brochures (related items, target items)
AAFM Module 1 (c) Kaiser Fung 7
What is a Model? Valuation Model Source: Keith Howe (2009) AAFM Module 1 (c) Kaiser Fung 8
What is a Model? Climate Model Source: Mark Chandler, Ed. GCM AAFM Module 1 (c) Kaiser Fung 9
What is a Model? Climate Model Source: IPCC AAFM Module 1 (c) Kaiser Fung 10
What is a Model? Digital Marketing Attribution Model AAFM Module 1 (c) Kaiser Fung 11
Digital Marketing Attribution For each response, allocate credit to the responsible channel Display Ad SEO SEM Email AAFM Module 1 (c) Kaiser Fung 12
[Figure: three attribution rules (FIRST CLICK, EXP DECAY, LAST CLICK) applied to a timeline running from t10 to t0, with touches on Display Ad, SEO, SEM and Email leading to the Response. Annotations: "User clicked on banner ad"; "User clicked on Google organic search result"; "Influence scales with time order".]
AAFM Module 1 (c) Kaiser Fung 13
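The three attribution rules on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the method of any particular attribution product: the function name `attribute`, the `half_life` parameter, and the sample touch times are all assumptions made for the example, and time is assumed to increase toward the response.

```python
def attribute(touches, response_time, rule="last_click", half_life=3.0):
    """Split one unit of credit for a response across channel touches.

    touches: list of (channel, time) pairs with time <= response_time.
    half_life: hypothetical tuning parameter for the exponential-decay rule.
    """
    if not touches:
        return {}
    ordered = sorted(touches, key=lambda t: t[1])
    if rule == "first_click":
        # All credit goes to the earliest touch.
        weights = [1.0] + [0.0] * (len(ordered) - 1)
    elif rule == "last_click":
        # All credit goes to the touch closest to the response.
        weights = [0.0] * (len(ordered) - 1) + [1.0]
    elif rule == "exp_decay":
        # A touch's weight halves for every half_life units of elapsed time.
        weights = [0.5 ** ((response_time - t) / half_life) for _, t in ordered]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(weights)
    credit = {}
    for (channel, _), w in zip(ordered, weights):
        credit[channel] = credit.get(channel, 0.0) + w / total
    return credit


# Invented example: three touches before a response at time 10.
touches = [("Display Ad", 2), ("SEO", 6), ("Email", 9)]
first = attribute(touches, 10, "first_click")
last = attribute(touches, 10, "last_click")
decay = attribute(touches, 10, "exp_decay")
print(first, last, decay, sep="\n")
```

Each rule is a different simplification of the same customer journey, which is why the credit assigned to a channel can change completely when the rule changes.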
A model is an abstraction, a simplified view of reality AAFM Module 1 (c) Kaiser Fung 14
All models are wrong but some are useful George Box AAFM Module 1 (c) Kaiser Fung 15
First to Your Door “Right around the birth of a child ... parents are exhausted and overwhelmed and their shopping patterns and brand loyalties are up for grabs.” “We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years.” AAFM Module 1 (c) Kaiser Fung 16
Brochure Design “As long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons.” “We’d put an ad for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.” “We started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random.” AAFM Module 1 (c) Kaiser Fung 17
1. Customer acquisition tool 2. It’s predictive 3. So accurate it’s creepy AAFM Module 1 (c) Kaiser Fung 18
The Marketing Problem Better Prospecting More Relevant Brochures More New Customers More Revenue Per Current Customer Lower Cost More Revenue More Profit AAFM Module 1 (c) Kaiser Fung 19
Discuss Identify any problems with the way Target analysts framed the business problem. AAFM Module 1 (c) Kaiser Fung 20
Hint: How is this model scored? AAFM Module 1 (c) Kaiser Fung 21
The Business Problem Revisited Targeting of At-Risk Customers More Relevant Brochures Retain More Customers More Revenue Per Customer Cost: New Brochures v. Fewer Brochures + Any Offers More Revenue More Profit AAFM Module 1 (c) Kaiser Fung 22
Measuring Success According to Duhigg, Target’s pregnancy prediction effort was highly successful. How accurate was Target’s prediction model? The accuracy rate was not disclosed. AAFM Module 1 (c) Kaiser Fung 23
10% of targets are pregnant AAFM Module 1 (c) Kaiser Fung 24
20% predicted to be pregnant
3 of 10 predictions are accurate
2 of 5 pregnancies are missed
AAFM Module 1 (c) Kaiser Fung 25
Predictive Lift AAFM Module 1 (c) Kaiser Fung 26
[Figure labels: Accurate Prediction; Missed Opp; False Positives; 20% predicted to be pregnant]
Why did Target mix in random products?
A) 7 out of 10 receiving brochures will not be pregnant
B) 3 out of 10 receiving brochures will feel creeped out
A >> B
AAFM Module 1 (c) Kaiser Fung 27
1. Customer acquisition tool 2. It’s predictive 3. So accurate it’s creepy 1. Not for customer acquisition 2. It’s not very predictive - even with Big Data 3. Inaccurate and detracting AAFM Module 1 (c) Kaiser Fung 28
Target’s Pregnancy Prediction Problem • Defining and framing the business problem • Collecting data for the analytical model • Selecting an analytical method • Developing a useful model that solves the problem • Describing how model outputs can drive action • Projecting the impact of such action • Measuring the model performance AAFM Module 1 (c) Kaiser Fung 29
Complete Analytical Process AAFM Module 1 (c) Kaiser Fung 30
The Baby Names Voyager Importance of Proper Framing AAFM Module 1 (c) Kaiser Fung 31
Use Cases
Which of the following questions can be answered directly by the Baby Names Voyager (without referring to other materials)?
A. Why did the name Barbara peak in the 1940s?
B. Is the name Charlotte or Chelsea more popular?
C. How popular will David be in the year 2025?
D. What name should I choose for my baby girl?
AAFM Module 1 (c) Kaiser Fung 32
Other Analyses of the Data Source: Social Security Administration AAFM Module 1 (c) Kaiser Fung 33
Other Analyses of the Data Source: Social Security Administration AAFM Module 1 (c) Kaiser Fung 34
Other Analyses of the Data Source: Social Security Administration AAFM Module 1 (c) Kaiser Fung 35
Other Analyses of the Data AAFM Module 1 (c) Kaiser Fung 36
Other Analyses of the Data AAFM Module 1 (c) Kaiser Fung 37
Other Analyses of the Data AAFM Module 1 (c) Kaiser Fung 38
Other Analyses of the Data AAFM Module 1 (c) Kaiser Fung 39
Inverting the Frame Given a name, which time period is most likely? Given a time period, which names are popular? Baby Names Voyager fivethirtyeight.com Given a name, guess someone’s age Given a name, guess what languages he/she speaks AAFM Module 1 (c) Kaiser Fung 40
DATA: Address, Religion, First Name, Last Name
OUTPUT: Probabilities of speaking English, Spanish, German, Japanese, etc.
ACTION: Segmentation, Targeting, etc.
AAFM Module 1 (c) Kaiser Fung 41
Evaluating Model Performance • Make prediction using the median • Use IQR as a measure of error • Accuracy varies with name Source: fivethirtyeight.com AAFM Module 1 (c) Kaiser Fung 42
Evaluating Model Performance • Accuracy varies with gender • Accuracy improves with more co-variates Source: fivethirtyeight.com AAFM Module 1 (c) Kaiser Fung 43
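As a rough illustration of predicting with the median and measuring error with the IQR, the sketch below uses Python's standard `statistics.quantiles`. The names `NameA` and `NameB` and their ages are invented for the example, and the function `age_estimate` is a hypothetical helper. This is a simplified stand-in, not a reproduction of the fivethirtyeight.com analysis.

```python
from statistics import quantiles


def age_estimate(ages):
    """Return (median, IQR) of the ages of living people who share a name."""
    q1, q2, q3 = quantiles(ages, n=4, method="inclusive")
    return q2, q3 - q1


# Hypothetical ages, invented for illustration only.
ages_by_name = {
    "NameA": [22, 24, 25, 26, 27, 29, 31],  # popular in a narrow window
    "NameB": [18, 30, 45, 52, 60, 68, 75],  # popular across many decades
}

for name, ages in ages_by_name.items():
    median, iqr = age_estimate(ages)
    print(f"{name}: predicted age {median:.0f}, IQR {iqr:.1f}")
```

A name whose popularity was concentrated in a few years gives a narrow IQR and therefore a more trustworthy guess, which is one way to read "accuracy varies with name".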
Discuss What other co-variates might be useful to help predict age more accurately? AAFM Module 1 (c) Kaiser Fung 44
Complete Analytical Process AAFM Module 1 (c) Kaiser Fung 45
Course Project I. Project Proposal (Wk 3) II. Midterm: Data Cleaning & Processing (Wk 7) III. Final: Analysis & Modeling (Wk 12) AAFM Module 1 (c) Kaiser Fung 46
Project Proposal
• Objectives:
  ‣ Select a dataset and specify a business/organizational problem you want to solve
  ‣ Diagnose data issues in your dataset (you will fix these issues in Deliverable #2). We cover diagnosing and fixing data issues in Module 2
• Due Date: [Sep 25th], 11:59 PM
• Grading: max 10 points
• All assignment files must be uploaded to Canvas. We do not accept emailed files.
Reminder: Late assignments (excused or not) will incur a penalty of 20%. Late without prior notification, or late by more than 7 days, will be scored zero.
Ling or I will provide feedback and approval on Canvas. (Please open your documents before you email us asking where our comments are.)
AAFM Module 1 (c) Kaiser Fung 47
Choosing your Dataset • Not too small (e.g. > 500 rows) • Not too big (e.g. < 1 million) • Not too aggregated • Not too dirty • Not too clean • Non-anticipatory (if Prediction) AAFM Module 1 (c) Kaiser Fung 48
Example of a Bad Dataset Ebola in West Africa data Too aggregated For any given business problem, many of these rows will be useless Too few variables AAFM Module 1 (c) Kaiser Fung 49
Selecting an Analytical Problem
PREDICTION
Examples:
• Probability of a borrower defaulting on a loan
• Probability of an email being spam
• Probability of a customer deactivating (“churn”)
• Amount of revenues
• Frequency of visits
Characteristics:
• There is a response (outcome) variable
• If the response is binary (yes/no) or categorical (e.g. which product type), also called a “classification” problem
• Looking for correlations between the response and co-variates
• Predictions can be validated
SEGMENTATION
Examples:
• How many types of customers do we have?
• What are the characteristics of different types of shoppers?
• What is the probability that a company has a business model of type A (B, C, etc.)? (advanced)
Characteristics:
• No response (outcome) variable
• Adding structure to the data
• Looking for correlations between co-variates
• Difficult to validate, need external evidence such as survey results
AAFM Module 1 (c) Kaiser Fung 50
Non-Anticipatory
• You have a dataset of sales records of an electronics manufacturer (B2B)
• You aggregate the data so that the unit of analysis is the retailer (i.e. your customers are retailers)
• You propose to predict sales volume using frequency of different types of products (e.g. hard drives, smartphones, cables)
Problem! In order to make predictions using your model, you will need to know the frequency distribution of products sold (inputs). But you don’t know these inputs until you have the sales transactions. So you don’t really have a prediction problem.
AAFM Module 1 (c) Kaiser Fung 51
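One possible way to make such a setup non-anticipatory is to build the inputs only from periods that are already complete when the prediction is made. The slides do not prescribe this fix. The sketch below is an assumption for illustration: the records, the quarterly granularity, and the helpers `product_mix` and `training_row` are all hypothetical.

```python
# Hypothetical sales records: (retailer, quarter, product_type, units).
sales = [
    ("R1", 1, "hard drive", 40),
    ("R1", 1, "cable", 10),
    ("R1", 2, "smartphone", 70),
    ("R2", 1, "cable", 5),
    ("R2", 2, "cable", 8),
]


def product_mix(records, retailer, quarter):
    """Units by product type for one retailer in one quarter."""
    mix = {}
    for r, q, product, units in records:
        if r == retailer and q == quarter:
            mix[product] = mix.get(product, 0) + units
    return mix


def training_row(records, retailer, target_quarter):
    """Pair the PREVIOUS quarter's product mix with this quarter's total sales.

    The features are known before target_quarter starts, so they can
    actually be supplied at prediction time.
    """
    features = product_mix(records, retailer, target_quarter - 1)
    target = sum(product_mix(records, retailer, target_quarter).values())
    return features, target


features, target = training_row(sales, "R1", 2)
print(features, target)
```

Using the same quarter for both the product mix and the sales total would reproduce the problem on the slide: the inputs would only exist once the outcome is already known.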
Appendix: Target model accuracy
             Predicted Y   Predicted N   Total
Actual Y          6             4          10
Actual N         14            76          90
Total            20            80         100
Positive Predictive Value = 6/20 = proportion of pregnancy predictions that are accurate
Missed Opportunities = 4/10 = proportion of pregnancies that are not predicted
AAFM Module 1 (c) Kaiser Fung 52
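The arithmetic behind the appendix table can be checked with a short Python sketch. The four counts come from the table. The variable names are illustrative, and the lift formula (positive predictive value divided by the base rate) is the common textbook definition, assumed here because the slides do not spell it out.

```python
# Counts from the appendix table (per 100 customers).
tp, fn = 6, 4     # actually pregnant: predicted Y, predicted N
fp, tn = 14, 76   # not pregnant:      predicted Y, predicted N

total = tp + fn + fp + tn
base_rate = (tp + fn) / total        # share of customers who are pregnant
predicted_rate = (tp + fp) / total   # share the model flags as pregnant
ppv = tp / (tp + fp)                 # positive predictive value (6/20)
missed = fn / (tp + fn)              # missed opportunities (4/10)
false_positive_share = fp / (tp + fp)  # flagged customers who are not pregnant

# Assumed definition: lift = PPV / base rate, i.e. how much better the
# flagged group is than a randomly selected group of the same size.
lift = ppv / base_rate

print(f"base rate {base_rate:.0%}, predicted {predicted_rate:.0%}")
print(f"PPV {ppv:.0%}, missed {missed:.0%}, "
      f"false positives {false_positive_share:.0%}, lift {lift:.1f}x")
```

These figures match the earlier slides: 10% of targets are pregnant, 20% are predicted to be pregnant, 3 of 10 predictions are accurate, 2 of 5 pregnancies are missed, and 7 of 10 brochure recipients are not pregnant.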