Predictive Analytics World
Predictive Keyword Scores to Optimize PPC Campaigns
Vincent Granville, Ph.D.
Click Forensics
February 19, 2009
CONFIDENTIAL

Problem
• Advertisers bidding on keywords on search engines (PPC programs offered by Google, Yahoo, etc.)
• Bidding strategy should achieve some goals, typically
  – Profit optimization
  – ROA (revenue on ad spend) optimization
  – Minimization of cost of user acquisition
  – Maximization of user lifetime value

Short Term Goals
• Short term ROA can be negative
• Paid + organic search usually provides positive ROI
• Organic search used as leverage to buy traffic and increase reach
• KPIs:
  – Clicks per keyword
  – Conversions per keyword
  – Revenue, profit or return
  – Conversion rate

Issues
• Keywords with few clicks (“long tail”): difficult to predict
• Attaching a conversion to a click: data quality (cookies)
• Revenue numbers not known until tomorrow
• New bid => Google needs to “learn” how to handle it
  – Real-time implementation of keyword bidding subject to high volatility
  – Focus on end-of-day or bi-weekly algorithm
  – Pitfall 1: if max bid is much higher than actual CPC => Google will eventually notice!
  – Pitfall 2: keyword performance can be impacted by “poor” keywords in the same ad group, or by impression fraud / click fraud spikes
  – Match type

Keyword Scoring
• Same as click scoring / credit card transaction scoring
• Scores computed at the keyword / ad group level
• Response: RPC, conversion rate, etc.
• Independent variables: binary rules
  – Actually, they are highly auto-correlated
• Model
  – Logistic regression (ridge or constrained regression)
  – Naïve Bayes (related to logistic regression via the odds ratio)
  – Decision trees or combo
• Score is a predictor of RPC, return or conversion rate, etc.
• Conversion blending
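As an illustration of the first model option, here is a minimal sketch of logistic regression with a ridge (L2) penalty fitted on binary rules. It is not the scoring engine from the talk: the per-click 0/1 conversion response, the plain gradient-descent fit, the penalty strength, and the tiny data set are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_ridge_logistic(X, y, l2=0.1, lr=0.5, epochs=2000):
    """Fit logistic regression with an L2 (ridge) penalty by batch gradient descent.

    X: rows of 0/1 rule values; y: 0/1 outcomes (assumed: click converted or not).
    Returns (intercept, weights). The intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        # The ridge term l2 * wj shrinks weights toward zero, which helps
        # when the binary rules are strongly correlated with each other.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w

def score(row, b, w):
    """Predicted conversion probability for one vector of rule values."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))

# Made-up data: rule 0 tends to fire on converting clicks, rule 1 on the others.
X = [[1, 0], [1, 0], [1, 1], [1, 1], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
b, w = fit_ridge_logistic(X, y)
```

Calling `score([1, 0], b, w)` then gives the model's estimate for a keyword that triggers only rule 0. A production system would more likely rely on an established statistics library than on a hand-written fit.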

Bidding Strategy
• Goals
  – Be able to predict response for keywords with very little historical data
  – Be able to predict response for new keywords
  – Conversion rate = f(score)
  – New bid = g(previous bid, keyword score, ROI, RPC, …)
• Methodology
  – Permanent multivariate (A/B/C) testing
  – g is a parametric function
  – A/B/C: each case corresponds to a particular parameter set
  – Moves in parameter space driven by a simulated annealing algorithm
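The slides do not give the form of g or the annealing schedule, so the following is only a sketch under stated assumptions: a hypothetical two-parameter g (a sensitivity and a score pivot), a linear cooling schedule, and a toy objective standing in for the return measured on each A/B/C test cell.

```python
import math
import random

def new_bid(previous_bid, keyword_score, params):
    """Hypothetical parametric g: scale the bid by the score's distance from a pivot."""
    sensitivity, pivot = params
    factor = 1.0 + sensitivity * (keyword_score - pivot)
    return max(0.01, previous_bid * factor)  # floor keeps the bid positive

def anneal(objective, start, steps=500, step_size=0.1, t0=1.0, seed=0):
    """Maximize objective(params) by simulated annealing.

    Returns (best_params, best_value) seen during the search.
    """
    rng = random.Random(seed)
    current, current_val = start, objective(start)
    best, best_val = current, current_val
    for k in range(steps):
        temperature = t0 * (1.0 - k / steps) + 1e-9  # linear cooling (assumed)
        candidate = tuple(p + rng.gauss(0.0, step_size) for p in current)
        cand_val = objective(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if cand_val >= current_val or rng.random() < math.exp(
            (cand_val - current_val) / temperature
        ):
            current, current_val = candidate, cand_val
            if current_val > best_val:
                best, best_val = current, current_val
    return best, best_val

def toy_return(params):
    """Stand-in objective; in practice this would be a measured test-cell return."""
    sensitivity, pivot = params
    return -((sensitivity - 0.8) ** 2) - ((pivot - 0.5) ** 2)

best_params, best_value = anneal(toy_return, start=(0.0, 0.0))
```

In a live setting each evaluation of the objective would correspond to running one parameter set as a test cell for some period, so the number of annealing steps would be far smaller than in this toy run.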

Examples of Rules
• Text mining rules used in the keyword scoring engine
  – Length of keyword
  – Number of terms
  – Keyword contains “free”, “new”, “2009”
  – Keyword contains digits
  – Keyword contains a top 1-term word with known response
  – Keyword contains a 2-term word with known response
• Example
  – Keyword “used car Honda 2000” contains the 2-term word “used car”
  – All keywords containing “used car” have on average a 5% conversion rate
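The rules listed above could be computed along these lines. This is a sketch, not the engine's actual code: the function name, the whole-term matching for “free”, “new” and “2009”, and the way known terms are passed in are assumptions for illustration.

```python
# Trigger terms taken from the slide's examples.
TRIGGER_TERMS = ("free", "new", "2009")

def keyword_rules(keyword, known_one_terms=frozenset(), known_two_terms=frozenset()):
    """Return a dict of rule values for one keyword (illustrative only)."""
    terms = keyword.lower().split()
    # 2-term words are taken to be pairs of adjacent terms (an assumption).
    two_terms = {" ".join(pair) for pair in zip(terms, terms[1:])}
    rules = {
        "length": len(keyword),
        "num_terms": len(terms),
        "contains_digits": any(ch.isdigit() for ch in keyword),
        "has_known_1_term": any(t in known_one_terms for t in terms),
        "has_known_2_term": any(t in known_two_terms for t in two_terms),
    }
    for trigger in TRIGGER_TERMS:
        # Whole-term match; substring matching would be another option.
        rules[f"contains_{trigger}"] = trigger in terms
    return rules

rules = keyword_rules("used car Honda 2000", known_two_terms={"used car"})
```

For the slide's example keyword, `rules["has_known_2_term"]` is true because “used car” is in the set of 2-term words with known response.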

Issues
• Keyword cleaning
• Each keyword contains multiple 1-term and 2-term words
• Scalability
• Most keywords contain at least one top term
  – With 50 MM keywords and 25 K top terms, 95% of the keywords contain at least one top term
  – Response is not known for most of the 50 MM keywords (too granular), but it is known for each of the 25 K top terms (aggregate level)
  – Works with new keywords
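The aggregate-level idea can be sketched as a lookup table keyed by 1-term and 2-term words. The slides do not say how term-level responses are combined for a keyword, so the pooling rule below, the `min_clicks` threshold for what counts as a top term, and the sample data are all assumptions.

```python
def term_words(keyword):
    """All 1-term and 2-term words contained in a keyword."""
    terms = keyword.lower().split()
    return terms + [" ".join(pair) for pair in zip(terms, terms[1:])]

def build_term_table(history):
    """Aggregate clicks and conversions per term.

    history: iterable of (keyword, clicks, conversions) rows.
    """
    table = {}
    for keyword, clicks, conversions in history:
        for term in set(term_words(keyword)):
            c, v = table.get(term, (0, 0))
            table[term] = (c + clicks, v + conversions)
    return table

def estimate_conversion_rate(keyword, table, min_clicks=100):
    """Pool the stats of the keyword's well-observed terms; None if none match.

    Pooling counts overlapping terms more than once, so this is a crude estimate.
    """
    clicks = conversions = 0
    for term in set(term_words(keyword)):
        c, v = table.get(term, (0, 0))
        if c >= min_clicks:
            clicks += c
            conversions += v
    return conversions / clicks if clicks else None

# Made-up history for illustration only.
history = [
    ("used car honda", 400, 20),
    ("used car toyota", 600, 30),
    ("new car honda", 500, 10),
]
table = build_term_table(history)
estimate = estimate_conversion_rate("used car nissan 2005", table)
```

The keyword “used car nissan 2005” never appears in the history, yet it still gets an estimate through the terms “used”, “car” and “used car”, which is the sense in which the approach works with new keywords.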

Results: Test Data

Results: Interpretation
• Keyword score is a good predictor of conversion rate
• Bids are too high on good keywords, too low on poor keywords
• Simple corrective action suggested
  – A/B/C parametric bidding strategy not discussed here
• Cross validation: see next slide

Results: Cross-Validation
• Process 15 days’ worth of data using score lookup tables based on training set
• No time period overlap between training and test
• Keyword overlap
• Large volume of new keywords (“new” means a KW not found in training set)
• Robustness against missing data / new keywords
• Predictive power somewhat reduced, but still good
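A time-based split of this kind, together with a count of new keywords in the test period, might look like the sketch below. The record layout, the cutoff date, and the sample rows are hypothetical.

```python
from datetime import date

def time_split(records, cutoff):
    """Split (day, keyword, clicks, conversions) records so training precedes test."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test

def new_keyword_share(train, test):
    """Fraction of distinct test keywords that never appear in the training period."""
    seen = {r[1] for r in train}
    test_keywords = {r[1] for r in test}
    if not test_keywords:
        return 0.0
    return len(test_keywords - seen) / len(test_keywords)

# Made-up records for illustration only.
records = [
    (date(2009, 1, 5), "used car honda", 40, 2),
    (date(2009, 1, 9), "free credit report", 25, 1),
    (date(2009, 1, 20), "used car honda", 30, 1),
    (date(2009, 1, 22), "used car nissan 2005", 10, 0),
]
train, test = time_split(records, cutoff=date(2009, 1, 16))
share = new_keyword_share(train, test)
```

Score lookup tables would be built from `train` only and then applied to `test`, so that keywords counted in `share` are scored without any history of their own.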