Behind Google Ads
Greg J. Badros, greg@google.com
Outline
• History of Google and Google Ads
• Basic advertising
• Auction theory
• Managing data with MySQL
• Contextually targeted ads
  – Text processing & information retrieval for fun & profit!
• Fun at Google
• Questions?
In the beginning, there was search…
[Screenshot: google.stanford.edu, c. 1997]
We Have Just Begun Delivering On Our Mission:
Organizing the world’s information and making it universally accessible and useful
Premium Sponsorships
• Advertising on keywords in the (single) “top spot”
• Pay for eyeballs (impressions)
• Allow reserved inventory
• Sold directly by the Google sales department
AdWords
• Multiple ads per keyword, down the right-hand side
• Charges based on position (example prices by position on the slide: $15, $12, $10, $8)
• Online sales
AdWords Select
• Both right-hand-side and “promoted” top ads
• Still online sales
• Pay for user clicks, not impressions
• Introduced the auction mechanism; no more reserved inventory
Lesson Learned
You do not need to build what the market is asking for to succeed.
“If I’d asked my customers what they wanted, I would’ve had to go looking for faster horses.” (Henry Ford)
Definitions Soup
• Page inventory: available page slots for ads
• Keywords: terms entered in a search, bought by advertisers
• Impression: showing an ad to a user
• Creative: the text/image/video that is shown
• CPM: Cost Per Mille (1,000 impressions)
• CPC: Cost Per Click
• CTR: Click-Through Rate (= Clicks/Impressions)
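These definitions are tied together by simple arithmetic. The following JavaScript sketch (function names and sample numbers are illustrative, not from the talk) computes CTR and what an advertiser would be charged under CPM versus CPC pricing:

```javascript
// Click-through rate: fraction of impressions that led to a click.
function ctr(clicks, impressions) {
  return impressions === 0 ? 0 : clicks / impressions;
}

// CPM pricing: price is quoted per 1,000 impressions ("mille").
function costUnderCpm(impressions, cpm) {
  return (impressions / 1000) * cpm;
}

// CPC pricing: price is charged per click.
function costUnderCpc(clicks, cpc) {
  return clicks * cpc;
}

// Example: 10,000 impressions that produce 200 clicks.
console.log(ctr(200, 10000));        // 0.02, i.e. 2%
console.log(costUnderCpm(10000, 8)); // 80 (dollars at an $8 CPM)
console.log(costUnderCpc(200, 0.4)); // 80 (dollars at $0.40 per click)
```

At a 2% CTR, an $8 CPM and a $0.40 CPC cost the same, which is what lets per-click bids be compared on an effective-CPM basis.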
Creating an AdWords Ad
Specialized Search
• Given a query, find the best ads from over 100,000 advertisers
• How do you model utility to users?
  – Want high-quality, targeted ads that generate revenue
  – Balance the importance of a high click-through rate (CTR) with the advertiser’s willingness to pay
• Auction theory helps!
Ranking Ads
Keyword: skydive
• Skydive with Us: Only one accident last year. Have fun and play the odds! www.skydivewithus.com
  – CPC = $0.40, CTR = 2% (20 clicks per 1,000 impressions)
  – Effective CPM = $0.40 × 20 = $8
• Need Skydiving Insurance? We’ve got your back. Even if you lose, you win! www.skydiveinsurance.com
  – CPC = $0.20, CTR = 5% (50 clicks per 1,000 impressions)
  – Effective CPM = $0.20 × 50 = $10
Ranking Ads: Result (by effective CPM)
1. Need Skydiving Insurance? We’ve got your back. Even if you lose, you win! www.skydiveinsurance.com (CPC = $0.20, CTR = 5%, effective CPM = $10)
2. Skydive with Us: Only one accident last year. Have fun and beat the odds! www.skydivewithus.com (CPC = $0.40, CTR = 2%, effective CPM = $8)
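The ranking rule on these slides can be sketched in a few lines of JavaScript. This is a minimal illustration assuming each ad carries a maximum CPC and a predicted CTR; the field names are hypothetical:

```javascript
// Effective CPM: expected revenue per 1,000 impressions.
function effectiveCpm(ad) {
  return ad.maxCpc * ad.ctr * 1000;
}

// Rank a copy of the ads, highest effective CPM first.
function rankAds(ads) {
  return [...ads].sort((a, b) => effectiveCpm(b) - effectiveCpm(a));
}

const ads = [
  { name: "Skydive with Us", maxCpc: 0.40, ctr: 0.02 },
  { name: "Need Skydiving Insurance?", maxCpc: 0.20, ctr: 0.05 },
];

for (const [i, ad] of rankAds(ads).entries()) {
  console.log(`${i + 1}. ${ad.name}: eCPM $${effectiveCpm(ad).toFixed(2)}`);
}
// 1. Need Skydiving Insurance?: eCPM $10.00
// 2. Skydive with Us: eCPM $8.00
```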
Ad Auction Ranking
• $0.40 and $0.20 are “bids” per click, reflecting the maximum CPC each advertiser is willing to pay
• The insurance company could have bid $0.16001 CPC, for an eCPM of $0.16001 × 50 = $8.0005, and still been ranked #1
• So… we act as if they did: they pay only $0.16 per click, not $0.20
Let Advertisers Bid Their True Value
• The system acts in their best interest
• No need to increase their bid when someone else gets ranked ahead of them
• When there’s no competition, you pay the minimum
• The minimum is based on the quality of the ad, using a user-driven assessment
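A minimal sketch of the pricing rule from the last two slides, assuming ads are ranked by eCPM and each pays the CPC that would exactly tie the next ad’s eCPM. `MIN_CPC` is a hypothetical placeholder for the quality-based minimum, and the sketch ignores any tie-breaking increment a production system might add:

```javascript
// Hypothetical minimum CPC in dollars; the slides say only that the real
// minimum depends on a user-driven assessment of ad quality.
const MIN_CPC = 0.05;

function effectiveCpm(ad) {
  return ad.maxCpc * ad.ctr * 1000;
}

// Rank by eCPM, then charge each ad the CPC at which its eCPM would just
// tie the ad ranked below it.
function priceAuction(ads, minCpc = MIN_CPC) {
  const ranked = ads
    .filter((ad) => ad.maxCpc >= minCpc) // bids below the minimum don't run
    .sort((a, b) => effectiveCpm(b) - effectiveCpm(a));
  return ranked.map((ad, i) => {
    const next = ranked[i + 1];
    // No competitor below: pay only the minimum.
    const tieCpc = next ? effectiveCpm(next) / (ad.ctr * 1000) : minCpc;
    // Never below the minimum, never above the advertiser's own bid.
    const cpc = Math.min(ad.maxCpc, Math.max(minCpc, tieCpc));
    return { name: ad.name, position: i + 1, cpc };
  });
}

const results = priceAuction([
  { name: "Skydive with Us", maxCpc: 0.40, ctr: 0.02 },
  { name: "Need Skydiving Insurance?", maxCpc: 0.20, ctr: 0.05 },
]);
for (const r of results) {
  console.log(`${r.position}. ${r.name}: pays $${r.cpc.toFixed(2)}/click`);
}
// 1. Need Skydiving Insurance?: pays $0.16/click
// 2. Skydive with Us: pays $0.05/click
```

Note that the price an ad pays at a given position depends only on the ad ranked below it, not on its own bid.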
Auction Basics
• English: ascending bids until “going, going, gone!”
• Dutch: price dropped until someone bites
• 1st-price sealed: winner pays their bid
  – “Winner’s curse” and “bid shading” complicate selecting a bid
Vickrey Auction
• 2nd-price sealed: winner pays the 2nd-highest bid
• All 4 auctions have the same expected revenue for the seller (under the standard assumptions of risk-neutral bidders with independent private values)
• Vickrey has the simplest bidding strategy: just bid your true value (no bid shading, no winner’s curse)
• William Vickrey won the Nobel Memorial Prize in Economics (1996)
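To see why truthful bidding is easy in a Vickrey auction, here is a small single-item sketch in JavaScript (bidder names and amounts are made up):

```javascript
// Sealed-bid second-price (Vickrey) auction for a single item:
// the highest bid wins and the winner pays the second-highest bid.
function vickrey(bids) {
  if (bids.length === 0) return null;
  const sorted = [...bids].sort((a, b) => b.amount - a.amount);
  // With one bidder there is no second price; 0 stands in for the
  // reserve price a real seller would set.
  const price = sorted.length > 1 ? sorted[1].amount : 0;
  return { winner: sorted[0].bidder, price };
}

// A values the item at 10. Bidding truthfully or shading to 8 gives the
// same outcome: A wins and pays B's bid of 7.
const truthful = vickrey([
  { bidder: "A", amount: 10 },
  { bidder: "B", amount: 7 },
  { bidder: "C", amount: 4 },
]);
const shaded = vickrey([
  { bidder: "A", amount: 8 },
  { bidder: "B", amount: 7 },
  { bidder: "C", amount: 4 },
]);
console.log(`${truthful.winner} pays ${truthful.price}`); // A pays 7
console.log(`${shaded.winner} pays ${shaded.price}`);     // A pays 7
```

Shading A’s bid below 7 would only risk losing the item without lowering the price.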
Engineering challenge: Predicting CTR
• A dizzying set of factors could affect click-through
  – Country, time of day, targeted text vs. query, …
• How does one automatically figure out which factors are most relevant?
  – How do you update the model quickly in the face of change?
  – How do you estimate CTR for not-yet-shown ads?
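One common way to handle the last question, offered here as an illustration rather than as what Google does, is to smooth an ad’s observed CTR toward a prior estimate. `priorCtr` and `priorWeight` are hypothetical parameters, for example derived from similar ads on the same keyword:

```javascript
// Smoothed CTR: treat the prior as `priorWeight` pseudo-impressions that
// clicked at rate `priorCtr`, then add the ad's own observed counts.
function smoothedCtr(clicks, impressions, priorCtr, priorWeight) {
  return (clicks + priorCtr * priorWeight) / (impressions + priorWeight);
}

// A brand-new ad gets exactly the prior.
console.log(smoothedCtr(0, 0, 0.02, 1000)); // 0.02
// As evidence accumulates, the estimate moves toward the observed 5%.
console.log(smoothedCtr(50, 1000, 0.02, 1000)); // 0.035
console.log(smoothedCtr(5000, 100000, 0.02, 1000));
```

Because the estimate depends only on running click and impression counts, it can be updated incrementally as new data arrives.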
Statistics collection & reporting
• Advertisers want to know how they are doing:
  – Serving is inherently distributed
  – But the advertiser’s view is inherently global
  – Want data summarized, sliced, and diced along many dimensions
  – In real time
  – Hundreds of millions of rows of data per day
• Need up-to-date budget information about hundreds of thousands of advertisers in real time, while running many thousands of auctions per second
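Slicing and dicing boils down to grouping rows by a set of dimensions and summing the measures. A minimal in-memory JavaScript sketch, with hypothetical row fields:

```javascript
// Roll serving-log rows up into totals along any chosen dimensions.
// The row fields (advertiser, country, hour, impressions, clicks, cost)
// are hypothetical examples.
function rollUp(rows, dims) {
  const totals = new Map();
  for (const row of rows) {
    const key = dims.map((d) => row[d]).join("|");
    const t = totals.get(key) ?? { impressions: 0, clicks: 0, cost: 0 };
    t.impressions += row.impressions;
    t.clicks += row.clicks;
    t.cost += row.cost;
    totals.set(key, t);
  }
  return totals;
}

const rows = [
  { advertiser: "skydive", country: "US", hour: 9, impressions: 1000, clicks: 20, cost: 8 },
  { advertiser: "skydive", country: "DE", hour: 9, impressions: 500, clicks: 15, cost: 6 },
  { advertiser: "skydive", country: "US", hour: 10, impressions: 2000, clicks: 30, cost: 12 },
];
console.log(rollUp(rows, ["advertiser", "country"]));
console.log(rollUp(rows, ["hour"]));
```

Since the totals are sums, each serving location could roll up its own rows and a central aggregator could merge the partial totals with the same addition; this is one possible design, not a description of Google’s pipeline.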
So many advertisers, so little Oracle…
• We use MySQL…
• …and we divide and conquer by partitioning customers onto separate database servers
[Figure: database servers arranged as shards 1 through n, each shard with several clones]
• Clone to scale to capacity: same schema and same data
• Shard to scale to the number of advertisers: same schema but different data
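A sketch of how requests might be routed under this scheme, assuming a hypothetical topology in which each shard has one master that takes writes and several read-only clones. The server names and the modulo placement are illustrative only; a customer-to-shard directory table is a common alternative that makes moving customers easier:

```javascript
// Hypothetical topology: each shard holds a disjoint set of customers
// (same schema, different data); clones hold copies of their shard's data.
const shards = [
  { master: "db-0a", clones: ["db-0b", "db-0c"] },
  { master: "db-1a", clones: ["db-1b"] },
];

// Simple modulo placement of customers onto shards.
function shardFor(customerId) {
  return shards[customerId % shards.length];
}

function serverFor(customerId, { write = false } = {}) {
  const shard = shardFor(customerId);
  if (write) return shard.master; // assumed: writes go to one copy
  // Reads can go to any copy, so adding clones adds read capacity.
  const pool = [shard.master, ...shard.clones];
  return pool[Math.floor(Math.random() * pool.length)];
}

console.log(serverFor(42, { write: true })); // db-0a
console.log(serverFor(7)); // one of db-1a, db-1b
```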
A new set of abstractions for schema…
…so we invented a new language!
```xml
<table name="AdGroups" sharding="AudienceId" change_tracked="true">
  <metadata key="dpl:bulk_insert" value="true"/>
  <column name="AdGroupId">
    <metadata key="column_label" value="AdGroup Id"/>
    <type><serial/></type>
  </column>
  <column name="AudienceId">
    <metadata key="column_label" value="Audience Id"/>
    <type><integer bytes="4"/></type>
  </column>
  <column name="MaxCpc" default="0">
    <description>…</description>
    <!-- remainder of this column and of the table is not shown on the slide -->
  </column>
</table>
```
AdSense for Content
• Contextually targeted ads
• Example: cheese.com
  – What ads would you show?
  – Buy some cheese
  – Look for other kinds of cheese
  – Recipes? Or diet?
How AFC Is the Same
• Ads still come from AdWords advertisers
• Ranking is still done based on eCPM
• The 2nd-price auction still discounts the CPCs paid
How It’s Different
• The publisher provides the inventory
• Ads are selected based on the target URL (e.g., the content of the page, domain name, etc.)
• We share the revenue with the publisher
• Lots more new formats, new media types
The billion-dollar JavaScript snippet…
```html
<script type="text/javascript"><!--
google_ad_width = 728;
google_ad_height = 90;
google_ad_format = "728x90_as";
google_ad_type = "text_image";
//--></script>
<script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
```
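And how we serve the ads
```javascript
function google_show_ad() {
  var w = window;
  // Build the ad request URL. google_page_url and google_language are
  // assumed to have been set earlier; the slide does not show where.
  w.google_ad_url = 'http://pagead2.googlesyndication.com/pagead/ads?' +
      'url=' + encodeURIComponent(w.google_page_url) +
      '&hl=' + encodeURIComponent(w.google_language);
  // Write an iframe of the publisher's chosen size that loads the ads
  // from the request URL built above.
  document.write('<ifr' + 'ame' +
      ' src="' + w.google_ad_url + '"' +
      ' width=' + w.google_ad_width +
      ' height=' + w.google_ad_height +
      ' scrolling=no></ifr' + 'ame>');
}
google_show_ad();
```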
Many different kinds of ads!
• Image ads
• Audio ads
• Click-to-call ads
• Video ads
• …Scratch’n’Sniff ads?
Many different models
• Cost per click … per impression … per acquisition … per playback
• Target by keyword … by site … by vertical
• How do we keep things simple for advertisers?
AdSense Challenges
• Need to know the context of the page: fetch the target URL
• Need to understand what it’s about: natural language processing
• Need to find the best matching ads: how do you even define “best match”? (A competitor’s ad would be bad!)
• Do this all a gazillion times a minute
  – …serving users on every continent
  – …in over 20 languages
  – …for millions and millions of users
Basic Computing Cluster
[Figure: machines 1 through N, each running a GFS chunkserver, a Workqueue slave, and tasks from several jobs (e.g. Job 0, Job 2, a “Bigmemory” job); the GFS master and the Workqueue master also run on machines in the cluster]
The Power of Data Applied to Contextual Targeting
• Conventional wisdom:
  – Given an order-of-magnitude increase in computational power…
  – …you can solve previously impractical problems
• Unconventional wisdom:
  – Given an order-of-magnitude increase in data…
  – …you can solve previously unsolvable problems!
• Consider how to determine similarity between texts:
  – How similar is “Kofi Annan” to “UN Secretary-General”?
Traditional Information Retrieval Similarity
• Traditionally, similarity is a function of term frequency within a document and across all documents
• $\mathrm{TF}(w)$ = frequency of term $w$ in a document/query
  – Intuition: a word appearing more frequently in a text is more likely to be related to its “meaning”
• $\mathrm{IDF}(w) = \log(N / n_w) + 1$, where $N$ is the number of documents and $n_w$ is the number of documents containing $w$
  – Intuition: words appearing in many documents are generally not very informative (e.g., “the”)
• TFIDF: the contribution of a term is the product of the two quantities, $\mathrm{TFIDF}(w) = \mathrm{TF}(w) \times \mathrm{IDF}(w)$
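The definitions above translate directly into code. This JavaScript sketch assumes a naive lowercase, split-on-non-alphanumerics tokenizer and a tiny made-up corpus; terms that never occur in the corpus are skipped because their IDF is undefined:

```javascript
// Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// TF(w): number of times w occurs in the text.
function termFrequencies(text) {
  const tf = new Map();
  for (const w of tokenize(text)) tf.set(w, (tf.get(w) ?? 0) + 1);
  return tf;
}

// IDF(w) = log(N / n_w) + 1 over a corpus of N documents.
function inverseDocumentFrequencies(corpus) {
  const df = new Map(); // n_w: number of documents containing w
  for (const doc of corpus) {
    for (const w of new Set(tokenize(doc))) df.set(w, (df.get(w) ?? 0) + 1);
  }
  const idf = new Map();
  for (const [w, n] of df) idf.set(w, Math.log(corpus.length / n) + 1);
  return idf;
}

// TFIDF(w) = TF(w) x IDF(w); terms absent from the corpus are skipped.
function tfidf(text, idf) {
  const vec = new Map();
  for (const [w, tf] of termFrequencies(text)) {
    if (idf.has(w)) vec.set(w, tf * idf.get(w));
  }
  return vec;
}

const corpus = ["the dog barked", "the window opened", "the dog met the cat"];
const idf = inverseDocumentFrequencies(corpus);
const vec = tfidf("the dog chased the dog", idf);
console.log(vec.get("the")); // 2: "the" is in every document, so IDF = 1
console.log(vec.get("dog")); // 2 * (log(3/2) + 1)
```

With the “+ 1” in this IDF formula, a term that appears in every document keeps an IDF of 1 rather than dropping to 0.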
Using TFIDF to Measure Similarity
• Consider each document as a vector (dimensions: dog, compute, window, …):
  Doc 1 = <3.2, 0, 1.2, …>
  Doc 2 = <0, 2.1, 5.4, …>
  Doc 3 = <0, 1.7, 0, …>
• Vectors are constructed such that:
  – Each dimension of the vector represents a term $w_i$
  – Each entry of the vector has value $\mathrm{TFIDF}(w_i)$
  – The vectors are normalized to unit length (Euclidean norm)
• Similarity of two texts is measured by the cosine of the angle between their TFIDF vectors
  – For unit-length vectors, cosine = vector dot product!
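A sketch of cosine similarity over sparse vectors, represented here as `Map`s from term to weight. For illustration it uses raw term counts as weights, which is enough to reproduce the zero cosine for “AI” vs. “artificial intelligence” and the 0.5 for “Larry Page” vs. “web page” that the talk reports:

```javascript
// Scale a sparse vector to unit Euclidean length.
function normalize(vec) {
  let sumSq = 0;
  for (const v of vec.values()) sumSq += v * v;
  const norm = Math.sqrt(sumSq);
  const out = new Map();
  if (norm === 0) return out;
  for (const [w, v] of vec) out.set(w, v / norm);
  return out;
}

// Dot product over the terms the two vectors share.
function dot(a, b) {
  let sum = 0;
  for (const [w, v] of a) sum += v * (b.get(w) ?? 0);
  return sum;
}

// Cosine similarity: dot product of the two unit-length vectors.
function cosine(a, b) {
  return dot(normalize(a), normalize(b));
}

const ai = new Map([["ai", 1]]);
const artificialIntelligence = new Map([["artificial", 1], ["intelligence", 1]]);
const larryPage = new Map([["larry", 1], ["page", 1]]);
const webPage = new Map([["web", 1], ["page", 1]]);

console.log(cosine(ai, artificialIntelligence)); // 0: no shared terms
console.log(cosine(larryPage, webPage).toFixed(3)); // 0.500
```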
Determining Similarity of Short Text Snippets
• Many queries on the web are short (~2.5 words)
• For short text snippets, cosine is insufficient
• The cosine of the term vectors for each of the following text pairs is 0:
  – “AI” / “artificial intelligence”
  – “Kofi Annan” / “UN Secretary-General”
  – “Eric Schmidt” / “Google CEO”
  – “NASA” / “space exploration”
  – “Larry Page” / “Google founder”
• Should also identify unrelated concepts, even if term overlap is high
  – “Larry Page” / “web page”
Determining Contextual Similarity of Short Text
“… the meaning of a word is its use in the language” (Ludwig Wittgenstein)
• For short text snippets, we need to determine greater contextual meaning
• Insight: leverage the huge quantity of information on the web!
• Approach: expand the short text snippet into a vector with additional context terms
  – Find terms that co-occur on the web with terms in the text snippet to determine a contextual vector
  – Similar to “query expansion” in information retrieval
Leverage the Web to Determine Similarity
• Let x and y be two short text snippets
• Want to define a function f(x, y) that measures “semantic” similarity between x and y
• Define the “query expansion” of text x, QE(x), as follows:
  – Issue x as a query to a search engine (oh, say, Google…)
  – Let R be the retrieved set of N documents: $R = \{D_1, \dots, D_N\}$
  – Compute the TFIDF vector $V_i$ for each document $D_i \in R$
  – Compute QE(x) as the average (centroid) of all vectors: $QE(x) = \frac{1}{N}\sum_{i=1}^{N} V_i$
• Define $f(x, y) = QE(x) \cdot QE(y)$
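The procedure can be sketched end to end in JavaScript. Since a real search engine is out of scope, this assumes a hypothetical five-document in-memory corpus and a naive `search` that returns documents sharing any term with the query. Normalizing each document vector and the centroid to unit length is also an assumption here (it makes f(x, x) = 1); the slide only specifies the average:

```javascript
// Hypothetical stand-in for web search results.
const CORPUS = [
  "artificial intelligence research builds intelligent machines",
  "ai is short for artificial intelligence",
  "ai lab studies machine learning",
  "kofi annan served as un secretary general",
  "un secretary general addressed general assembly",
];

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Naive search: up to n documents sharing at least one term with the query.
function search(query, n) {
  const terms = new Set(tokenize(query));
  return CORPUS.filter((d) => tokenize(d).some((w) => terms.has(w))).slice(0, n);
}

// IDF(w) = log(N / n_w) + 1, computed over the corpus.
const df = new Map();
for (const doc of CORPUS) {
  for (const w of new Set(tokenize(doc))) df.set(w, (df.get(w) ?? 0) + 1);
}
const IDF = new Map([...df].map(([w, n]) => [w, Math.log(CORPUS.length / n) + 1]));

// Scale a sparse vector (Map<term, weight>) to unit Euclidean length.
function unit(vec) {
  let sumSq = 0;
  for (const v of vec.values()) sumSq += v * v;
  const norm = Math.sqrt(sumSq);
  return norm === 0 ? vec : new Map([...vec].map(([w, v]) => [w, v / norm]));
}

// TFIDF vector of one document: each occurrence of w adds IDF(w).
function tfidfVector(doc) {
  const vec = new Map();
  for (const w of tokenize(doc)) vec.set(w, (vec.get(w) ?? 0) + IDF.get(w));
  return unit(vec);
}

// QE(x): centroid of the TFIDF vectors of the top-n results for x.
function queryExpansion(x, n = 10) {
  const docs = search(x, n);
  const centroid = new Map();
  for (const doc of docs) {
    for (const [w, v] of tfidfVector(doc)) {
      centroid.set(w, (centroid.get(w) ?? 0) + v / docs.length);
    }
  }
  return unit(centroid);
}

// f(x, y) = QE(x) · QE(y)
function similarity(x, y) {
  const a = queryExpansion(x);
  const b = queryExpansion(y);
  let s = 0;
  for (const [w, v] of a) s += v * (b.get(w) ?? 0);
  return s;
}

console.log(similarity("Kofi Annan", "UN Secretary-General").toFixed(3));
console.log(similarity("AI", "UN Secretary-General").toFixed(3)); // 0.000
```

Even though “Kofi Annan” and “UN Secretary-General” share no terms, their expanded vectors overlap through the retrieved documents, so f is well above zero.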
How Well Does This Work?
• Recall the previous text pairs, comparing f(x, y) with plain cosine:
  – (“AI”, “artificial intelligence”): f = 0.831, cosine = 0.000
  – (“Kofi Annan”, “UN Secretary-General”): f = 0.825, cosine = 0.000
  – (“Eric Schmidt”, “Google CEO”): f = 0.845, cosine = 0.000
  – (“NASA”, “space exploration”): f = 0.691, cosine = 0.000
  – (“Larry Page”, “Google founder”): f = 0.770, cosine = 0.000
  – (“Larry Page”, “web page”): f = 0.123, cosine = 0.500
• Consider the multi-faceted term “Java”:
  – (“Java island”, “Indonesia”): f = 0.454, cosine = 0.000
  – (“Java programming”, “Indonesia”): f = 0.020, cosine = 0.000
  – (“Java programming”, “applet development”): f = 0.563, cosine = 0.000
Who does it all?
[Photo: Ads team all-hands]
Other Google Products
• Blogger: Post your thoughts to the web
• Gmail: Web-based email with 2 GB+ storage
• Google Earth: Satellite imagery of Earth
• Maps: Find anything anywhere
• Writely: Collaborative word processing
• Spreadsheets: Collaborative spreadsheets
• Calendar: Plan and share calendars
• Picasa: Organize pictures and movies
• Talk: Instant messaging and voice-over-IP
Code submissions into Perforce
[Figure: number of code submissions vs. time (minutes of the week)]
Yes, we do take weekends off…
What it means to be a Googler
• Flexible work environment
• Health and wellness benefits
• Free gourmet meals
• Convenient on-site services (massage, doctor, dry-cleaning, etc.)
• Parental benefits
• Vacation and time off
• Employee network groups
• 20% time for engineers
• …and more!
* Benefits vary by location
Questions?
Come join the fun! http://google.com/jobs/students
Greg J. Badros, greg@google.com