Why Searchers Switch: Understanding and Predicting Engine Switching Rationales

Why Searchers Switch: Understanding and Predicting Engine Switching Rationales
Qi Guo (Emory University)
Ryen W. White, Yunqiao Zhang, Blake Anderson, Susan T. Dumais (Microsoft Corporation)

What is Engine Switching?
• Voluntary transition from one search engine to another search engine
  – e.g., query Google, then query Yahoo! or Bing
• We focus on within-session switching
• Other variants include:
  – Between-session switching
  – Long-term switching

Why Searchers Switch?
• Dissatisfaction (DSAT) with search results
• Topic coverage (i.e., desire to find additional information)
• Other (e.g., user preferences, unintentional)

“DSAT” – Query: [harry potter]

“Coverage” – Query: [bellevue wa apartments]

Other (“Unintentional”) – Query: [facebook]

Motivations
• Improve search quality by identifying DSAT switches
  – Pre-switch engine: improve on them to retain users
  – Post-switch engine: perform well on them to gain users
• Refine competitive metrics
  – Compare performance against competitors
  – e.g., DSAT/Coverage switch rates
• Provide additional help in real time if a DSAT switch is predicted

Definitions
• A search session is a sequence of activities (e.g., queries, URL visits) that begins with a query and ends after 30 minutes of inactivity
• A search engine switching event is a pair of consecutive queries that are issued on different search engines within a session
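The two definitions above can be sketched in code. This is a minimal illustration, not the study's actual log-processing pipeline: the `Event` record and its field names are assumptions made for this example, and treating a gap of exactly 30 minutes as a session boundary is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity threshold from the slide


@dataclass
class Event:
    """One logged activity; the field names are assumptions for this sketch."""
    time: datetime
    kind: str                     # "query" or "url"
    engine: Optional[str] = None  # set only for query events
    text: str = ""                # query string or visited URL


def split_sessions(events: List[Event]) -> List[List[Event]]:
    """Group events into sessions that begin with a query and end
    once 30 minutes pass without activity."""
    sessions: List[List[Event]] = []
    current: List[Event] = []
    for ev in sorted(events, key=lambda e: e.time):
        # Close the running session when the inactivity gap is reached.
        if current and ev.time - current[-1].time >= SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        # A session must begin with a query, so skip leading non-query events.
        if not current and ev.kind != "query":
            continue
        current.append(ev)
    if current:
        sessions.append(current)
    return sessions


def find_switches(session: List[Event]) -> List[Tuple[Event, Event]]:
    """Return (pre-switch, post-switch) pairs of consecutive queries
    issued on different engines within one session."""
    queries = [ev for ev in session if ev.kind == "query"]
    return [(a, b) for a, b in zip(queries, queries[1:]) if a.engine != b.engine]
```

Non-query events between two queries do not break the pair here; only the sequence of queries in the session is compared.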

Research Questions
1. Why do searchers switch search engines?
2. Which behavioral signals are associated with different causes?
3. How accurately can we predict the causes of engine switching?

Reasons for Engine Switching [White and Dumais, CIKM’09]
• A retrospective questionnaire of 488 users
  – 57% DSAT
  – 26% Coverage
  – 12% Preferences
  – 5% Other
• Drawbacks
  – Do not always align with actual behavior
  – Missing corresponding behavioral data

Data Collection: Obtaining Real Rationales with Corresponding Behavioral Data
• Lab user study
  – Insufficient data (switching is rare)
  – Unnatural switching behavior
• Human annotation of sample switching sessions
  – Sufficient data (search log is huge)
  – Switching rationales are subjective
• In-situ assessments
  – Real switch rationales
  – Natural switching behavior
  – Sufficient data (if widely deployed)

In-situ Assessments
• Implementation
  – IE browser add-on: SwitchWatch
  – Pop-up dialog when a switch occurs
  – Records URLs, timestamps, tab focus/blur, etc.
  – Helps maintain privacy
    • Exclude https URLs
    • Anonymization
• Deployment
  – Participants: 216 Microsoft employees (2200 invited, 10% response rate)
  – 4 weeks (50 USD lottery gift card / week)

Pop-up Dialog
• Information about Switch
  – Pre-/Post-switch queries
  – Pre-/Post-switch engines
  – Pre-/Post-switch timestamps
• Search Goal changes
  – Exactly the same
  – Related
  – Not the same
• Switch Reasons
  – Dissatisfaction
  – Verification/Coverage
  – Unintentional
  – Better for this type
  – Preference
  – Other
• Ignore Button

Overview of Behavioral Data
• 20,554 queries
• 1,004 switches (excluded 25 test-suggestive queries, e.g., ‘test’, ‘hello world’)
• 562 (56%) received in-situ assessments
• 4.2% of the queries were followed by a switch
• 107 (49.5%) of the 216 users switched at least once

RQ 1. Why do searchers switch search engines?

RQ 1. Why do searchers switch search engines?
• Why do searchers change search queries?
• What is the breakdown of different causes?
• How does the change of search queries influence the breakdown?

Query Change During Switching
• Observation: high % of query changes during switching
• Definitions & Breakdown
  – 32% Same Query (SQ): identical queries
  – 18% Related Queries (RQ): share at least one query term that is not a stop word, but are not SQ
  – 50% Different Queries (DQ): no common (non-stopword) terms
• Why do searchers change query?
  – What does query change suggest about goal change?
  – Is it associated with switching causes?
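The SQ/RQ/DQ definitions above translate directly into a small classifier. The sketch below makes several assumptions that the slides do not specify: the stop-word list is a tiny illustrative one, terms are split on whitespace, and "identical" is taken to mean equal after lowercasing and collapsing whitespace.

```python
# Small illustrative stop-word list; the list used in the study is not given.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})


def content_terms(query: str) -> set:
    """Lowercased whitespace-separated terms with stop words removed."""
    return {t for t in query.lower().split() if t not in STOP_WORDS}


def classify_query_change(pre_query: str, post_query: str) -> str:
    """Label a switch as Same Query (SQ), Related Queries (RQ),
    or Different Queries (DQ)."""
    # Assumption: "identical" ignores case and extra whitespace.
    pre_norm = " ".join(pre_query.lower().split())
    post_norm = " ".join(post_query.lower().split())
    if pre_norm == post_norm:
        return "SQ"
    # RQ: at least one shared term that is not a stop word.
    if content_terms(pre_query) & content_terms(post_query):
        return "RQ"
    return "DQ"
```

For example, `classify_query_change("bellevue wa apartments", "apartments in seattle")` yields `"RQ"` because the two queries share the non-stopword term "apartments".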

Query Change and Goal Change

                     Goal change [% judged (non-ignored) queries]
Query change         Same Goal   Related Goals   Different Goals   Ignored
All                  65%         9%              25%               45%
Same Query           98%         1%              1%                27%
Related Queries      77%         20%             3%                39%
Different Queries    23%         12%             65%               60%

• Influence on Goal Change
  – SQ almost always suggests Same Goal (SG)
  – RQ most of the time suggests Same Goal (SG)
  – DQ most of the time suggests Different Goals (DG)
• Influence on Providing Feedback
  – The more related the queries/goals, the more likely the users are to provide feedback in the dialog
  – Annoying to interrupt if the search goal is changed

Query Change & Switch Causes
[Bar chart: percentage of switches (0% to 100%) by cause (Dissatisfaction, Coverage, Preferences, Other) for the series %SQ, %RQ, %DQ, %All, and %Retrospective]
• %All is very different from %Retrospective
  – Lower Dissatisfaction, Coverage; higher Preferences, Other
• %DQ is the most different
  – contains mostly Preferences and Other (e.g., unintentional)
• %SQ is the most similar
  – when asked to recount their last switching event, people

Focus on SQ switches
• Greatest fraction of intentional causes (e.g., dissatisfaction/coverage)
• DSAT switches most valuable to search provider
  – Pre-switch engine: improve on them to retain users
  – Post-switch engine: perform well on them to gain users
• Direct comparisons between original and destination engines

RQ 2. Which behavioral signals are associated with different causes?

User Behaviors Associated with Switches
• Query
  – Length in words/characters
  – Time difference between pre- and post-switch queries
• Pre-/Post-switch Behavior
  – Num. of queries
  – Num. and rate of unique queries
  – Num. and rate of reformulations
  – Num. and rate of clicks
  – Num. and rate of SAT-clicks (dwell >= 30 seconds)
  – Num. and rate of Bounce Clicks (dwell < 15 seconds)
  – Num. of pages on clicked trails
  – Num. of clicks on results with query terms in title
NOTE: Colors = Significant differences in DSAT vs. Other and Coverage vs. Other
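As an illustration of the click-based features listed above, the following sketch derives SAT-click and bounce-click counts and rates from a list of dwell times. The dwell thresholds come from the slide; the function name, the input format (dwell times in seconds), and the choice of "per click" as the rate denominator are assumptions, since the slides do not say how rates are normalized.

```python
from typing import Dict, List

SAT_DWELL_SECONDS = 30     # SAT-click: dwell >= 30 seconds (from the slide)
BOUNCE_DWELL_SECONDS = 15  # bounce click: dwell < 15 seconds (from the slide)


def click_features(dwell_times: List[float]) -> Dict[str, float]:
    """Compute click counts and rates for one side of a switch
    (pre-switch or post-switch) from per-click dwell times in seconds."""
    num_clicks = len(dwell_times)
    num_sat = sum(1 for d in dwell_times if d >= SAT_DWELL_SECONDS)
    num_bounce = sum(1 for d in dwell_times if d < BOUNCE_DWELL_SECONDS)
    # Assumption: rates are fractions of all clicks; avoid dividing by zero.
    denom = num_clicks or 1
    return {
        "num_clicks": num_clicks,
        "num_sat_clicks": num_sat,
        "sat_click_rate": num_sat / denom,
        "num_bounce_clicks": num_bounce,
        "bounce_click_rate": num_bounce / denom,
    }
```

Clicks with a dwell time between 15 and 30 seconds count toward the total but are neither SAT clicks nor bounce clicks under these thresholds.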

RQ 3. How accurately can we predict the causes of engine switching?

Predicting the Switching Causes
• Binary predictions: one-vs-all classifier for each rationale
• Classifier: logistic regression
• Metric: F0.5
  – Weights Precision twice as much as Recall
  – Offline: search logs are huge
  – Online: only intervene with users when confidence is high
• Methods compared
  – Baseline (Prior)
  – Baseline (Rule)
    • DSAT: no pre-switch click and one or more clicks after the switch
    • Coverage: both pre- and post-switch clicks exist
    • Other: neither of the above rules is triggered
  – Classifier with All / Query / Pre-switch / Post-switch features
• Data: 354 in-situ SQ switches
• Experiments: 10-fold cross-validation
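The rule baseline and the F0.5 metric described above are simple enough to sketch directly. This is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the inputs are plain click counts, and F0.5 is computed with the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 0.5.

```python
from typing import Sequence


def rule_baseline(pre_switch_clicks: int, post_switch_clicks: int) -> str:
    """Apply the rule baseline from the slide to one switch."""
    if pre_switch_clicks == 0 and post_switch_clicks >= 1:
        return "DSAT"      # no pre-switch click, one or more clicks after
    if pre_switch_clicks >= 1 and post_switch_clicks >= 1:
        return "Coverage"  # clicks both before and after the switch
    return "Other"         # neither rule is triggered


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def one_vs_all_f_beta(gold: Sequence[str], predicted: Sequence[str],
                      target: str, beta: float = 0.5) -> float:
    """Score one rationale against all others as a binary problem."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_beta(precision, recall, beta)
```

With beta = 0.5, a predictor with precision 0.8 and recall 0.4 scores higher than one with precision 0.4 and recall 0.8, which matches the stated preference for intervening only when confidence is high.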

Prediction Results
Baseline (Prior): p < .05, p < .01; Baseline (Rule): p < .05, p < .01

Method         DSAT    Coverage   Other
Base (Prior)   72.40   27.12      17.40
Base (Rule)    48.84   24.19      20.20
All Features   85.69   47.84      29.01

• All features
  – Significantly outperforms both baselines in predicting DSAT and Coverage
  – Marginally outperforms baselines in predicting Other
• Predicting Other is more challenging
  – Requires information about user profiles (e.g., preferred search engine, browser setting)

Feature Importance (DSAT)

Base (Prior)   Base (Rule)   All     Query   Pre-switch   Post-switch
72.40          48.84         85.69   74.28   81.12        78.99

• All the groups outperform baselines (p < .05)
• Pre-switch features > post-switch features
  – Pre-switch interaction reveals more about the switching rationale
  – More variance in what users do following a switch
• Most important features
  – Num. Pre-switch clicks (+)
  – Pre-switch unique query rate (+)
  – Num. Post-switch clicks on URLs that contain query terms (+)
  – Pre-switch SAT click rate (−)
  – Post-switch query reformulation rate (−)

Summary
• Demonstrates the feasibility of studying engine switching rationales using in-situ assessments and client-side instrumentation
• Provides insights about how search goal changes and user behavior are associated with switch rationales
• Develops models to accurately predict switching rationales using behavioral features, which enables various important applications

Thank you!
• Thanks to SIGIR for travel grant support!