Booming Up the Long Tails: Discovering Potentially Contributive Users in Community-Based Question Answering Services
Jae-Gil Lee, Department of Knowledge Service Engineering, KAIST
Contents
• Background and Motivation
• Overview of the Methodology
• Detailed Methodology
• Experimental Evaluation
• Conclusions
This paper received the Best Paper Award at AAAI ICWSM-13. 2/12/2014 2
Community-Based Question Answering (CQA) Services
CQA services: Ask / Answer (50,000 questions per day; 160,000 questions per day)
Search engines are weak at:
§ Recently updated information
§ Personalized information
§ Advice & opinions [Budalakoti et al. 2010]
Current problems in CQA services: too many questions; hard to find questions to answer
Solutions: expert finding, question routing [Zhou et al. 2009]
Question Routing
• Graph-based (HITS, PageRank): find influential answerers
• Content-based (language modeling): match questions to answerers
• Profile-based (user profiles): find experts based on their profiles
• Also, hybrid methods
Two important factors in question routing:
• Expertise: answerers need proper knowledge of the question area
• Availability: answerers need time to answer [Horowitz et al. 2010, Li et al. 2010, Zhang et al. 2007]
There is a trade-off between expertise and availability
Short Tail vs. Long Tail
• Most contributions (i.e., answers) in CQA services are made by a small number of heavy users
• Many questions won't be answered if such heavy users become unavailable
A system is not robust if it relies heavily on a small number of users
• On the other hand, recently joined users are prone to leave CQA services
• Example: the appearances of the 9,874 answerers who wrote answers in the Computers category of KiN; only 8.4% of the answerers remained after a year
Comparison with Traditional Question Routing
• Motivating such recently joined users to become heavy users, by routing proper questions to them so that they can easily contribute, is of prime importance to the success of the services
[Figure: existing methodologies vs. our methodology]
Which users should we take care of? Recently joined expert users!
Problem Setting
• Developing a methodology for measuring the likelihood that a light user becomes a contributive (i.e., heavy) user in the future in CQA services
• Input: (i) the statistics of each heavy user, (ii) the answers written by heavy users, (iii) the answers written by light users
• Output: the likelihood of each light user becoming a heavy user in the future: Answer Affordance
Challenges
• There is not sufficient information (i.e., answers) to judge the expertise of recently joined users: a kind of cold-start problem
• How can we cope with the lack of information?
Intuition
• A person's active vocabulary reveals his/her knowledge
• Vocabulary has sharable characteristics, so domain-specific words are repeatedly used by expert answerers
Use the active vocabulary of a user to infer his/her expertise, i.e., use vocabulary to bridge the gap between heavy users and light users
Vocabulary Level: Details
• Vocabulary knowledge: "Vocabulary knowledge should at least comprise two dimensions, which are vocabulary breadth (or size), and depth (or quality)" [Marjorie et al. 1996]
• Three dimensions of lexical competence: "(a) partial to precise knowledge, (b) depth of knowledge, and (c) receptive to productive use ability" [Henriksen 1999]
• Productive vocabulary ability: "It implies degrees of knowledge. A learner may be reluctant to use an infrequent word, using a simpler, more frequent word of a similar meaning. Such reluctance is often a result of uncertainty about the word's usage. Lack of confidence is a reflection of imperfect knowledge. We refer to the ability to use a word at one's free will as free productive ability" [Laufer et al. 1999]
Domain Experts' Vocabulary Usage: Details
• "Experts generated queries containing words from domain-specific lexicons fifty percent more often than non-experts. In addition to being able to generate more technically-sophisticated queries, experts also generated longer queries in terms of tokens and characters. It may be that because domain experts are more familiar with the domain vocabulary." [White et al. 2009]
• "Behavior of software engineers is quite distinct from general web search behavior. They use longer and more detailed queries. They make heavy use of specialized terms and search syntax. … Controlled vocabulary look-up lists or query processing tools should be in place to deal with acronyms, product names, and other technical terms" [Freund et al. 2006]
• "When searching, experts found slightly more relevant documents. Experts issued more queries per task and longer queries, and their vocabulary overlapped somewhat more with thesaurus entries" [Zhang et al. 2005]
Domain experts use specialized, but formatted/standardized words
Domain Expert's Vocabulary Durability: Details
• "One important change in behavior was the use of a more specific vocabulary as students learned more about their research topic" [Vakkari et al. 2003]
• "Experts' use of domain-specific vocabulary changes only slightly over the duration of the study. However, many non-expert users exhibit an increase in their usage of domain-specific vocabulary" [White et al. 2009]
A domain expert's unique word set remains for a long time without change
Usage of the Vocabulary: Overview
[Figure: words shared by heavy users bridge to light users]
Basics of CQA Services
• Top-level categories (e.g., Computers, Travel): we define the expertise of a user on a top-level category in our methodology
• User profile:
  Selection Count = A
  Selection Ratio = B = A/D
  Recommendation Count = C
Answer Affordance
• Considers both expertise and availability
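The slide only states that affordance combines the two factors; a minimal sketch, assuming a weighted geometric mean (the paper's exact combination formula is not shown here, and `alpha` is a hypothetical mixing parameter):

```python
def affordance(expertise: float, availability: float, alpha: float = 0.5) -> float:
    """Answer affordance of a light user: a score that is high only when
    both expertise and availability are high.

    The weighted geometric mean is an illustrative assumption; the slide
    says only that both factors are considered."""
    return (expertise ** alpha) * (availability ** (1.0 - alpha))
```

A geometric mean (rather than a sum) captures the trade-off noted earlier: a user with zero availability gets zero affordance no matter how expert he/she is.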
Estimated Expertise
[Figure: four-step pipeline. Step 1: compute Expertise(u) for each heavy user u in UH from his/her answers. Step 2: compute WordLevel(w) for each word w used by heavy users. Step 3: propagate the word levels to the vocabularies of the light users in UL. Step 4: compute EstimatedExpertise(u) for each light user u.]
• Step 1: the expert score of a heavy user is calculated using the abundant historical data: Expertise(uh)
• The expertise of a user becomes higher (i) as the user's answers are more concentrated on the target category and (ii) as the user has a higher selection count, selection ratio, and recommendation count
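The slide lists only the monotone factors, not the formula. A minimal sketch, assuming a multiplicative combination of category concentration and profile statistics (the combination itself is hypothetical):

```python
def expertise(cat_answers: int, total_answers: int,
              sel_count: int, sel_ratio: float, recomm_count: int) -> float:
    """Expert score of a heavy user on one target category.

    Monotone in all four factors the slide names; the multiplicative
    form is an illustrative assumption, not the paper's exact formula."""
    # Share of the user's answers that fall in the target category
    concentration = cat_answers / total_answers
    return concentration * (sel_count * sel_ratio + recomm_count)
```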
• Step 2: the level of a word is determined by the expert scores of the heavy users who used the word before: WordLevel(wi)
• The level of a word becomes higher as the word is used by more expert users and more frequently
• Decomposing an answer into words is reliable even for a small number of answers, because each answer typically contains quite a few words
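A minimal sketch of the two monotone factors above, assuming a frequency-weighted sum over the heavy users who used the word (the aggregation form is an assumption; the slide does not give the formula):

```python
def word_level(usages):
    """Level of one word, from the heavy users who used it.

    `usages` is a list of (expert_score, frequency) pairs, one per heavy
    user who used the word. The frequency-weighted sum is hypothetical,
    chosen only to be monotone in both factors the slide names."""
    return sum(expert_score * freq for expert_score, freq in usages)
```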
• Step 3: these word levels are propagated to the set of words used by a light user in his/her answers
• This step is supported by the observation that the vocabulary of an expert stays mostly unchanged despite a temporal gap [White, Dumais, and Teevan 2009]
• Example: sample words in the Travel category with their values of WordLevel(wi)
• Step 4: the expert score of the light user is calculated in reverse from his/her vocabulary: EstimatedExpertise(ul)
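A minimal sketch of this reverse calculation, assuming the light user's score is the average level of the words he/she used (the exact aggregation is not shown on the slide; unknown words are simply skipped):

```python
def estimated_expertise(user_words, word_levels):
    """Estimated expert score of a light user from his/her vocabulary.

    `word_levels` maps each word to the WordLevel value learned from
    heavy users. Averaging is an illustrative assumption."""
    levels = [word_levels[w] for w in user_words if w in word_levels]
    return sum(levels) / len(levels) if levels else 0.0
```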
Availability
• Simply measures the number of a user's answers, weighting each answer in proportion to its recency
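A minimal sketch of a recency-weighted answer count, assuming exponential decay with a 30-day half-life (both the decay shape and the half-life are hypothetical; the slide says only that importance is proportional to recency):

```python
def availability(answer_ages_in_days, half_life_days: float = 30.0) -> float:
    """Availability of a user: each past answer counts for less as it ages.

    An answer posted today contributes 1.0; one posted `half_life_days`
    ago contributes 0.5, and so on."""
    return sum(0.5 ** (age / half_life_days) for age in answer_ages_in_days)
```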
Data Set
• Collected from Naver Knowledge-In (KiN): http://kin.naver.com
• Ranging from September 2002 to August 2012: ten years
• Including two categories: Computers (factual information) and Travel (subjective opinions)
• Entropy is used for measuring the expertise of a user; it works especially well for categories where factual expertise is primarily sought [Adamic et al. 2008]
• Statistics:
               Computers    Travel
  # of answers 3,926,794    585,316
  # of words     191,502    232,076
  # of users     228,369     44,866
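The entropy measure referenced above (following Adamic et al. 2008) can be sketched as the Shannon entropy of a user's answer distribution over categories; lower entropy means the user's answers are concentrated in few categories, signaling focused expertise:

```python
import math

def category_entropy(answer_counts):
    """Shannon entropy (bits) of a user's answers over top-level categories.

    `answer_counts` holds the number of answers the user wrote in each
    category. 0.0 means all answers are in one category (most focused)."""
    total = sum(answer_counts)
    probs = [c / total for c in answer_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```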
Period Division
• Dividing the 10-year period into three periods
• The resource period is sufficiently long to learn the expertise of users, and so is the test period; in contrast, the training period is not
• Heavy users: those who joined during the resource period
• Light users: those who joined during the training period (only one year)
• We assume that the end of the training period is the present
Accuracy of Expertise Prediction: Preliminary Tests
• Extracting the main interest declared by each user in the CQA service
• Measuring the ratio of such self-declared experts on the target category among the top-k light users sorted by EstimatedExpertise()
[Figure: the ratio of users who expressed their interests, for (a) Computers and (b) Travel]
Accuracy of Expertise Prediction: Evaluation Method
• Finding the top-k users by EstimatedExpertise() from the training period: our prediction
• Finding the top-k users by KiN's ranking scheme from the test period: ground truth
• KiN's ranking scheme is a weighted sum of the selection count and the selection ratio
• Measuring (i) P@k and (ii) R-precision
• Repeating the same procedure for comparison with the following approaches:
  • Expertise(): the way of ranking heavy users, rather than light users, in our methodology
  • SelCount(): the selection count
  • RecommCount(): the recommendation count
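The two metrics above are standard ranking measures and can be sketched directly (the user lists here are placeholders, not data from the paper):

```python
def precision_at_k(predicted, ground_truth, k: int) -> float:
    """P@k: fraction of the top-k predicted users that appear in the
    ground-truth list."""
    truth = set(ground_truth)
    hits = sum(1 for u in predicted[:k] if u in truth)
    return hits / k

def r_precision(predicted, ground_truth) -> float:
    """R-precision: P@k with k equal to the size of the ground truth."""
    return precision_at_k(predicted, ground_truth, len(ground_truth))
```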
Accuracy of Expertise Prediction: Results
[Figures: the precision performance for the Computers category and for the Travel category]
Accuracy of Answer Affordance: Evaluation Method
• Finding the top-k users by Affordance() for light users: our methodology
• Finding the top-k users managed by KiN: competitor
• Measuring the user availability and the answer possession for the next one month
• User availability: the ratio of the number of top-k users who appeared on a day to the total number of users who appeared on that day
• Answer possession: the ratio of the number of answers posted by the top-k users on a day to the total number of answers posted on that day
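The two daily ratios defined above can be sketched as follows (the user and author lists are illustrative placeholders):

```python
def user_availability(top_k_users, users_active_today):
    """Ratio of top-k users among all users who appeared on a given day."""
    active = set(users_active_today)
    return len(set(top_k_users) & active) / len(active)

def answer_possession(top_k_users, answer_authors_today):
    """Ratio of the day's answers that were posted by the top-k users.

    `answer_authors_today` has one entry per answer posted that day."""
    top = set(top_k_users)
    return sum(1 for a in answer_authors_today if a in top) / len(answer_authors_today)
```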
[Figures: the user availability and the answer possession results, each for (a) Computers and (b) Travel]
Conclusions
• Developed a new methodology that can make CQA services more active and robust
• Verified the effectiveness of our methodology using a real data set spanning ten years
Quote from the reviews: "I'm sold. If these results hold on another CQA site, this will be a very significant contribution to online communities. The study is well done, it's incredibly readable and clear, and the evaluation dataset is impeccable (10 years of data from one of the top 3 sites)."
Thank You!