Question Routing in Community Question Answering: Putting Category in Its Place

Question Routing in Community Question Answering: Putting Category in Its Place
Baichuan Li 1, Irwin King 1,2, and Michael R. Lyu 1
1 The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
2 AT&T Labs Research, San Francisco, CA, USA
{bcli, king, lyu}@cse.cuhk.edu.hk, irwin@research.att.com

I. Motivations
• Question Routing
  • Many questions in CQA services are not solved in a timely manner
  • Routing newly posted questions to users who are most likely to give answers in a short period of time
  • Appropriate users: expertise estimation
  • Language Models (LMs)
• Problem
  • Irrelevant information
  • For an answerer, the complete set of questions the answerer has answered is utilized in the models
• Question category
  • Screening irrelevant questions of an answerer
  • Enhancing the efficiency of expertise estimation

II. Models
[Figure 1. Part of category hierarchy in Yahoo! Answers. Node labels: Home; Entertainment & Music; Computers & Internet; Software; Programming & Design; Internet; Music; Movies; Facebook; Google; Blues; Classical; Country]

i. Basic Category-Sensitive LM
[Equation not recovered. Surviving annotations:]
• the leaf category of the new question q_r
• the question texts of all previously answered questions in c_j for u_i

ii. Transferred Category-Sensitive LM
• Disadvantages of BCS-LM
  • Based on the same-leaf-category assumption, so potential answerers under similar leaf categories are omitted
  • Answerers with expertise in "Programming & Design" may also be experts on questions asked in "Software"
• Estimating category-category transferring probability
  • Answerer-based approach
  • If many of the same answerers post answers in two categories, the two categories should be similar to each other
  • Category-answerer matrix E
  • Each row represents one leaf category
  • Each column represents one answerer
  • e_ji: the number of answers u_i provided in category c_j

iii. Compared Models
• Cluster-based LM (Zhou et al. 2009)
  • Similar questions under the same topic are clustered
  • Each leaf category can be treated as a cluster
  • Answerer expertise is estimated from the answerer's contribution to each cluster and the similarity between the routed question and each cluster
• Mixture of LDA and QLLM (Liu et al. 2010)
  • Latent topics vs. explicit categories

IV. Experiments
• Data
  • Crawled from the Computers & Internet and Entertainment & Music categories of Yahoo! Answers
  • [Table 1. Description of data set. Contents not recovered]
• Platform
  • PC with 2.4 GHz dual-core CPU, 3 GB memory

[Figure 2. Different methods' Prec@K in QR versus various Ks. x-axis: K = 1, 5, 10, 20, 40, 60, 80, 100; y-axis: Prec@K from 0 to 0.6; series: QLLM, BCS-LM, TCS-LM, LDALM, CBLM]

Table 2. MRR, MAP and MQRT of various methods

Method    MRR                  MAP
QLLM      0.1460               0.1070
BCS-LM    0.1893 (↑ 29.66%)    0.1424 (↑ 33.08%)
TCS-LM    0.1965 (↑ 34.59%)    0.1469 (↑ 37.29%)
CBLM      0.0031               0.0024
LDALM     0.1695 (↑ 16.10%)    0.1281 (↑ 19.72%)

MQRT values as extracted (assignment to methods not recoverable): 10.4271, 5.5098, 8.9884, 16.7689, 4.2488

Higher Accuracies + Lower Costs

V. Conclusions
• An investigation of applying question category to question routing in CQA services
• Two simple but efficient category-sensitive LMs were proposed for estimating answerer expertise
• Experimental results show that including question category achieves higher accuracy at lower cost
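The answerer-based idea described under the Transferred Category-Sensitive LM (categories that share many answerers should be similar) can be illustrated with a small sketch. The poster does not give the formula, so the choices here are assumptions: rows of the category-answerer matrix E are compared with cosine similarity, and each row of similarities is normalized to sum to 1 so that it can be read as a transferring probability. The function name `transfer_probabilities` and the toy counts are hypothetical.

```python
import math


def transfer_probabilities(E):
    """Sketch of category-to-category transfer scores (assumed formulation).

    E[j][i] is the number of answers that answerer u_i provided in leaf
    category c_j. Returns T, where T[j][r] is the cosine similarity between
    rows j and r of E, normalized so that each row of T sums to 1.
    Cosine similarity and row normalization are assumptions, not taken
    from the poster.
    """
    n = len(E)
    norms = [math.sqrt(sum(x * x for x in row)) for row in E]

    # Pairwise cosine similarity between category rows.
    sim = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for r in range(n):
            if norms[j] > 0 and norms[r] > 0:
                dot = sum(a * b for a, b in zip(E[j], E[r]))
                sim[j][r] = dot / (norms[j] * norms[r])

    # Normalize each row so the scores can be read as probabilities.
    T = []
    for row in sim:
        total = sum(row)
        T.append([s / total if total > 0 else 0.0 for s in row])
    return T


# Toy data (hypothetical): 3 leaf categories x 3 answerers.
E = [
    [5, 0, 2],  # e.g. "Programming & Design"
    [4, 1, 0],  # e.g. "Software"
    [0, 6, 0],  # e.g. "Music"
]
T = transfer_probabilities(E)
for row in T:
    print([round(p, 3) for p in row])
```

In this toy example the first two categories share their most active answerer, so they receive a high mutual score, while the first and third categories share no answerers and receive a score of zero.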