Research on Intelligent Text Information Management

ChengXiang Zhai
Department of Computer Science, Graduate School of Library & Information Science, Institute for Genomic Biology, Statistics
University of Illinois, Urbana-Champaign
http://www-faculty.cs.uiuc.edu/~czhai, [email protected] uiuc.edu
Contains joint work with Xuehua Shen, Bin Tan, Qiaozhu Mei, Yue Lu, and other members of the TIMan group
© 2009 ChengXiang Zhai

Research Roadmap
[Diagram; recoverable labels only]
• Applications: Web, Email, and Bioinformatics
• Layers: Search Applications, Mining Applications, Information Access, Knowledge Acquisition, Information Organization, Natural Language Content Analysis, Text
• Tasks: Search, Filtering, Summarization, Categorization, Clustering, Visualization, Mining, Extraction, Entity/Relation Extraction
• Current focus: Personalized, Retrieval models, Topic map, Recommender
• Current focus: Comparative text mining, Opinion integration, Controversy discovery

Sample Projects
• User-Centered Adaptive Information Retrieval
• Multi-Resolution Topic Map for Browsing
• Comparative Text Mining
• Opinion Integration and Summarization

Project 1: User-Centered Adaptive IR (UCAIR)
• A novel retrieval strategy emphasizing
  – user modeling (“user-centered”)
  – search context modeling (“adaptive”)
  – interactive retrieval
• Implemented as a personalized search agent that
  – sits on the client-side (owned by the user)
  – integrates information around a user (1 user vs. N sources as opposed to 1 source vs. N users)
  – collaborates with each other
  – goes beyond search toward task support

Non-Optimality of Document-Centered Search Engines
Query = Jaguar (as of Oct. 17, 2005)
[Screenshot: top results labeled Car, Software, Animal]
Mixed results, unlikely optimal for any particular user

The UCAIR Project (NSF CAREER)
[Diagram labels: WEB, Search Engine (shown twice), Viewed Web pages, Query History, Desktop Files, Email, …, Personalized search agent, query “jaguar”]

Potential Benefit of Personalization
Suppose we know:
1. Previous query = “racing cars” vs. “Apple OS”
2. “car” occurs far more frequently than “Apple” in pages browsed by the user in the last 20 days
3. User just viewed an “Apple OS” document
[Screenshot: results labeled Car, Software, Animal]

Intelligent Re-ranking of Unseen Results
When a user clicks the “back” button after viewing a document, UCAIR reranks the unseen results to pull up documents similar to the one the user has just viewed.
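A minimal sketch of the re-ranking idea, assuming a bag-of-words cosine similarity as a stand-in for UCAIR's actual scoring, which the slide does not specify. The function names `cosine` and `rerank_unseen` and the sample snippets are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_unseen(viewed_text: str, unseen: list[str]) -> list[str]:
    """Order unseen result snippets by similarity to the viewed document."""
    viewed = Counter(viewed_text.lower().split())
    scored = [
        (cosine(viewed, Counter(s.lower().split())), i, s)
        for i, s in enumerate(unseen)
    ]
    # Descending similarity; the original rank i breaks ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [s for _, _, s in scored]


# Hypothetical unseen results for the query "jaguar".
results = ["jaguar animal habitat", "jaguar car dealer price", "jaguar mac os software"]
# The user viewed a car-related page, then pressed "back".
reranked = rerank_unseen("new jaguar car models and price list", results)
```

With these sample snippets the car-related result moves to the top, which mirrors the behavior described on the slide.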

UCAIR Outperforms Google [Shen et al. 05]
Precision at N documents

Ranking Method   prec@5   [email protected]   [email protected]   [email protected]
Google           0.538    0.472             0.377             0.308
UCAIR            0.581    0.556             0.453             0.375
Improvement      8.0%     17.8%             20.2%             21.8%

UCAIR toolbar available at http://sifaka.cs.uiuc.edu/ir/ucair/
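The improvement row can be checked from the two precision rows. The sketch below assumes "improvement" means relative gain over the Google baseline, which is consistent with the figures on the slide; the helper names are hypothetical.

```python
def precision_at_n(relevance: list[bool], n: int) -> float:
    """Fraction of the top-n results that are relevant."""
    return sum(relevance[:n]) / n


def relative_improvement(baseline: float, system: float) -> float:
    """Relative gain of `system` over `baseline`, in percent."""
    return (system - baseline) / baseline * 100.0


# Precision values as listed on the slide, in column order.
google = [0.538, 0.472, 0.377, 0.308]
ucair = [0.581, 0.556, 0.453, 0.375]
gains = [round(relative_improvement(g, u), 1) for g, u in zip(google, ucair)]

# A hypothetical ranked list with 3 relevant results in the top 5.
example = precision_at_n([True, False, True, True, False], 5)
```

Rounded to one decimal place, `gains` reproduces the improvement percentages shown in the table.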

Future: Personal Information Agent
[Diagram; recoverable labels only]
• Sources: WWW, Desktop, Intranet, Email, IM, Blog, E-COM, …
• Components: User Profile, Active Info Service, Security Handler, Task Support, Personal Content Index, Frequently Accessed Info
• Topics: Sports, …, Literature

Ongoing Work
• UCAIR system
• Recommendation and advertising on social networks

Project 2: Multi-Resolution Topic Map for Browsing
• Promoting browsing as a “first-class citizen”
• Multi-resolution topic map for browsing
  – Enable a user to find information through navigation
  – Very useful when a user can’t formulate effective queries or uses a small screen device
• Search log as information footprints
  – Organize search log into a topic map
  – Allow a user to follow information footprints of previous users
  – Enable social surfing

Querying vs. Browsing

Information Seeking as Sightseeing
• Know the address of an attraction site?
  – Yes: take a taxi and go directly to the site
  – No: walk around or take a taxi to a nearby place then walk around
• Know what exactly you want to find?
  – Yes: use the right keywords as a query and find the information directly
  – No: browse the information space or start with a rough query and then browse
When querying fails, browsing comes to the rescue…

Current Support for Browsing is Limited
• Hyperlinks
  – Only page-to-page
  – Mostly manually constructed
  – Browsing step is very small
  Beyond hyperlinks?
• Web directories (ODP)
  – Manually constructed
  – Only support vertical navigation
  – Fixed categories
  Beyond fixed categories?
How to promote browsing as a “first-class citizen”?

Sightseeing Analogy Continues…
[Figure labels: Horizontal navigation, Region, Zoom in, Zoom out]

Topic Map for Touring Information Space
[Figure labels: Topic regions, Multiple resolutions, Zoom in, Zoom out, Horizontal navigation; values shown on the map: 0.03, 0.02, 0.05, 0.03, 0.01]

Topic-Map based Browsing Demo

How can we construct such a multi-resolution topic map? Multiple possibilities…

Search Logs as Information Footprints in information space
User 2722 searched for "national car rental" [!] at 2006-03-09 11:24:29
User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.valoans.com)
User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://benefits.military.com)
User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.avis.com)
User 2722 searched for "enterprise rent a car" [!] at 2006-04-05 23:37:42 (found http://www.enterprise.com)
User 2722 searched for "meineke care center" [!] at 2006-05-02 09:12:49 (found http://www.meineke.com)
User 2722 searched for "car rental" [!] at 2006-05-25 15:54:36
User 2722 searched for "autosave car rental" [!] at 2006-05-25 23:26:54 (found http://eautosave.com)
User 2722 searched for "budget car rental" [!] at 2006-05-25 23:29:53
User 2722 searched for "alamo car rental" [!] at 2006-05-25 23:56:13
……

Information Footprints Topic Map
• Challenges
  – How to define/construct a topic region
  – How to control granularities/resolutions of topic regions
  – How to connect topic regions to support effective browsing
• Two approaches
  – Multi-granularity clustering
  – Query editing

Collaborative Surfing
• Navigation trace enriches map structures
• New queries become new footprints
• Clickthroughs become new footprints
• Browse logs offer more opportunities to understand user interests and intents

Project 3: Comparative Text Mining
• Documents are often associated with context (metadata)
  – Direct context: time, location, source, authors, …
  – Indirect context: events, policies, …
• Many applications require “contextual text analysis”:
  – Discovering topics from text in a context-sensitive way
  – Analyzing variations of topics over different contexts
  – Revealing interesting patterns (e.g., topic evolution, topic variations, topic communities)

Example 1: Comparing News Articles
Possible comparisons: Vietnam War vs. Afghan War vs. Iraq War; CNN vs. Fox vs. Blog; Before 9/11 vs. During Iraq war vs. Current; US blog vs. European blog vs. Others

Common Themes     “Vietnam” specific   “Afghan” specific   “Iraq” specific
United nations    …                    …                   …
Death of people   …                    …                   …
…                 …                    …                   …

What’s in common? What’s unique?

More Contextual Analysis Questions
• What positive/negative aspects did people say about X (e.g., a person, an event)? Trends?
• How does an opinion/topic evolve over time?
• What are emerging topics? What topics are fading away?
• How can we characterize a social network?

Research Questions
• Can we model all these problems generally?
• Can we solve these problems with a unified approach?
• How can we bring humans into the loop?

Contextual Probabilistic Latent Semantic Analysis ([KDD 2006]…)
[Diagram of the generative process; recoverable content only]
• Themes (shown under View 1, View 2, View 3), each a word distribution:
  – government: government 0.3, response 0.2, …
  – donation: donate 0.1, relief 0.05, help 0.02, …
  – New Orleans: city 0.2, new 0.1, orleans 0.05, …
• Document context: Time = July 2005; Location = Texas; Author = xxx; Occup. = Sociologist; Age Group = 45+
• Example document text:
  – “Criticism of government response to the hurricane primarily consisted of criticism of its response to …”
  – “The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production …”
  – “Over seventy countries pledged monetary donations or other assistance. …”
• Generation steps: choose a view; choose a coverage (theme coverages, e.g., Texas, July 2005, document); choose a theme; draw a word from theme i
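The generation steps named on the slide (choose a view, choose a coverage, choose a theme, draw a word) imply that a word's probability is a nested mixture. The sketch below computes that mixture under stated assumptions: the word probabilities for "View 1" come from the truncated lists on the slide, while the second view, the coverage distributions, and the mixing weights are hypothetical values chosen only for illustration. It is not the estimation procedure from the KDD 2006 work.

```python
def word_probability(word, view_probs, coverage_probs, coverages, themes):
    """p(word) = sum_v p(v) * sum_c p(c) * sum_t p(t | c) * p(word | t, v)."""
    total = 0.0
    for view, p_view in view_probs.items():
        for cov, p_cov in coverage_probs.items():
            for theme, p_theme in coverages[cov].items():
                total += p_view * p_cov * p_theme * themes[view][theme].get(word, 0.0)
    return total


# "View 1" word probabilities are the (truncated) values from the slide;
# "View 2" is a hypothetical variation of the same themes.
themes = {
    "View 1": {
        "government": {"government": 0.3, "response": 0.2},
        "donation": {"donate": 0.1, "relief": 0.05, "help": 0.02},
        "new_orleans": {"city": 0.2, "new": 0.1, "orleans": 0.05},
    },
    "View 2": {
        "government": {"government": 0.2, "response": 0.3},
        "donation": {"donate": 0.2, "relief": 0.1},
        "new_orleans": {"city": 0.1, "orleans": 0.2},
    },
}
# Hypothetical theme coverages: p(theme | coverage).
coverages = {
    "document": {"government": 0.5, "donation": 0.3, "new_orleans": 0.2},
    "Texas, July 2005": {"government": 0.2, "donation": 0.6, "new_orleans": 0.2},
}
# Hypothetical mixing weights over views and coverages.
view_probs = {"View 1": 0.5, "View 2": 0.5}
coverage_probs = {"document": 0.5, "Texas, July 2005": 0.5}

p_government = word_probability("government", view_probs, coverage_probs, coverages, themes)
p_donate = word_probability("donate", view_probs, coverage_probs, coverages, themes)
```

Changing `view_probs` or `coverage_probs` for a document's context shifts which words are likely, which is the sense in which the topics become context-sensitive.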

Comparing News Articles
Iraq War (30 articles) vs. Afghan War (26 articles)
The common theme indicates that “United Nations” is involved in both wars.

Cluster 1
• Common Theme: united, nations, … (one value survives: 0.04)
• Iraq Theme: n 0.03, Weapons 0.024, Inspections 0.023, …
• Afghan Theme: Northern 0.04, alliance 0.04, kabul 0.03, taleban 0.025, aid 0.02, …
Cluster 2
• Common Theme: killed 0.035, month 0.032, deaths 0.023, …
• Iraq Theme: troops 0.016, hoon 0.015, sanches 0.012, …
• Afghan Theme: taleban, rumsfeld, hotel, front, … (surviving values: 0.026, 0.02, 0.011)
Cluster 3
• …

Collection-specific themes indicate different roles of “United Nations” in the two wars.

Spatiotemporal Patterns in Blog Articles
• Query = “Hurricane Katrina”
• Topics in the results:
• Spatiotemporal patterns

Theme Life Cycles (“Hurricane Katrina”)
• Oil Price: price 0.0772, oil 0.0643, gas 0.0454, increase 0.0210, product 0.0203, fuel 0.0188, company 0.0182, …
• New Orleans: city 0.0634, orleans 0.0541, new 0.0342, louisiana 0.0235, flood 0.0227, evacuate 0.0211, storm 0.0177, …

Theme Snapshots (“Hurricane Katrina”)
• Week 1: The theme is the strongest along the Gulf of Mexico
• Week 2: The discussion moves towards the north and west
• Week 3: The theme distributes more uniformly over the states
• Week 4: The theme is again strong along the east coast and the Gulf of Mexico
• Week 5: The theme fades out in most states

Theme Life Cycles (KDD Papers)
• gene 0.0173, expressions 0.0096, probability 0.0081, microarray 0.0038, …
• marketing 0.0087, customer 0.0086, model 0.0079, business 0.0048, …
• rules 0.0142, association 0.0064, support 0.0053, …

Theme Evolution Graph: KDD
[Graph over the years 1999, 2000, 2001, 2002, 2003, 2004; recoverable theme nodes only]
• SVM 0.007, criteria 0.007, classification 0.006, linear 0.005, …
• decision 0.006, tree 0.006, classifier 0.005, class 0.005, Bayes 0.005, …
• web 0.009, classification 0.007, features 0.006, topic 0.005, …
• mixture 0.005, random 0.006, clustering 0.005, variables 0.005, …
• Classification 0.015, text 0.013, unlabeled 0.012, document 0.008, labeled 0.007, learning …
• Information 0.012, web 0.010, social 0.008, retrieval 0.007, distance 0.005, networks 0.004, …
• topic 0.010, mixture 0.008, LDA 0.006, semantic 0.005, …

Multi-Faceted Sentiment Summary (query=“Da Vinci Code”)

Facet 1: Movie
• Neutral
  – ... Ron Howards selection of Tom Hanks to play Robert Langdon.
  – Directed by: Ron Howard Writing credits: Akiva Goldsman ...
  – After watching the movie I went online and some research on ...
• Positive
  – Tom Hanks stars in the movie, who can be mad at that?
  – Tom Hanks, who is my favorite movie star act the leading role.
  – Anybody is interested in it?
• Negative
  – But the movie might get delayed, and even killed off if he loses.
  – protesting ... will lose your faith by ... watching the movie.
  – ... so sick of people making such a big deal about a FICTION book and movie.

Facet 2: Book
• Neutral
  – I remembered when i first read the book, I finished the book in two days.
  – I’m reading “Da Vinci Code” now.
• Positive
  – Awesome book.
  – So still a good book to past time.
• Negative
  – ... so sick of people making such a big deal about a FICTION book and movie.
  – This controversy book cause lots conflict in west society.
…

Separate Theme Sentiment Dynamics
[Plots for the themes “book” and “religious beliefs”]

Event Impact Analysis: IR Research
Theme: retrieval models (SIGIR papers)
• Overall theme: term 0.1599, relevance 0.0752, weight 0.0660, feedback 0.0372, independence 0.0311, model 0.0310, frequent 0.0233, probabilistic 0.0188, document 0.0173, …
• Events:
  – 1992: Starting of the TREC conferences
  – 1998: Publication of the paper “A language modeling approach to information retrieval”
• Theme variations around the events:
  – vector 0.0514, concept 0.0298, extend 0.0297, model 0.0291, space 0.0236, boolean 0.0151, function 0.0123, feedback 0.0077, …
  – xml 0.0678, email 0.0197, model 0.0191, collect 0.0187, judgment 0.0102, rank 0.0097, subtopic 0.0079, …
  – probabilist 0.0778, model 0.0432, logic 0.0404, ir 0.0338, boolean 0.0281, algebra 0.0200, estimate 0.0119, weight 0.0111, …
  – model 0.1687, language 0.0753, estimate 0.0520, parameter 0.0281, distribution 0.0268, probable 0.0205, smooth 0.0198, markov 0.0137, likelihood 0.0059, …

Topic Modeling + Social Networks
Authors writing about the same topic form a community
Separation of 3 research communities: IR, ML, Web
[Figure panels: Topic Model Only; Topic Model + Social Network]

On-Going Work
• Combining contextual text analysis with visualization
• More detailed semantic modeling (entities, relations, …)
• Integration of search and contextual text analysis to develop an analyst’s workbench:
  – Interactive semantic navigation and probing
  – Synthesis of information/knowledge
  – Personalized/customized service

Project 4: Opinion Integration and Summarization
• Increasing popularity of Web 2.0 applications
  – more people express opinions on the Web
How to digest all?
[Screenshot figures: 190,451 posts; 4,773,658 results]

Motivation: Two kinds of opinions
[Screenshot figures: 190,451 posts; 4,773,658 results]
Expert opinions
• CNET editor’s review
• Wikipedia article
  – Well-structured
  – Easy to access
  – Maybe biased
  – Outdated soon
Ordinary opinions
• Forum discussions
• Blog articles
  – Represent the majority
  – Up to date
  – Hard to access
  – Fragmented
How to benefit from both?

Problem Definition
Input
• Topic: iPod
• Expert review with aspects: Design, Battery, Price, …
• Text collection of ordinary opinions, e.g., Weblogs
Output: Integrated Summary
• Review aspects (similar opinions / supplementary opinions)
  – Design: “cute… tiny…” / “..thicker..”
  – Battery: “last many hrs” / “die out soon”
  – Price: “could afford it” / “still expensive”
• Extra aspects
  – iTunes: “… easy to use…”
  – warranty: “…better to extend..”

Methods
• Semi-Supervised Probabilistic Latent Semantic Analysis (PLSA)
  – The aspects extracted from expert reviews serve as clues to define a conjugate prior on topics
  – Maximum a Posteriori (MAP) estimation
  – Repeated applications of PLSA to integrate and align opinions in blog articles to expert review
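To illustrate how a conjugate prior enters MAP estimation, the sketch below shows one re-estimation step for a single topic's word distribution. It assumes a Dirichlet-style prior expressed as pseudo-counts `mu * prior[w]` added to the expected counts from the E-step; the function name, the sample counts, and the value of `mu` are hypothetical, and the slide does not give the exact update used in the work.

```python
def map_word_distribution(expected_counts, prior, mu):
    """MAP re-estimate of p(w | topic) with a conjugate (pseudo-count) prior.

    expected_counts: expected count of each word assigned to this topic (E-step).
    prior: word distribution built from the expert-review aspect (sums to 1).
    mu: prior strength; mu = 0 recovers the maximum-likelihood estimate.
    """
    vocab = set(expected_counts) | set(prior)
    total = sum(expected_counts.values()) + mu
    return {
        w: (expected_counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / total
        for w in vocab
    }


# Hypothetical expected counts from blog text and a hypothetical "battery" aspect prior.
counts = {"battery": 6.0, "life": 3.0, "price": 1.0}
aspect_prior = {"battery": 0.5, "hours": 0.5}
estimate = map_word_distribution(counts, aspect_prior, mu=10.0)
```

A larger `mu` keeps the topic closer to the expert-review aspect, while a smaller `mu` lets the blog text dominate.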

Results: Product (iPhone)
• Opinion Integration with review aspects

Aspect: Activation
• Review article: “You can make emergency calls, but you can't use any other functions…”
• Similar opinions: N/A
• Supplementary opinions: “… methods for unlocking the iPhone have emerged on the Internet in the past few weeks, although they involve tinkering with the iPhone hardware…” [callout: Unlock/hack iPhone]

Aspect: Battery
• Review article: “rated battery life of 8 hours talk time, 24 hours of music playback, 7 hours of video playback, and 6 hours on Internet use.”
• Similar opinions: “iPhone will Feature Up to 8 Hours of Talk Time, 6 Hours of Internet Use, 7 Hours of Video Playback or 24 Hours of Audio Playback” [callout: Confirm the opinions from the review]
• Supplementary opinions: “Playing relatively high bitrate VGA H.264 videos, our iPhone lasted almost exactly 9 freaking hours of continuous playback with cell and WiFi on (but Bluetooth off).” [callout: Additional info under real usage]

Results: Product (iPhone)
• Opinions on extra aspects

Support 15: “You may have heard of iASign … an iPhone Dev Wiki tool that allows you to activate your phone without going through the iTunes rigamarole.” [callout: Another way to activate iPhone]
Support 13: “Cisco has owned the trademark on the name "iPhone" since 2000, when it acquired InfoGear Technology Corp., which originally registered the name.” [callout: iPhone trademark originally owned by Cisco]
Support 13: “With the imminent availability of Apple's uber cool iPhone, a look at 10 things current smartphones like the Nokia N95 have been able to do for a while and that the iPhone can't currently match...” [callout: A better choice for smart phones?]

Results: Product (iPhone)
• Support statistics for review aspects
[Chart callouts]
  – People care about price
  – Controversy: activation requires contract with AT&T
  – People comment a lot about the unique wi-fi feature

Summarization of Contradictory Opinions [Kim & Zhai CIKM 09]
[Table repeated from the Multi-Faceted Sentiment Summary slide for query “Da Vinci Code”: Facet 1: Movie and Facet 2: Book, with Neutral, Positive, and Negative columns]
How can we help analysts digest and interpret contradictory opinions?

Contrastive Opinion Summarization
[Diagram: two sets of opinion sentences, X = {x1, x2, x3, x4, x5, …, xn} and Y = {y1, y2, y3, y4, …, ym}]

Contrastive Opinion Summarization
[Diagram: X = {x1, x2, x3, x4, x5, …, xn} and Y = {y1, y2, y3, y4, …, ym}, with the Contrastive Opinion Summary drawn between them as pairs (u1, v1), (u2, v2), …, (uk, vk)]

Problem Formulation
[Diagram: X = {x1, …, xn}, Y = {y1, …, ym}, and summary sets U = {u1, u2, …, uk} and V = {v1, v2, …, vk}; relation labels: Representativeness, Contrastiveness]

Summarization as Optimization
1. Define an appropriate content similarity function φ
2. Define an appropriate contrastive similarity function ψ
3. Solve the optimization problem efficiently.
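The three steps above can be sketched as follows. Everything concrete here is an assumption made for illustration: Jaccard word overlap stands in for the content similarity, the same overlap after dropping words from a tiny hypothetical sentiment lexicon stands in for the contrastive similarity, the objective is a weighted sum of representativeness and contrastiveness, and the search is exhaustive, which only works for very small inputs. None of this is claimed to be the formulation or the efficient algorithm of the CIKM 09 paper.

```python
from itertools import combinations, product

# Hypothetical sentiment lexicon, used only to strip polarity words.
SENTIMENT = {"fast", "slow", "good", "bad", "long", "short", "great", "poor"}


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def phi(s, t):
    """Content similarity: word overlap between two sentences."""
    return jaccard(s.split(), t.split())


def psi(s, t):
    """Contrastive similarity: word overlap after removing sentiment words."""
    strip = lambda x: [w for w in x.split() if w not in SENTIMENT]
    return jaccard(strip(s), strip(t))


def objective(X, Y, pairs, lam=0.5):
    """lam * representativeness + (1 - lam) * contrastiveness."""
    rep = (
        sum(max(phi(x, u) for u, _ in pairs) for x in X)
        + sum(max(phi(y, v) for _, v in pairs) for y in Y)
    ) / (len(X) + len(Y))
    con = sum(psi(u, v) for u, v in pairs) / len(pairs)
    return lam * rep + (1 - lam) * con


def best_summary(X, Y, k=1, lam=0.5):
    """Exhaustively pick the k (positive, negative) pairs with the best objective."""
    candidates = list(product(X, Y))
    return max(combinations(candidates, k), key=lambda pairs: objective(X, Y, pairs, lam))


# Hypothetical positive (X) and negative (Y) opinion sentences.
X = ["battery life is long", "file transfer is fast"]
Y = ["battery life is short", "screen is bad"]
summary = best_summary(X, Y, k=1)
```

On this toy input the selected pair contrasts the two battery-life sentences, since they share all of their non-sentiment words.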

Sample Results

No 1
• Positive: oh ... and file transfers are fast & easy.
• Negative: you need the software to actually transfer files
No 2
• Positive: i noticed that the micro adjustment knob and collet are well made and work well too.
• Negative: the adjustment knob seemed ok, but when lowering the router, i have to practically pull it down while turning the knob.
No 3
• Positive: the navigation is nice enough , but scrolling and searching through thousands of tracks , hundreds of albums or artists , or even dozens of genres is not conducive to save driving
• Negative: difficult navigation - i wo n’t necessarily say " difficult , “ but i do n’t enjoy the scrollwheel to navigate.
No 4
• Positive: i imagine if i left my player untouched (no backlight) it could play for considerably more than 12 hours at a low volume level.
• Negative: there are 2 things that need fixing first is the battery life. it will run for 6 hrs without problems with medium usage of the buttons.

Sample Result
[Same four positive/negative sentence pairs as on the Sample Results slide]
Callout: Different polarities of opinions made from different perspectives.

Sample Result
[Same four positive/negative sentence pairs as on the Sample Results slide]
Callouts: Positive vs. negative; Not much disagreement

Sample Result
[Same four positive/negative sentence pairs as on the Sample Results slide]
Callout: Judgments revealing detailed conditions

Latent Aspect Rating Analysis
• Motivation:
  – Many online reviews with text content as well as numerical ratings
  – How can we decompose an overall rating into ratings on different aspects?
  – How can we infer a reviewer’s relative weights on different aspects (rating factors)?
  – More generally, how to analyze latent factors related to numerical values associated with text?
• Solution
  – Latent Rating Regression Model

Problem 1. Different reviewers give the same overall ratings for different reasons
Need for analyzing opinions at a fine-grained level of topical aspects!
How do we decompose overall ratings into aspect ratings?

Problem 2. The same rating means different things to different reviewers
Need for further analyzing the aspect emphasis of each reviewer!
How do we infer the aspect weights that reviewers have put onto the ratings?

Our Solution
Pipeline: Reviews + overall ratings; Aspect Segmentation (bootstrapping method) yields aspect segments; Latent Rating Regression yields term weights, aspect ratings, and aspect weights.

Aspect segment (term:count)                                      Term weights              Aspect Rating   Aspect Weight
location:1 amazing:1 walk:1 anywhere:1                           0.0, 0.9, 0.1, 0.3        1.3             0.2
room:1 nicely:1 appointed:1 comfortable:1                        0.1, 0.7, 0.1, 0.9        1.8             0.2
nice:1 accommodating:1 smile:1 friendliness:1 attentiveness:1    0.6, 0.8, 0.7, 0.8, 0.9   3.8             0.6

Latent Rating Regression (LRR)
• Bayesian regression
[Equation labels: Aspect rating, Aspect weight, Overall rating, Joint probability; the equations themselves did not survive extraction]
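The numbers on the "Our Solution" slide suggest how the pieces fit together: each aspect rating equals the sum of the term weights of the words in that aspect's segment, and the aspect weights sum to one. The sketch below reproduces that arithmetic and then combines the aspect ratings into a weighted overall score. The aspect names, the helper names, and the use of a plain weighted sum as the predicted overall rating are assumptions for illustration; the noise model and priors of the Bayesian regression are omitted.

```python
def aspect_rating(term_weights, term_counts):
    """Aspect rating as the weighted sum of term counts in the aspect segment."""
    return sum(term_weights[t] * n for t, n in term_counts.items())


def overall_rating(ratings, weights):
    """Predicted overall rating as the aspect-weight-weighted sum of aspect ratings."""
    return sum(weights[a] * ratings[a] for a in ratings)


# Term counts and term weights as listed on the slide; aspect names are hypothetical.
segments = {
    "location": {"location": 1, "amazing": 1, "walk": 1, "anywhere": 1},
    "room": {"room": 1, "nicely": 1, "appointed": 1, "comfortable": 1},
    "service": {"nice": 1, "accommodating": 1, "smile": 1,
                "friendliness": 1, "attentiveness": 1},
}
term_weights = {
    "location": {"location": 0.0, "amazing": 0.9, "walk": 0.1, "anywhere": 0.3},
    "room": {"room": 0.1, "nicely": 0.7, "appointed": 0.1, "comfortable": 0.9},
    "service": {"nice": 0.6, "accommodating": 0.8, "smile": 0.7,
                "friendliness": 0.8, "attentiveness": 0.9},
}
aspect_weights = {"location": 0.2, "room": 0.2, "service": 0.6}

ratings = {a: aspect_rating(term_weights[a], segments[a]) for a in segments}
overall = overall_rating(ratings, aspect_weights)
```

The computed aspect ratings match the 1.3, 1.8, and 3.8 shown on the slide, and the weighted combination comes to about 2.9.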

Inference in LRR
• Aspect rating
• Aspect weight
  – Maximum a posteriori estimation
[Equation labels: prior, likelihood; the equations themselves did not survive extraction]

Qualitative evaluation 1
• Aspect-level Hotel Analysis
  – Hotels with the same overall rating but different aspect ratings
  – A better understanding at the finer-granularity level

Qualitative evaluation 2
• Reviewer-level Hotel Analysis
  – Different reviewers’ ratings about the same hotel
  – Detailed analysis of each reviewer’s opinion

Quantitative Evaluation
• Results
• Analysis

Application 1
• User Rating Behavior Analysis
  – Rating factor analysis
  – Aspect emphasis analysis

Application 2
• Aspect-based Summarization

Ongoing Work
• Discovery of controversy and contrastive summarization
• Information trustworthiness

The End
Thank You!
More information about our research can be found at http://timan.cs.uiuc.edu/