Research on Intelligent Text Information Management Cheng Xiang
- Slides: 67
Research on Intelligent Text Information Management ChengXiang Zhai Department of Computer Science, Graduate School of Library & Information Science, Institute for Genomic Biology, Statistics, University of Illinois, Urbana-Champaign http://www.cs.uiuc.edu/homes/czhai, czhai@cs.uiuc.edu Contains joint work with Xuehua Shen, Bin Tan, Qiaozhu Mei, Yue Lu, Hongning, Vinod, and other members of the TIMan group 1
Research Roadmap [layered diagram] Applications: Web, Email, and Bioinformatics; Search Applications; Mining Applications. Information Access: Search, Filtering. Knowledge Acquisition: Mining, Extraction. Information Organization: Categorization, Clustering. Also shown: Summarization, Visualization, Entity/Relation Extraction. Foundation: Natural Language Content Analysis over Text. Current focus: Personalized; Retrieval models; Topic map; Recommender. Current focus: Contextual text mining; Opinion integration; Information quality. 2
Sample Projects • Optimization of Retrieval Models • User-Centered Adaptive Information Retrieval • Multi-Resolution Topic Map for Browsing • Contextual Text Mining • Opinion Integration and summarization • Information Trustworthiness 3
Project 1: Optimization of Retrieval Models • Content-based matching is a critical component in any search system • Developed a number of retrieval models to optimize content matching – Language models: various LMs supporting proximity, word translations, feedback, … – Axiomatic framework: theoretical analysis of retrieval models – Recently looking into optimal interactive retrieval and domain-specific retrieval models (feedback, exploitation-exploration tradeoff, medical case retrieval, forum retrieval, … ) 4
Project 2: User-Centered Adaptive IR (UCAIR) • A novel retrieval strategy emphasizing – user modeling (“user-centered”) – search context modeling (“adaptive”) – interactive retrieval • Implemented as a personalized search agent that – sits on the client side (owned by the user) – integrates information around a user (1 user vs. N sources as opposed to 1 source vs. N users) – collaborates with other agents – goes beyond search toward task support 5
Non-Optimality of Document-Centered Search Engines Query = “Jaguar” (as of Oct. 17, 2005) [screenshot: top results labeled Car, Software, Animal] Mixed results, unlikely optimal for any particular user 6
The UCAIR Project (NSF CAREER) [diagram: a personalized search agent sits between the user's query (“jaguar”) and the search engines on the Web, drawing on viewed Web pages, query history, desktop files, email, …] 7
Potential Benefit of Personalization Suppose we know: 1. Previous query = “racing cars” vs. “Apple OS” 2. “car” occurs far more frequently than “Apple” in pages browsed by the user in the last 20 days 3. User just viewed an “Apple OS” document [screenshot: results labeled Car, Software, Animal] 8
Intelligent Re-ranking of Unseen Results When a user clicks on the “back” button after viewing a document, UCAIR reranks unseen results to pull up documents similar to the one the user has viewed 9
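The re-ranking step described on this slide can be sketched as follows. This is a minimal illustration, not the UCAIR implementation: it assumes plain bag-of-words vectors and cosine similarity, and the function names and example snippets are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased term-frequency vector for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_unseen(unseen, viewed_text):
    """Order unseen result snippets by similarity to the viewed document."""
    viewed = bag_of_words(viewed_text)
    return sorted(unseen, key=lambda s: cosine(bag_of_words(s), viewed), reverse=True)


# Hypothetical snippets for the query "jaguar".
unseen = [
    "jaguar cars official dealer site",
    "jaguar the big cat of the americas",
    "apple mac os x jaguar release notes",
]
# The user has just viewed a document about Apple's operating system.
print(rerank_unseen(unseen, "apple os update for mac"))
```

A real agent would likely use a richer document model (for example a language model with smoothing), but the control flow stays the same: score each unseen result against what the user just viewed, then sort.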
UCAIR Outperforms Google [Shen et al. 05] Precision at N documents
Ranking Method   prec@5   prec@10   prec@20   prec@30
Google           0.538    0.472     0.377     0.308
UCAIR            0.581    0.556     0.453     0.375
Improvement      8.0%     17.8%     20.2%     21.8%
UCAIR toolbar available at http://sifaka.cs.uiuc.edu/ir/ucair/ 10
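For readers unfamiliar with the metric, the sketch below shows how precision at N and the relative improvement row are typically computed. The helper names and the judged list are hypothetical; only the prec@5 figures (0.538 and 0.581) come from the slide.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results judged relevant (relevance is a list of 0/1)."""
    return sum(relevance[:k]) / k


def relative_improvement(baseline, system):
    """Relative gain of `system` over `baseline`, in percent."""
    return 100.0 * (system - baseline) / baseline


# Hypothetical relevance judgments for one ranked list (1 = relevant).
judged = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(judged, 5))

# prec@5 values reported on the slide: Google 0.538, UCAIR 0.581.
print(round(relative_improvement(0.538, 0.581), 1))
```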
Future: Personal Information Agent [diagram: an agent built around a User Profile, Personal Content Index, Frequently Accessed Info, Active Info Service, Security Handler, and Task Support, connected to WWW, Desktop, Intranet, Email, IM, Blog, E-COM, …, with topical content such as Sports, Literature, …] 11
Ongoing Work • UCAIR system • Recommendation and advertising on social networks 12
Project 3: Multi-Resolution Topic Map for Browsing • Promoting browsing as a “first-class citizen” • Multi-resolution topic map for browsing – Enable a user to find information through navigation – Very useful when a user can’t formulate effective queries or uses a small screen device • Search log as information footprints – Organize search log into a topic map – Allow a user to follow information footprints of previous users – Enable social surfing 13
Querying vs. Browsing 14
Information Seeking as Sightseeing • Know the address of an attraction site? – Yes: take a taxi and go directly to the site – No: walk around or take a taxi to a nearby place then walk around • Know what exactly you want to find? – Yes: use the right keywords as a query and find the information directly – No: browse the information space or start with a rough query and then browse When querying fails, browsing comes to the rescue… 15
Current Support for Browsing is Limited • Hyperlinks – Only page-to-page – Mostly manually constructed – Browsing step is very small (Beyond hyperlinks?) • Web directories (e.g., ODP) – Manually constructed – Only support vertical navigation – Fixed categories (Beyond fixed categories?) How to promote browsing as a “first-class citizen”? 16
Sightseeing Analogy Continues… Horizontal navigation Region Zoom in Zoom out 17
Topic Map for Touring Information Space [figure: topic regions at multiple resolutions, with zoom in, zoom out, and horizontal navigation] 18
Topic-Map based Browsing Demo 19
How can we construct such a multi-resolution topic map? Multiple possibilities… 20
Search Logs as Information Footprints in information space
User 2722 searched for "national car rental" [!] at 2006-03-09 11:24:29
User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.valoans.com)
User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://benefits.military.com)
User 2722 searched for "military car rental benefits" [!] at 2006-03-10 09:33:37 (found http://www.avis.com)
User 2722 searched for "enterprise rent a car" [!] at 2006-04-05 23:37:42 (found http://www.enterprise.com)
User 2722 searched for "meineke care center" [!] at 2006-05-02 09:12:49 (found http://www.meineke.com)
User 2722 searched for "car rental" [!] at 2006-05-25 15:54:36
User 2722 searched for "autosave car rental" [!] at 2006-05-25 23:26:54 (found http://eautosave.com)
User 2722 searched for "budget car rental" [!] at 2006-05-25 23:29:53
User 2722 searched for "alamo car rental" [!] at 2006-05-25 23:56:13
…… 21
Information Footprints Topic Map • Challenges – How to define/construct a topic region – How to control granularities/resolutions of topic regions – How to connect topic regions to support effective browsing • Two approaches – Multi-granularity clustering – Query editing 22
Collaborative Surfing Navigation trace enriches map structures New queries become new footprints Clickthroughs become new footprints Browse logs offer more opportunities to understand user interests and intents 23
Project 4: Contextual Text Mining • Documents are often associated with context (metadata) – Direct context: time, location, source, authors, … – Indirect context: events, policies, … • Many applications require “contextual text analysis”: – Discovering topics from text in a context-sensitive way – Analyzing variations of topics over different contexts – Revealing interesting patterns (e.g., topic evolution, topic variations, topic communities) 24
Example 1: Comparing News Articles Comparable collections: Vietnam War vs. Afghan War vs. Iraq War; CNN vs. Fox vs. Blog; Before 9/11 vs. During Iraq war vs. Current; US blog vs. European blog vs. Others [table: rows are common themes (e.g., United Nations, Death of people, …); columns are “Vietnam” specific, “Afghan” specific, “Iraq” specific] What’s in common? What’s unique? 25
More Contextual Analysis Questions • What positive/negative opinions did people express about X (e.g., a person, an event)? Trends? • How does an opinion/topic evolve over time? • What are emerging topics? What topics are fading away? • How can we characterize a social network? 26
Research Questions • Can we model all these problems generally? • Can we solve these problems with a unified approach? • How can we bring humans into the loop? 27
Contextual Probabilistic Latent Semantic Analysis ([KDD 2006]…) [figure: generative process]
Themes: government (government 0.3, response 0.2, …); donation (donate 0.1, relief 0.05, help 0.02, …); New Orleans (city 0.2, new 0.1, orleans 0.05, …)
Views of the themes: View 1, View 2, View 3 (e.g., Texas, July 2005, sociologist)
Theme coverages: Texas, July 2005, document, …
Document context: Time = July 2005; Location = Texas; Author = xxx; Occup. = Sociologist; Age Group = 45+; …
Example document text: “Criticism of government response to the hurricane primarily consisted of criticism of its response to …”; “The total shut-in oil production from the Gulf of Mexico … approximately 24% of the annual production and the shut-in gas production …”; “Over seventy countries pledged monetary donations or other assistance. …”
Generating each word: choose a view, choose a coverage, choose a theme, then draw a word from the chosen theme. 28
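The four generative steps named on this slide (choose a view, choose a coverage, choose a theme, draw a word) can be sketched as a sampler. This is a toy illustration under stated assumptions: all probabilities below are made up, the view and coverage choices are fixed distributions rather than functions of a document's context, and the names `draw` and `generate_word` are hypothetical.

```python
import random


def draw(dist, rng):
    """Sample a key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def generate_word(view_probs, coverage_probs, views, coverages, rng):
    """Generate one word by following the four steps on the slide."""
    view = draw(view_probs, rng)            # 1. choose a view of the themes
    coverage = draw(coverage_probs, rng)    # 2. choose a theme coverage
    theme = draw(coverages[coverage], rng)  # 3. choose a theme under that coverage
    word = draw(views[view][theme], rng)    # 4. draw a word from the theme
    return view, coverage, theme, word


# Toy parameters (assumptions): each view holds its own word distribution per theme.
VIEWS = {
    "View 1": {
        "government": {"government": 0.6, "response": 0.4},
        "donation": {"donate": 0.5, "relief": 0.3, "help": 0.2},
    },
    "View 2": {
        "government": {"criticism": 0.7, "response": 0.3},
        "donation": {"donate": 0.4, "aid": 0.6},
    },
}
COVERAGES = {
    "Texas": {"government": 0.3, "donation": 0.7},
    "July 2005": {"government": 0.8, "donation": 0.2},
}
VIEW_PROBS = {"View 1": 0.5, "View 2": 0.5}
COVERAGE_PROBS = {"Texas": 0.4, "July 2005": 0.6}

rng = random.Random(0)
print(generate_word(VIEW_PROBS, COVERAGE_PROBS, VIEWS, COVERAGES, rng))
```

Fitting such a model runs in the opposite direction: given the words and their contexts, the distributions are estimated, typically with EM.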
Comparing News Articles: Iraq War (30 articles) vs. Afghan War (26 articles)
The common theme indicates that “United Nations” is involved in both wars
Cluster 1. Common theme: united, nations, … (0.04, …) Iraq theme: n 0.03, Weapons 0.024, Inspections 0.023, … Afghan theme: Northern 0.04, alliance 0.04, kabul 0.03, taleban 0.025, aid 0.02, …
Cluster 2. Common theme: killed 0.035, month 0.032, deaths 0.023, … Iraq theme: troops 0.016, hoon 0.015, sanches 0.012, … Afghan theme: taleban, rumsfeld, hotel, front, … (0.026, 0.02, 0.011, …)
Cluster 3. …
Collection-specific themes indicate different roles of “United Nations” in the two wars 29
Spatiotemporal Patterns in Blog Articles • Query = “Hurricane Katrina” • Topics in the results • Spatiotemporal patterns 30
Theme Life Cycles (“Hurricane Katrina”) Oil Price: price 0.0772, oil 0.0643, gas 0.0454, increase 0.0210, product 0.0203, fuel 0.0188, company 0.0182, … New Orleans: city 0.0634, orleans 0.0541, new 0.0342, louisiana 0.0235, flood 0.0227, evacuate 0.0211, storm 0.0177, … 31
Theme Snapshots (“Hurricane Katrina”) Week 1: The theme is the strongest along the Gulf of Mexico Week 2: The discussion moves towards the north and west Week 3: The theme distributes more uniformly over the states Week 4: The theme is again strong along the east coast and the Gulf of Mexico Week 5: The theme fades out in most states 32
Theme Life Cycles (KDD Papers) [gene 0.0173, expressions 0.0096, probability 0.0081, microarray 0.0038, …]; [marketing 0.0087, customer 0.0086, model 0.0079, business 0.0048, …]; [rules 0.0142, association 0.0064, support 0.0053, …] 33
Theme Evolution Graph: KDD (years 1999, 2000, 2001, 2002, 2003, 2004) Theme nodes: [SVM 0.007, criteria 0.007, classification 0.006, linear 0.005, …]; [decision 0.006, tree 0.006, classifier 0.005, class 0.005, Bayes 0.005, …]; [web 0.009, classification 0.007, features 0.006, topic 0.005, …]; [mixture 0.005, random 0.006, clustering 0.005, variables 0.005, …]; [classification, text, unlabeled, document, labeled, learning, …: 0.015, 0.013, 0.012, 0.008, 0.007, …]; [information 0.012, web 0.010, social 0.008, retrieval 0.007, distance 0.005, networks 0.004, …]; [topic 0.010, mixture 0.008, LDA 0.006, semantic 0.005, …] 34
Multi-Faceted Sentiment Summary (query=“Da Vinci Code”)
Facet 1: Movie
– Neutral: “... Ron Howards selection of Tom Hanks to play Robert Langdon.” “Directed by: Ron Howard Writing credits: Akiva Goldsman ...” “After watching the movie I went online and some research on ...”
– Positive: “Tom Hanks stars in the movie, who can be mad at that?” “Tom Hanks, who is my favorite movie star act the leading role.” “Anybody is interested in it?”
– Negative: “But the movie might get delayed, and even killed off if he loses.” “protesting ... will lose your faith by ... watching the movie.” “... so sick of people making such a big deal about a FICTION book and movie.”
Facet 2: Book
– Neutral: “I remembered when i first read the book, I finished the book in two days.” “I’m reading “Da Vinci Code” now.”
– Positive: “Awesome book.” “So still a good book to past time.”
– Negative: “... so sick of people making such a big deal about a FICTION book and movie.” “This controversy book cause lots conflict in west society.”
… 35
Separate Theme Sentiment Dynamics “book” “religious beliefs” 36
Event Impact Analysis: IR Research (SIGIR papers)
Theme: retrieval models. term 0.1599, relevance 0.0752, weight 0.0660, feedback 0.0372, independence 0.0311, model 0.0310, frequent 0.0233, probabilistic 0.0188, document 0.0173, …
Event: Starting of the TREC conferences (year 1992). [vector 0.0514, concept 0.0298, extend 0.0297, model 0.0291, space 0.0236, boolean 0.0151, function 0.0123, feedback 0.0077, …]; [xml 0.0678, email 0.0197, model 0.0191, collect 0.0187, judgment 0.0102, rank 0.0097, subtopic 0.0079, …]
Event: Publication of the paper “A language modeling approach to information retrieval” (year 1998). [probabilist 0.0778, model 0.0432, logic 0.0404, ir 0.0338, boolean 0.0281, algebra 0.0200, estimate 0.0119, weight 0.0111, …]; [model 0.1687, language 0.0753, estimate 0.0520, parameter 0.0281, distribution 0.0268, probable 0.0205, smooth 0.0198, markov 0.0137, likelihood 0.0059, …] 37
Topic Modeling + Social Networks Authors writing about the same topic form a community Separation of 3 research communities: IR, ML, Web Topic Model Only Topic Model + Social Network 38
Next Step in Contextual Text Mining • Combining contextual text analysis with visualization • More detailed semantic modeling (entities, relations, …) • Integration of search and contextual text analysis to develop an analyst’s workbench: – Interactive semantic navigation and probing – Synthesis of information/knowledge – Personalized/customized service 39
Project 5: Opinion Integration and Summarization • Increasing popularity of Web 2.0 applications – more people express opinions on the Web • How to digest all? (190,451 posts; 4,773,658 results) 40
Motivation: Two kinds of opinions (190,451 posts; 4,773,658 results) Expert opinions • CNET editor’s review • Wikipedia article • Well-structured • Easy to access • Maybe biased • Outdated soon Ordinary opinions • Forum discussions • Blog articles • Represent the majority • Up to date • Hard to access • Fragmented How to benefit from both? 41
Problem Definition Input: a topic (e.g., iPod); an expert review with aspects (Design, Battery, Price, …); a text collection of ordinary opinions, e.g., Weblogs. Output: an integrated summary. Review aspects (Design, Battery, Price) are paired with similar opinions and supplementary opinions (“cute… tiny…”, “…thicker…”, “last many hrs”, “die out soon”, “could afford it”, “still expensive”); extra aspects (iTunes, warranty) are added from the ordinary opinions (“… easy to use…”, “…better to extend…”). 42
Methods • Semi-Supervised Probabilistic Latent Semantic Analysis (PLSA) – The aspects extracted from expert reviews serve as clues to define a conjugate prior on topics – Maximum a Posteriori (MAP) estimation – Repeated applications of PLSA to integrate and align opinions in blog articles to expert review
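One common way to write MAP estimation with a conjugate (Dirichlet) prior in a PLSA-style model is shown below. It is a sketch under assumptions, since the slide does not give the formula: $c(w,d)$ is the count of word $w$ in document $d$, $p(z_{d,w}=j)$ is the E-step posterior that $w$ in $d$ was generated by topic $j$, $p(w \mid r_j)$ is the word distribution of the review aspect $r_j$ used as the prior, and $\mu$ is a hypothetical confidence parameter acting as a pseudo-count.

```latex
p^{(n+1)}(w \mid \theta_j)
  = \frac{\sum_{d} c(w,d)\, p(z_{d,w}=j) \;+\; \mu\, p(w \mid r_j)}
         {\sum_{w'} \sum_{d} c(w',d)\, p(z_{d,w'}=j) \;+\; \mu}
```

With $\mu = 0$ this reduces to the ordinary maximum-likelihood M-step of PLSA; a larger $\mu$ keeps topic $j$ closer to the expert review aspect it is meant to align with.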
Results: Product (iPhone) • Opinion integration with review aspects
Aspect: Activation
– Review article: “You can make emergency calls, but you can't use any other functions…”
– Similar opinions: N/A
– Supplementary opinions: “… methods for unlocking the iPhone have emerged on the Internet in the past few weeks, although they involve tinkering with the iPhone hardware…” (Unlock/hack iPhone)
Aspect: Battery
– Review article: “rated battery life of 8 hours talk time, 24 hours of music playback, 7 hours of video playback, and 6 hours on Internet use.”
– Similar opinions: “iPhone will Feature Up to 8 Hours of Talk Time, 6 Hours of Internet Use, 7 Hours of Video Playback or 24 Hours of Audio Playback” (Confirm the opinions from the review)
– Supplementary opinions: “Playing relatively high bitrate VGA H.264 videos, our iPhone lasted almost exactly 9 freaking hours of continuous playback with cell and WiFi on (but Bluetooth off).” (Additional info under real usage) 44
Results: Product (iPhone) • Opinions on extra aspects
– Support 15: “You may have heard of iASign … an iPhone Dev Wiki tool that allows you to activate your phone without going through the iTunes rigamarole.” (Another way to activate iPhone)
– Support 13: “Cisco has owned the trademark on the name "iPhone" since 2000, when it acquired InfoGear Technology Corp., which originally registered the name.” (iPhone trademark originally owned by Cisco)
– Support 13: “With the imminent availability of Apple's uber cool iPhone, a look at 10 things current smartphones like the Nokia N95 have been able to do for a while and that the iPhone can't currently match...” (A better choice for smart phones?) 45
Results: Product (i. Phone) • Support statistics for review aspects People care about price Controversy: activation requires contract with AT&T People comment a lot about the unique wi-fi feature 46
Summarization of Contradictory Opinions [Kim & Zhai CIKM 09] [table: the multi-faceted sentiment summary for “Da Vinci Code” from slide 35, with Facet 1: Movie and Facet 2: Book against Neutral, Positive, and Negative] How can we help analysts digest and interpret contradictory opinions? 47
Contrastive Opinion Summarization [figure: two sets of opinion sentences, X = {x1, …, xn} and Y = {y1, …, ym}] 48
Contrastive Opinion Summarization [figure: from X and Y, k sentence pairs (u1, v1), (u2, v2), …, (uk, vk) are selected; the pairs form the contrastive opinion summary] 49
Problem Formulation [figure: representativeness relates X to U = {u1, …, uk} and Y to V = {v1, …, vk}; contrastiveness relates each ui to its paired vi] 50-51
Summarization as Optimization 1. Define an appropriate content similarity function Ф 2. Define an appropriate contrastive similarity function ψ 3. Solve the optimization problem efficiently. 52
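The three steps above can be illustrated with a small greedy sketch. It is not the algorithm from the cited paper: the similarity functions are simple word-overlap stand-ins, the sentiment lexicon is a toy assumption, the trade-off parameter `lam` is hypothetical, and greedy selection is just one inexpensive way to approximate the optimization.

```python
from itertools import product

# Toy sentiment lexicon (assumption), removed before measuring contrastive similarity.
SENTIMENT = {"fast", "easy", "slow", "hard", "good", "bad", "long", "short"}


def phi(a, b):
    """Content similarity: Jaccard overlap of the two sentences' word sets."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def psi(a, b):
    """Contrastive similarity: word overlap after dropping sentiment words."""
    a, b = set(a.split()) - SENTIMENT, set(b.split()) - SENTIMENT
    return len(a & b) / len(a | b) if a | b else 0.0


def representativeness(sentence, group):
    """Average content similarity of a sentence to every sentence in its group."""
    return sum(phi(sentence, g) for g in group) / len(group)


def greedy_summary(X, Y, k, lam=0.5):
    """Greedily pick k pairs (u, v), u from X and v from Y (sentences assumed distinct)."""
    pairs, used_x, used_y = [], set(), set()
    for _ in range(min(k, len(X), len(Y))):
        candidates = (
            (x, y) for x, y in product(X, Y) if x not in used_x and y not in used_y
        )
        best = max(
            candidates,
            key=lambda p: lam * (representativeness(p[0], X) + representativeness(p[1], Y))
            + (1 - lam) * psi(p[0], p[1]),
        )
        pairs.append(best)
        used_x.add(best[0])
        used_y.add(best[1])
    return pairs


X = ["file transfer is fast", "battery life is long"]   # positive opinions
Y = ["file transfer is slow", "battery life is short"]  # negative opinions
print(greedy_summary(X, Y, k=2))
```

Each selected pair talks about the same aspect with opposite polarity, which is the intent of combining representativeness with contrastiveness.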
Sample Results
No. 1 Positive: “oh... and file transfers are fast & easy.” Negative: “you need the software to actually transfer files”
No. 2 Positive: “i noticed that the micro adjustment knob and collet are well made and work well too.” Negative: “the adjustment knob seemed ok, but when lowering the router, i have to practically pull it down while turning the knob.”
No. 3 Positive: “the navigation is nice enough, but scrolling and searching through thousands of tracks, hundreds of albums or artists, or even dozens of genres is not conducive to save driving” Negative: “difficult navigation - i won’t necessarily say "difficult," but i don’t enjoy the scrollwheel to navigate.”
No. 4 Positive: “i imagine if i left my player untouched (no backlight) it could play for considerably more than 12 hours at a low volume level.” Negative: “there are 2 things that need fixing first is the battery life. it will run for 6 hrs without problems with medium usage of the buttons.” 53
Sample Result [same table as slide 53] Annotation: Different polarities of opinions made from different perspectives. 54
Sample Result [same table as slide 53] Annotations: Positive vs. negative; Not much disagreement. 55
Sample Result [same table as slide 53] Annotation: Judgments revealing detailed conditions. 56
Open Opinion Search System http://timan1.cs.uiuc.edu/cgi-bin/hkim277/COSDemo/lemur.cgi 57
Latent Aspect Rating Analysis How to infer aspect ratings? How to infer aspect weights? (aspects: Value, Location, Service, …)
Solution: Latent Rating Regression Model
Pipeline: Reviews + overall ratings → Aspect Segmentation (topic model for aspect discovery) → Aspect segments → Latent Rating Regression → Term weights, Aspect Rating, Aspect Weight
Aspect segments: [location:1 amazing:1 walk:1 anywhere:1]; [room:1 nicely:1 appointed:1 comfortable:1]; [nice:1 accommodating:1 smile:1 friendliness:1 attentiveness:1]
[figure values for term weights, aspect ratings, and aspect weights: 0.0, 0.9, 1.3, 0.2, 0.1, 0.3, 0.1, 0.7, 0.1, 0.9, 0.6, 0.8, 0.7, 0.8, 0.9, 1.8, 0.2, 3.8, 0.6] 59
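The relationship between term weights, aspect ratings, aspect weights, and the overall rating can be sketched as two weighted sums. This shows only the forward computation under assumed parameters; the actual model treats these quantities as latent and estimates them from reviews and overall ratings. The term weights below are made up for illustration, and the function names are hypothetical.

```python
def aspect_rating(term_counts, term_weights):
    """Aspect rating as a weighted sum over the terms in the aspect's segment."""
    return sum(count * term_weights.get(term, 0.0) for term, count in term_counts.items())


def overall_rating(aspect_ratings, aspect_weights):
    """Overall rating as the aspect-weighted sum of the aspect ratings."""
    return sum(aspect_weights[a] * r for a, r in aspect_ratings.items())


# Aspect segments from the slide (term: count).
segments = {
    "location": {"location": 1, "amazing": 1, "walk": 1, "anywhere": 1},
    "room": {"room": 1, "nicely": 1, "appointed": 1, "comfortable": 1},
    "service": {"nice": 1, "accommodating": 1, "smile": 1,
                "friendliness": 1, "attentiveness": 1},
}
# Hypothetical term weights per aspect (assumption, not the slide's values).
term_weights = {
    "location": {"location": 0.0, "amazing": 2.0, "walk": 0.5, "anywhere": 0.5},
    "room": {"room": 0.0, "nicely": 1.5, "appointed": 0.5, "comfortable": 2.0},
    "service": {"nice": 1.0, "accommodating": 1.0, "smile": 1.0,
                "friendliness": 1.0, "attentiveness": 1.0},
}
# Hypothetical aspect weights for one reviewer; they sum to 1.
aspect_weights = {"location": 0.2, "room": 0.2, "service": 0.6}

ratings = {a: aspect_rating(segments[a], term_weights[a]) for a in segments}
print(ratings)
print(overall_rating(ratings, aspect_weights))
```

Because the aspect weights are per reviewer, two reviewers with identical aspect ratings can still give different overall ratings, which is what makes the weights useful for analyzing reviewer behavior.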
Aspect-Based Opinion Summarization
Reviewer Behavior Analysis & Personalized Ranking of Entities • Reviewer behavior: people like expensive hotels because of good service; people like cheap hotels because of good value • Personalized ranking: query = 0.9 value, 0.1 others, compared with non-personalized ranking
Project 6: Information Trustworthiness • How to assess information quality? • Solution: trust propagation framework 62
Trust Propagation Framework [figure: trust propagates among sources, evidence, and claims] 63
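The slide names only the three node types, so the following is a generic fixed-point sketch of the idea rather than the framework itself. It assumes a simplified two-layer graph (sources and claims, with the evidence layer omitted), and the update rules and the name `propagate_trust` are hypothetical: a claim is believed in proportion to the trust of the sources asserting it, and a source is trusted in proportion to the belief in its claims.

```python
def propagate_trust(assertions, iterations=20):
    """Iteratively score sources and claims.

    assertions: dict mapping each source to the non-empty set of claims it asserts.
    Returns (trust per source, belief per claim), both scaled into (0, 1].
    """
    sources = list(assertions)
    claims = sorted({c for cs in assertions.values() for c in cs})
    trust = {s: 1.0 for s in sources}
    belief = {}
    for _ in range(iterations):
        # A claim's belief is the total trust of the sources that assert it,
        # rescaled so the most believed claim scores 1.0.
        belief = {c: sum(trust[s] for s in sources if c in assertions[s]) for c in claims}
        top = max(belief.values())
        belief = {c: b / top for c, b in belief.items()}
        # A source's trust is the mean belief of the claims it asserts.
        trust = {s: sum(belief[c] for c in assertions[s]) / len(assertions[s])
                 for s in sources}
    return trust, belief


# Hypothetical example: A and B corroborate each other, C stands alone.
assertions = {"A": {"c1", "c2"}, "B": {"c1", "c2"}, "C": {"c3"}}
trust, belief = propagate_trust(assertions)
print(trust)
print(belief)
```

A fuller treatment would score the evidence passages that link a source to a claim, so that trust depends on content as well as on who agrees with whom.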
Sample Result: Trusted Sources on Different Topics 64
Trusted Sources on Different Genres 65
Toward Next-Generation Search Engines [figure: three directions extending the current search engine] Task support (Full-Fledged Text Info. Management): Search → Access → Mining → Task Support. Personalization (User Modeling): Keyword Queries → Search History → Complete User Model. Large-Scale Semantic Analysis (Vertical Search Engines): Bag of words → Entities-Relations → Knowledge Representation. 66
The End Thank You! More information about our research can be found at http://timan.cs.uiuc.edu/ 67