Course Summary
- Slides: 10
Course Summary
ChengXiang “Cheng” Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign
Course Goal
• Advanced (graduate-level) introduction to the field of information retrieval (IR), broadly including text mining
• Goal
– Provide a systematic introduction to statistical language models and their applications in text retrieval and text analysis
– Provide an opportunity for students to explore frontier topics via course projects (customized toward the interests of students)
– Give students enough training for doing research in IR or applying advanced IR techniques to applications
– Tangible outcomes: research paper, open source code, and application system
Text data cover all kinds of topics
• Topics: People, Events, Products, Services, …
• Sources: Blogs, Microblogs, Forums, Reviews, …
[Figure: example scales of text sources: 45M reviews, 65M msgs/day, 53M blogs, 1307M posts, 115M users, 10M groups, …]
Humans as Subjective & Intelligent “Sensors”
[Diagram: Real World → (Sense) → Sensor → (Report) → Data]
• Thermometer: Weather → 3°C, 15°F, …
• Geo Sensor: Locations → 41°N and 120°W, …
• Network Sensor: Networks → 0100011100
• “Human Sensor”: Perceive → Express
Unique Value of Text Data
• Useful to all big data applications
• Especially useful for mining knowledge about people’s behavior, attitudes, and opinions
• Directly expresses knowledge about our world: small text data are also useful!
[Diagram: Data → Information → Knowledge, with Text Data]
Main Techniques for Harnessing Big Text Data: Text Retrieval + Text Mining
[Diagram: Big Text Data → Text Retrieval → Small Relevant Data → Text Mining → Knowledge → Many Applications]
• Text Retrieval: This Course
• Text Mining: Text Mining & Analytics Course
Main Techniques for Building a Text. Scope: Text Retrieval + Text Analysis
[Diagram: Big Text Data → Text Retrieval → Small Relevant Data → Text Analysis → Knowledge → Many Applications]
• Techniques: Filtering, Recommender, Summarization, Categorization, Clustering, Search engines, Topic mining, Sentiment Prediction, …
• Applications: Security, Education, Business, Social Media, Medical/Health, …
This Course: Statistical Language Models
[Diagram: same Text Retrieval + Text Analysis scope diagram as the previous slide]
What to Learn/Do Next?
[Diagram: Information Retrieval at the center, surrounded by related areas]
• Applications: Web, Bioinformatics, …
• Models: Machine Learning, Pattern Recognition, Data Mining, Statistics, Optimization
• Systems: Databases, Software engineering, Computer systems
• Other neighboring areas: Computer Vision, Natural Language Processing, Algorithms, Human-Computer Interaction, Library & Info Science
Thank You
TAs: Ayushi Patel, Sahiti Labhishetty
Piazza contributors & all of you!