CS 412 Intro to Data Mining Chapter 13

CS 412 Intro. to Data Mining
Chapter 13. Trends and Research Frontier in DM
Qi Li, Computer Science, Univ. Illinois at Urbana-Champaign, 2019

Chapter 13. Trends and Research Frontier in DM
  • Complex data types
  • Applications
  • Data mining competitions
  • Ethical issues of data mining

Complex Data Types
  • Sequences
  • Graphs
  • Text
  • Web
  • Stream, spatiotemporal, multimedia, IoT…

Sequences
  • Time series data (e.g., stock market data)
  • Symbolic sequences (e.g., customer shopping sequences, web clickstreams)
  • Biological sequences (e.g., DNA sequences)

Graph
  • Homogeneous graph (nodes/links are of the same type)
  • Heterogeneous graph (nodes/links are of different types)
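
The distinction between the two graph kinds can be made concrete with a small typed-graph structure. The following is a minimal sketch, not something taken from the slides: the `TypedGraph` class, its method names, and the example node and link types ("person", "author", "paper", "friend", "writes") are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TypedGraph:
    # node id -> node type (the type names used below are illustrative)
    node_types: dict = field(default_factory=dict)
    # list of (source, target, link type) triples
    links: list = field(default_factory=list)

    def add_node(self, node, node_type):
        self.node_types[node] = node_type

    def add_link(self, source, target, link_type):
        self.links.append((source, target, link_type))

    def is_homogeneous(self):
        """True when there is at most one node type and at most one link type."""
        distinct_node_types = set(self.node_types.values())
        distinct_link_types = {link_type for _, _, link_type in self.links}
        return len(distinct_node_types) <= 1 and len(distinct_link_types) <= 1


# A friendship network: one node type, one link type.
friends = TypedGraph()
friends.add_node("ann", "person")
friends.add_node("bob", "person")
friends.add_link("ann", "bob", "friend")

# A bibliographic network: authors and papers are different node types.
bibliography = TypedGraph()
bibliography.add_node("ann", "author")
bibliography.add_node("p1", "paper")
bibliography.add_link("ann", "p1", "writes")

print(friends.is_homogeneous())
print(bibliography.is_homogeneous())
```

Under these assumptions the friendship network counts as homogeneous and the bibliographic network as heterogeneous.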

Text
Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. A substantial portion of information is stored as text, such as news articles, technical papers, books, digital libraries, e-mail messages, blogs, and Web pages. Research in text mining has therefore been very active, and one of its important goals is to derive high-quality information from text.

Web
  • Web content
  • Web structure
  • Web usage

Challenges

Recommender System
  • Product recommendation (Amazon, eBay)
  • Search recommendation (Google, Bing)
  • Video/music/post recommendation (Netflix, Pandora, Pinterest)
  • Friend recommendation (Facebook, Twitter)
  • Job recommendation (LinkedIn)
  • Collaborative filtering
  • Content-based filtering
  • Hybrid
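
Of the approaches named above, collaborative filtering is the easiest to illustrate in a few lines. The sketch below shows one common variant, user-based collaborative filtering with cosine similarity. It is an assumption about what such a system might look like, not the method used by any of the services listed. The ratings data and the function names `cosine_similarity`, `predict`, and `recommend` are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 5},
    "cat": {"A": 1, "C": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity of two rating dicts, treating unrated items as 0."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += sim
    return numerator / denominator if denominator > 0 else None


def recommend(user, ratings, top_n=1):
    """Rank the items `user` has not rated by predicted rating."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = [(predict(user, i, ratings), i) for i in candidates]
    scored = [(s, i) for s, i in scored if s is not None]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_n]]


print(recommend("ann", ratings))
```

Here "ann" rates items much like "bob" does, so bob's high rating for the unseen item "D" outweighs cat's low one. Content-based filtering would instead compare item attributes, and a hybrid would combine both signals.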

Commerce, Profiling and Finance
  • Planning and forecasting
  • Dynamic pricing
  • Ads bidding
  • Profiling
  • User profiling
  • Churn prediction: knowing which users are going to stop using your platform in the future
  • Product profiling
  • Fintech
  • Stock market
  • Sentiment analysis
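
Churn prediction needs a working definition of "stopped using the platform" before any model can be trained. One possible way to build such labels is sketched below, under the assumption that a user counts as churned after a fixed inactivity window. The 30-day default, the function name `label_churned`, and the sample dates are hypothetical and not taken from the slides.

```python
from datetime import date, timedelta


def label_churned(last_active, as_of, inactive_days=30):
    """Map each user to True if they have been inactive for more than `inactive_days`."""
    cutoff = as_of - timedelta(days=inactive_days)
    return {user: last_seen < cutoff for user, last_seen in last_active.items()}


# Hypothetical last-activity dates per user.
last_active = {
    "u1": date(2019, 4, 28),
    "u2": date(2019, 2, 10),
}

labels = label_churned(last_active, as_of=date(2019, 5, 1))
print(labels)
```

Labels like these could then serve as the target variable for a classifier trained on earlier user behavior. The choice of window is a design decision that depends on how often users normally return.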

Urban Planning
  • Energy and power
  • Traffic prediction and management
  • Parking detection
  • Traffic control
  • Transportation sharing system
  • Uber
  • Bike-sharing
  • Pollution
  • Air quality prediction

Medicine and Healthcare
  • Disease prediction
  • Computer-aided detection
  • EHR
  • Risk prediction
  • Disease progression prediction
  • Healthcare
  • Epidemic and outbreak prediction
  • Food safety
  • Medicine study
  • Drug discovery and prediction
  • Bioinformatics

Other Sciences and Applications
  • Education
  • MOOC (massive open online course)
  • Political science and social science
  • Fake news
  • Crime and terrorist detection
  • Disaster detection
  • Opinion mining
  • Social influence
  • Environmental science
  • Climate

Data mining competitions
  • KDD Cup: https://www.kdd.org/kdd2019/kdd-cup
  • WSDM Cup: http://www.wsdm-conference.org/2018/call-for-participants.html
  • ICDM Contest: http://icdm2019.bigke.org/

KDD Cup
  • KDD Cup 2018: Forecast air quality
  • KDD Cup 2017: Highway tollgates traffic flow prediction
  • KDD Cup 2016: Whose papers are accepted the most: towards measuring the impact of research institutions
  • KDD Cup 2015: Predicting students' likelihood of dropout on MOOC
  • KDD Cup 2014: Predict funding requests that deserve an A+
  • KDD Cup 2013 (Track 2): Identify which authors correspond to the same person
  • KDD Cup 2013 (Track 1): Determine whether an author has written a given paper

Ethical Issues of Data Mining
  • Privacy and safety
  • Information reliability
  • Information bias
  • Expandability