Mining Dynamics of Data Streams in Multi-Dimensional Space

Mining Dynamics of Data Streams in Multi-Dimensional Space
Jiawei Han
Department of Computer Science, University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj

Challenges of Stream Data Mining
  • Mining query mode: continuous, ad hoc, or progressive?
  • Mining mode: batched vs. interactive vs. lazy mining?
  • Time constraints: real-time?
  • What patterns are to be mined?
      - Finding patterns, anomalies, and differences in multiple streams
      - Mining dynamics (changes, trends, and evolutions) of data streams
  • Multi-level/multi-dimensional processing and data mining
      - Most stream data are at a fairly low level or multi-dimensional in nature
11/24/2020 Mining Dynamics of Data Streams

Why Mining Dynamics of Data Streams in Multi-Dimensional Space?
  • Dynamics (changes, trends, and evolutions) of data streams
      - Perhaps the most interesting thing in streams
      - Cannot just look at the current data? Save something!
  • Multi-dimensional stream mining
      - Most real stream data are low-level or multi-dimensional in nature
      - How can streams be examined dynamically across multiple dimensions?
      - Finding dynamics: patterns and outliers in a certain dimensional space

Stream Data Mining Tasks
  • Multi-dimensional (on-line) analysis of streams
  • Clustering data streams
  • Classification of data streams
  • Mining frequent patterns in data streams
  • Mining sequential patterns in data streams
  • Mining partial periodicity in data streams
  • Mining notable gradients in data streams
  • Mining outliers and unusual patterns in data streams
  • ……, more?

Example 1: Multi-Dimensional (OLAP) Analysis
  • Analysis of Web click streams
      - Raw data at low levels: seconds, web page addresses, user IP addresses, …
      - Analysts want: changes, trends, and unusual patterns at reasonable levels of detail
      - E.g., "Average click traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours."
  • Analysis of power consumption streams
      - Raw data: power consumption flow for every household, every minute
      - Patterns one may find: average hourly power consumption of manufacturing companies in Chicago over the last 2 hours today is 30% higher than on the same day a week ago
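The click-stream comparison above (last 15 minutes against the last 24 hours) can be sketched in a few lines. This is only an illustration under stated assumptions, not the method from the talk: the class name `WindowedRate`, its parameters, and the choice to keep one per-minute count for a single (region, topic) cell are all hypothetical.

```python
from collections import deque


class WindowedRate:
    """Per-minute click counts for one (region, topic) cell (hypothetical sketch)."""

    def __init__(self, long_minutes=24 * 60, short_minutes=15):
        self.short_minutes = short_minutes
        # A bounded deque drops the oldest minute once the long window is full.
        self.counts = deque(maxlen=long_minutes)

    def add_minute(self, clicks):
        """Record the number of clicks seen in the newest minute."""
        self.counts.append(clicks)

    def relative_change(self):
        """Return (short_avg - long_avg) / long_avg, or None if undefined."""
        if not self.counts:
            return None
        recent = list(self.counts)[-self.short_minutes:]
        short_avg = sum(recent) / len(recent)
        long_avg = sum(self.counts) / len(self.counts)
        if long_avg == 0:
            return None
        return (short_avg - long_avg) / long_avg
```

A result of 0.4 from `relative_change()` would correspond to the "40% higher" statement in the example. Keeping raw per-minute counts for every cell is the simplest possible design and would not scale to many dimensions, which is the motivation for pre-computation at coarser levels.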

Example 2: Multi-Dimensional Classification
  • Dynamic model update for loan or investment
      - Huge incoming flow of changing information spanning multiple dimensions (factors)
      - E.g., Should we invest in this company based on the situation of the current market?
  • Classification in a dynamic (volatile) stock market
      - Classification of stocks based on their current streams
      - E.g., Is Lucent going to be up in the next little while?

Example 3: High-Dimensional Clustering
  • Network intrusion detection
      - Huge incoming flow of network traffic information, multi-dimensional in nature
      - Find bursts of activity/traffic in real time
      - On-line clustering to detect abrupt changes
  • What are the changes in e-mail or text information?
      - Clustering based on frequent terms
      - Can we perform such clustering in real time?
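One simple way to picture on-line clustering that flags abrupt changes is a single-pass "leader" scheme: each arriving point joins the nearest cluster if it lies within a fixed radius, and otherwise opens a new cluster, which can be read as a signal of change. This is a swapped-in textbook technique used here purely for illustration, not the method from the talk; the class name `OnlineClusterer` and the `radius` parameter are hypothetical.

```python
import math


class OnlineClusterer:
    """Single-pass leader clustering over a stream of numeric feature vectors."""

    def __init__(self, radius):
        self.radius = radius   # assumed distance threshold for joining a cluster
        self.centroids = []    # one running-mean vector per cluster
        self.sizes = []        # number of points absorbed by each cluster

    def observe(self, point):
        """Absorb one point; return True if it opened a new cluster."""
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.radius:
            # Nothing close enough: treat the point as a new kind of traffic.
            self.centroids.append(list(point))
            self.sizes.append(1)
            return True
        # Update the running mean of the nearest cluster incrementally.
        self.sizes[best] += 1
        n = self.sizes[best]
        centroid = self.centroids[best]
        for j, x in enumerate(point):
            centroid[j] += (x - centroid[j]) / n
        return False
```

Each point is processed once and only the centroids are stored, so memory does not grow with the length of the stream. The linear scan over centroids is the obvious cost, and a fixed radius is a crude choice for high-dimensional traffic features.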

Methodology in Stream Data Mining
  • Multi-dimensional (on-line) analysis
  • Mining dynamics of data streams
  • Time is a special dimension
      - Tilted time frame (multiple time granularity)
  • Stream data reduction and pre-computation
      - What kind of multi-dimensional data should be pre-computed and stored for OLAP analysis?
      - What kind of data should be pre-computed/stored for classification? For clustering? For mining frequent patterns? For mining sequential patterns? Partial periodic patterns? ……
      - How to do incremental updates?
      - How to find changes?
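The tilted time frame idea (recent data kept at fine granularity, older data at coarser granularity) can be sketched as follows. The slide does not specify granularities, so the levels used here (quarter-hours merged into hours, hours merged into days, at most 31 days retained) are assumptions for illustration, as are the class name `TiltedTimeFrame` and its methods.

```python
from collections import deque


class TiltedTimeFrame:
    """Stores stream aggregates at several time granularities (hypothetical sketch)."""

    # (level name, capacity). For every level but the last, a full level is
    # merged into one unit of the next level. For the last level, capacity is
    # the maximum number of units retained. These values are assumed.
    LEVELS = [("quarter", 4), ("hour", 24), ("day", 31)]

    def __init__(self):
        self.slots = {name: deque() for name, _ in self.LEVELS}

    def add_quarter(self, count):
        """Record the aggregate for one new 15-minute interval."""
        self._add(0, count)

    def _add(self, level, value):
        name, capacity = self.LEVELS[level]
        slots = self.slots[name]
        slots.append(value)
        is_coarsest = level == len(self.LEVELS) - 1
        if is_coarsest:
            if len(slots) > capacity:
                slots.popleft()  # forget the oldest unit entirely
        elif len(slots) == capacity:
            # A full coarser unit has completed: merge it and promote it.
            total = sum(slots)
            slots.clear()
            self._add(level + 1, total)
```

With these assumed levels, at most 3 + 23 + 31 = 57 values are held between calls no matter how long the stream runs, compared with 2,976 quarter-hour values for the same 31 days at full resolution. The sketch sums counts when merging; any aggregate that can be combined incrementally could be substituted.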

Questions in Stream Data Mining
  • Will stream data mining be real in practice?
  • Should we develop general stream data mining principles, or ad hoc application-oriented methods?
  • How are stream data mining methods different from incremental mining?
  • How is stream data mining linked with stream data management systems? With continuous query processing?
  • Can we do privacy-preserving mining with stream data?

Thank you!
www.cs.uiuc.edu/~hanj