Mainlining Data Mining Jim Gray Microsoft Panel talk

Mainlining Data Mining: Jim Gray Microsoft Panel talk at ICDE 2000 San Diego, 2 Mar 2000

Is data mining still a niche technology?
• 97,363 items on Northern Light re “data mining”
• 9,075,288 items re “data base” or “database”
• Is 100,000 items a niche? (OR: 14K, XML: 250K)
• Today data mining tools for experts (statisticians). (Decision trees, clusters, K-means, neural nets…)
• High tech and high touch, aka: consulting and license fees. And the vendors like it that way.
• Claim that you MUST understand the technology to use it.

But.. The Petabytes are Coming!!
• We will be/are drowning in data/email/web..
• Abstraction & categorization are key technologies
• But,
  – They have to work.
  – They have to be trivial to learn.
• Successful ubiquitous data mining (clustering/classifiers…)
  – Mail filters/classifiers
  – Resume readers
  – Shopping recommendations, community finders
  – Web search engines

Key technical/research issues for transition to the mainstream?
PROCESS PROBLEMS:
• Getting data into tool is hell
• Scrubbing data is hell
• Then comes the easy part: mining
• Then comes the really hard part: visualization and understanding
• Most of us:
  – Can’t understand neural nets (that’s bad).
  – Can’t understand statistics (that’s a fact).

Key technical/research issues for transition to the mainstream?
Opportunities: It’s not just numbers
• Text mining
• Time series
• Domain specific
  – Web logs
  – Protein patterns
  – Spatial (e.g. geology, astronomy)
  – Image

New opportunities for KDM?
• Make data capture/scrub/import trivial
• Provide intuitive manipulation interfaces
• Provide simpler analysis concepts
  – support/confidence concept
  – precision/recall
  – ranking
  – pivot & rollup & cube
• Provide interactive visual data explorer.
• Case in point: I have yet to see a nice data cube visualizer.
[Figure: data cube over make (Chevy, Ford), year (1990 to 1993), and color (red, white, blue), with aggregate faces labeled By Year, By Color, By Make & Year, By Color & Year, By Make & Color, and Sum.]
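As a rough illustration of the "pivot & rollup & cube" idea on this slide, the sketch below sums one measure over every combination of kept and rolled-up dimensions. It is a minimal sketch, not anything shown in the talk: the function name `data_cube`, the `ALL` marker, and the sales figures are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical marker meaning "this dimension is rolled up"


def data_cube(rows, dims, measure):
    """Sum `measure` for every combination of kept and rolled-up dimensions."""
    cube = defaultdict(int)
    for row in rows:
        # Each dimension contributes either its own value or the ALL marker.
        choices = [(row[d], ALL) for d in dims]
        for key in product(*choices):
            cube[key] += row[measure]
    return dict(cube)


# Invented sample data, for illustration only.
sales = [
    {"make": "Chevy", "year": 1990, "color": "red", "units": 5},
    {"make": "Chevy", "year": 1990, "color": "white", "units": 10},
    {"make": "Ford", "year": 1990, "color": "red", "units": 7},
    {"make": "Ford", "year": 1991, "color": "blue", "units": 3},
]

cube = data_cube(sales, ["make", "year", "color"], "units")

by_make = cube[("Chevy", ALL, ALL)]   # rollup over year and color
by_year = cube[(ALL, 1990, ALL)]      # rollup over make and color
grand_total = cube[(ALL, ALL, ALL)]   # the "Sum" corner of the cube
```

Every cell of the cube, from a single make/year/color combination up to the grand total, is then a plain dictionary lookup, which is the kind of structure a cube visualizer would need to navigate.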

Research challenges that will impact data mining?
• Simpler analysis concepts
• Visualization tools to navigate data
• Better algorithms = Better answers