CMU SCS Thank you C Faloutsos CMU CMU

  • Slides: 79
Download presentation
CMU SCS Thank you! C. Faloutsos CMU

CMU SCS Thank you! C. Faloutsos CMU

CMU SCS Large Graph Mining C. Faloutsos CMU

CMU SCS Large Graph Mining C. Faloutsos CMU

CMU SCS Large Graph Mining Data Mining for fun (and profit) C. Faloutsos CMU

CMU SCS Large Graph Mining Data Mining for fun (and profit) C. Faloutsos CMU

CMU SCS Outline • Credit where credit is due • Technical part – Data

CMU SCS Outline • Credit where credit is due • Technical part – Data mining – Can it be automated? – Research challenges • Non-technical part: `Listen’ – To the data – To non-experts KDD'10 C. Faloutsos 4

CMU SCS Nominator • Jian Pei KDD'10 C. Faloutsos 5

CMU SCS Nominator • Jian Pei KDD'10 C. Faloutsos 5

CMU SCS Endorsers • • • Charu C. Aggarwal (IBM Research) Ricardo Baeza-Yates (Yahoo!

CMU SCS Endorsers • • • Charu C. Aggarwal (IBM Research) Ricardo Baeza-Yates (Yahoo! Research) Albert-Laszlo Barabasi (Northeastern University) Denilson Barbosa (University of Alberta) Yixin Chen (Washington University at St. Louis) KDD'10 C. Faloutsos 6

CMU SCS Endorsers, cont’d • • • William Cohen (Carnegie Mellon University) Diane J.

CMU SCS Endorsers, cont’d • • • William Cohen (Carnegie Mellon University) Diane J. Cook (Washington State University) Gautam Das (University of Texas at Arlington) Inderjit S. Dhillon (University of Texas at Austin) Chris H. Q. Ding (University of Texas at Arlington) KDD'10 C. Faloutsos 7

CMU SCS Endorsers, cont’d • Petros Drineas (Rensselaer Polytechnic Institute) • Tina Eliassi-Rad (Lawrence

CMU SCS Endorsers, cont’d • Petros Drineas (Rensselaer Polytechnic Institute) • Tina Eliassi-Rad (Lawrence Livermore National Laboratory) • Greg Ganger (Carnegie Mellon University) • Minos Garofalakis (Technical University of Crete) • James Garrett (Carnegie Mellon University) KDD'10 C. Faloutsos 8

CMU SCS Endorsers, cont’d • • • Dimitrios Gunopulos (University of Athens) Xiaofei He

CMU SCS Endorsers, cont’d • • • Dimitrios Gunopulos (University of Athens) Xiaofei He (Zhejiang University) Panagiotis G. Ipeirotis (New York University) Eamonn Keogh (UCR) Hiroyuki Kitagawa (University of Tsukuba) Tamara Kolda (Sandia Nat. Labs) KDD'10 C. Faloutsos 9

CMU SCS Endorsers, cont’d • • • Flip Korn (AT&T Research) Nick Koudas (University

CMU SCS Endorsers, cont’d • • • Flip Korn (AT&T Research) Nick Koudas (University of Toronto) Hans-Peter Kriegel Ravi Kumar (Yahoo! Research) Lakshmanan (UBC) Jure Leskovec (Stanford University) KDD'10 C. Faloutsos 10

CMU SCS Endorsers, cont’d • • • Nikos Mamoulis (Hong Kong University) Heikki Manilla

CMU SCS Endorsers, cont’d • • • Nikos Mamoulis (Hong Kong University) Heikki Manilla (Aalto University, Dharmendra S. Modha (IBM Research) Mario Nascimento (University of Alberta) Jennifer Neville (Purdue University) Beng Chin Ooi (National University of Singapore) KDD'10 C. Faloutsos 11

CMU SCS Endorsers, cont’d • Dimitris Papadias (Hong Kong University of Science and Technology)

CMU SCS Endorsers, cont’d • Dimitris Papadias (Hong Kong University of Science and Technology) • Spiros Papadimitriou (IBM Research) • Jian Pei (Simon Fraser University) • Foster Provost (New York University) • Oliver Schulte (Simon Fraser University) • Dennis Shasha (New York University) • Srinivasan Parthasarathy (OSU) KDD'10 C. Faloutsos 12

CMU SCS Endorsers, cont’d • • • Jimeng Sun (IBM Research) Dacheng Tao (Nanyang

CMU SCS Endorsers, cont’d • • • Jimeng Sun (IBM Research) Dacheng Tao (Nanyang University of Technology) Yufei Tao (The Chinese University of Hong Kong) Evimaria Terzi (Boston University) Alex Thomo (University of Victoria) Andrew Tomkins (Google Research) KDD'10 C. Faloutsos 13

CMU SCS Endorsers, cont’d • • Caetano Traina (University of Sao Paulo) Vassilis Tsotras

CMU SCS Endorsers, cont’d • • Caetano Traina (University of Sao Paulo) Vassilis Tsotras (University of California, Riverside) Alex Tuzhilin (New York University) Haixun Wang (Microsoft Research) KDD'10 C. Faloutsos 14

CMU SCS Endorsers, cont’d • Wei Wang (University of North Carolina at Chapel Hill)

CMU SCS Endorsers, cont’d • Wei Wang (University of North Carolina at Chapel Hill) • Philip S. Yu (University of Illinois, Chicago) • Zhongfei Zhang (Binghamton University, State University of New York) KDD'10 C. Faloutsos 15

CMU SCS KDD committee • Ramasamy Uthurusamy, Chair • Robert Grossman (University of Illinois

CMU SCS KDD committee • Ramasamy Uthurusamy, Chair • Robert Grossman (University of Illinois at Chicago) • Jiawei Han (University of Illinois at Urbana -Champaign) • Tom Mitchell (Carnegie Mellon University) • Gregory Piatetsky-Shapiro (KDnuggets) KDD'10 C. Faloutsos 16

CMU SCS KDD committee cnt’d • Raghu Ramakrishnan (Yahoo! Research) • Sunita Sarawagi (Indian

CMU SCS KDD committee cnt’d • Raghu Ramakrishnan (Yahoo! Research) • Sunita Sarawagi (Indian Institute of Technology, Bombay) • Padhraic Smyth (University of California at Irvine) • Ramakrishnan Srikant (Google Research) KDD'10 C. Faloutsos 17

CMU SCS KDD committee cnt’d • Xindong Wu (University of Vermont) • Mohammed J.

CMU SCS KDD committee cnt’d • Xindong Wu (University of Vermont) • Mohammed J. Zaki (Rensselaer Polytechnic Institute) KDD'10 C. Faloutsos 18

CMU SCS Family • Parents Nikos & Sophia • Siblings Michalis*, Petros*, Maria •

CMU SCS Family • Parents Nikos & Sophia • Siblings Michalis*, Petros*, Maria • Wife Christina# : and co-authors (#) : and research impact evaluator (‘grandpa’ test - see later…) (*) KDD'10 C. Faloutsos 19

CMU SCS Academic ‘parents’ • Christodoulakis, Stavros (T. U. C. ) • Sevcik, Ken

CMU SCS Academic ‘parents’ • Christodoulakis, Stavros (T. U. C. ) • Sevcik, Ken (U of T) • Roussopoulos, Nick (UMD) KDD'10 C. Faloutsos 20

CMU SCS Academic ‘children’ • • • KDD'10 King-Ip (David) Lin Ibrahim Kamel Flip

CMU SCS Academic ‘children’ • • • KDD'10 King-Ip (David) Lin Ibrahim Kamel Flip Korn Byoung-Kee Yi Leejay Wu Deepayan Chakrabarti C. Faloutsos 21

CMU SCS Academic ‘children’ • • • KDD'10 Jia-Yu (Tim) Pan Spiros Papadimitriou Jimeng

CMU SCS Academic ‘children’ • • • KDD'10 Jia-Yu (Tim) Pan Spiros Papadimitriou Jimeng Sun Jure Leskovec Hanghang Tong C. Faloutsos 22

CMU SCS Academic ‘children’ • • KDD'10 Mary Mc. Glohon Fan Guo Lei Li

CMU SCS Academic ‘children’ • • KDD'10 Mary Mc. Glohon Fan Guo Lei Li Leman Akoglu Dueng Horng (Polo) Chau Aditya Prakash U Kang C. Faloutsos 23

CMU SCS CMU colleagues • • Tom Mitchell Garth Gibson Greg Ganger M. (Satya)

CMU SCS CMU colleagues • • Tom Mitchell Garth Gibson Greg Ganger M. (Satya) Satyanarayanan Howard Wactlar Jeannette Wing ++ KDD'10 C. Faloutsos 24

CMU SCS Co-authors • [dblp 7/2010: ] All 300 of you • Agma J.

CMU SCS Co-authors • [dblp 7/2010: ] All 300 of you • Agma J. M. Traina (22) • Caetano Traina Jr. (20) • … KDD'10 C. Faloutsos 25

CMU SCS Funding agencies • NSF (Maria Zemankova, Frank Olken, ++) • DARPA, LLNL,

CMU SCS Funding agencies • NSF (Maria Zemankova, Frank Olken, ++) • DARPA, LLNL, PITA • IBM, MS, HP, INTEL, Y!, Google, Symantec, Sony, Fujitsu, … KDD'10 C. Faloutsos 26

CMU SCS Outline • Credit where credit is due • Technical part – Data

CMU SCS Outline • Credit where credit is due • Technical part – Data mining – Can it be automated? – Research challenges • Non-technical part: `Listen’ – To the data – To non-experts KDD'10 C. Faloutsos 27

CMU SCS Data mining = compression & … Christos Faloutsos, Vasileios Megalooikonomou: On data

CMU SCS Data mining = compression & … Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. KDD'10 C. Faloutsos 28 Knowl. Discov. 15(1): 3 -20 (2007)

CMU SCS Data mining = compression & … Christos Faloutsos, Vasileios Megalooikonomou: On data

CMU SCS Data mining = compression & … Christos Faloutsos, Vasileios Megalooikonomou: On data mining, compression, and Kolmogorov complexity. Data Min. KDD'10 C. Faloutsos 29 Knowl. Discov. 15(1): 3 -20 (2007)

CMU SCS Data mining = compression & … But: how can compression • do

CMU SCS Data mining = compression & … But: how can compression • do forecasting? • spot outliers? • do classification? KDD'10 C. Faloutsos 30

CMU SCS Data mining = compression & … OK – then, isn’t compression a

CMU SCS Data mining = compression & … OK – then, isn’t compression a solved problem (gzip, LZ)? KDD'10 C. Faloutsos 31

CMU SCS … compression is undecidable! Theorem*: for an arbitrary string x, computing its

CMU SCS … compression is undecidable! Theorem*: for an arbitrary string x, computing its Kolmogorov complexity K(x) is undecidable EVEN WORSE than NP-hard! A. N. Kolmogorov (*) E. g. , [T. M. Cover and J. A. Thomas. Elements of KDD'10 C. Faloutsos 32 Information Theory. John Wiley and Sons, 1991, section 7. 7]

CMU SCS … compression is undecidable! …which means there will always be better data

CMU SCS … compression is undecidable! …which means there will always be better data mining tools/models/patterns to be discovered -> job security -> job satisfaction KDD'10 C. Faloutsos 33

CMU SCS Let’s see some examples of Response models to new drug KDD'10 C.

CMU SCS Let’s see some examples of Response models to new drug KDD'10 C. Faloutsos Body weight 34

CMU SCS Let’s see some examples of models $ spent KDD'10 C. Faloutsos income

CMU SCS Let’s see some examples of models $ spent KDD'10 C. Faloutsos income 35

CMU SCS Let’s see some examples of models $ spent KDD'10 C. Faloutsos income

CMU SCS Let’s see some examples of models $ spent KDD'10 C. Faloutsos income 36

CMU SCS Let’s see some examples of models $ spent KDD'10 C. Faloutsos income

CMU SCS Let’s see some examples of models $ spent KDD'10 C. Faloutsos income 37

CMU SCS Let’s see some examples of models $ spent 3/4 KDD'10 C. Faloutsos

CMU SCS Let’s see some examples of models $ spent 3/4 KDD'10 C. Faloutsos income 38

CMU SCS Let’s see some examples of Metabolic models rate 3/4 mass KDD'10 C.

CMU SCS Let’s see some examples of Metabolic models rate 3/4 mass KDD'10 C. Faloutsos http: //universe-review. ca /R 10 -35 -metabolic. htm 39

CMU SCS Metabolic rate Kleiberg’s law 3/4 mass KDD'10 C. Faloutsos http: //universe-review. ca

CMU SCS Metabolic rate Kleiberg’s law 3/4 mass KDD'10 C. Faloutsos http: //universe-review. ca /R 10 -35 -metabolic. htm 40

CMU SCS Outline • Credit where credit is due • Technical part – Data

CMU SCS Outline • Credit where credit is due • Technical part – Data mining – Can it be automated? NO! • Always room for better models – Research challenges • Non-technical part: `Listen’ – To the data – To non-experts KDD'10 C. Faloutsos 41

CMU SCS Always room for better models • Eg. : clustering – k-means (or

CMU SCS Always room for better models • Eg. : clustering – k-means (or our favorite clustering algo) • How many clusters are in the Sierpinski triangle? … KDD'10 C. Faloutsos 42

CMU SCS Always room for better models KDD'10 C. Faloutsos 43

CMU SCS Always room for better models KDD'10 C. Faloutsos 43

CMU SCS Always room for better models K=3 clusters? KDD'10 C. Faloutsos 44

CMU SCS Always room for better models K=3 clusters? KDD'10 C. Faloutsos 44

CMU SCS Always room for better models K=3 clusters? K=9 clusters? KDD'10 C. Faloutsos

CMU SCS Always room for better models K=3 clusters? K=9 clusters? KDD'10 C. Faloutsos 45

CMU SCS Always room for better models Piece-wise flat Mixture of (Gaussian) clusters KDD'10

CMU SCS Always room for better models Piece-wise flat Mixture of (Gaussian) clusters KDD'10 C. Faloutsos 46

CMU SCS Always room for better models Piece-wise flat ¾ Power law Mixture of

CMU SCS Always room for better models Piece-wise flat ¾ Power law Mixture of (Gaussian) clusters KDD'10 ? ? C. Faloutsos 47

CMU SCS Always room for better models Piece-wise flat ¾ Power law Mixture of

CMU SCS Always room for better models Piece-wise flat ¾ Power law Mixture of (Gaussian) clusters KDD'10 ONE, but Self-similar ‘cluster’ C. Faloutsos 48

CMU SCS Always room for better models • Barnsley’s method of IFS (iterated function

CMU SCS Always room for better models • Barnsley’s method of IFS (iterated function systems) can easily generate it [Barnsley+Sloan, BYTE, 1988] ONE, but Self-similar ‘cluster’ ~100 lines of C code: www. cs. cmu. edu~/christos/www/SRC/ifs. tar KDD'10 C. Faloutsos 49

CMU SCS Always room for better models • But, does self-similarity appear in real

CMU SCS Always room for better models • But, does self-similarity appear in real life? KDD'10 C. Faloutsos 50

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 51

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 51

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 52

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 52

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 53

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 53

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 54

CMU SCS Real, self similar dataset KDD'10 C. Faloutsos 54

CMU SCS • the red is true • origin: Norway • but most other

CMU SCS • the red is true • origin: Norway • but most other coastlines are ‘self-similar’, too! KDD'10 C. Faloutsos 55

CMU SCS How can we find better models? • Obviously, an art (‘undecidable’!) •

CMU SCS How can we find better models? • Obviously, an art (‘undecidable’!) • Helps if we – Listen to domain experts and – Listen to the data (next) KDD'10 C. Faloutsos 56

CMU SCS Outline • Credit where credit is due • Technical part – Data

CMU SCS Outline • Credit where credit is due • Technical part – Data mining – Can it be automated? NO! – Research challenges • Listen to the data (the more, the better!) • Non-technical part: `Listen’ – To the data – To non-experts KDD'10 C. Faloutsos 57

CMU SCS Scalability • Google: > 450, 000 processors in clusters of ~2000 processors

CMU SCS Scalability • Google: > 450, 000 processors in clusters of ~2000 processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003] • Yahoo: ~5 Pb of data [Fayyad’ 07] • ‘M 45’: 4 K proc’s, 3 Tb RAM, 1. 5 Pb disk KDD'10 C. Faloutsos 58

CMU SCS Promising research direction: scalability • challenges – Vast amounts of data; storing;

CMU SCS Promising research direction: scalability • challenges – Vast amounts of data; storing; cooling (!); … • … and opportunities: – DATA: Easier to collect (clickstreams, sensors etc) – S/W: Hadoop, hbase, pig, … : open source – H/W: 1 Tb disk: ~ US$ 100 KDD'10 C. Faloutsos 59

CMU SCS Promising research direction • The more data, the more subtle patterns we

CMU SCS Promising research direction • The more data, the more subtle patterns we may discover • Examples of subtle patterns: KDD'10 C. Faloutsos 60

CMU SCS More data, more subtle patterns PDF: fraction of customers (log scale) Duration

CMU SCS More data, more subtle patterns PDF: fraction of customers (log scale) Duration (log scale) Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 61 distributions. KDD 2008: 596 -604

CMU SCS More data, more subtle patterns PDF: fraction of customers (log scale) (mixture

CMU SCS More data, more subtle patterns PDF: fraction of customers (log scale) (mixture of) Gaussians Duration (log scale) Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 62 distributions. KDD 2008: 596 -604

CMU SCS More data, more subtle patterns Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean

CMU SCS More data, more subtle patterns Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 63 distributions. KDD 2008: 596 -604

CMU SCS More data, more subtle patterns Zipf (Pareto, Power-law) Mukund Seshadri, Sridhar Machiraju,

CMU SCS More data, more subtle patterns Zipf (Pareto, Power-law) Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 64 distributions. KDD 2008: 596 -604

CMU SCS More data, more subtle patterns Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean

CMU SCS More data, more subtle patterns Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 65 distributions. KDD 2008: 596 -604

CMU SCS More data, more subtle patterns lognormal Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan,

CMU SCS More data, more subtle patterns lognormal Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 66 distributions. KDD 2008: 596 -604

CMU SCS More data, more subtle patterns Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean

CMU SCS More data, more subtle patterns Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 67 distributions. KDD 2008: 596 -604

CMU SCS More data, more subtle patterns d. Pln (=doubly Pareto Lognormal) Mukund Seshadri,

CMU SCS More data, more subtle patterns d. Pln (=doubly Pareto Lognormal) Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec: Mobile call graphs: beyond power-law and lognormal KDD'10 C. Faloutsos 68 distributions. KDD 2008: 596 -604

CMU SCS So, d. Pln is the answer? KDD'10 C. Faloutsos 69

CMU SCS So, d. Pln is the answer? KDD'10 C. Faloutsos 69

CMU SCS So, d. Pln is the answer? Yes, for the moment… KDD'10 C.

CMU SCS So, d. Pln is the answer? Yes, for the moment… KDD'10 C. Faloutsos 70

CMU SCS So, d. Pln is the answer? With more data, who knows? !

CMU SCS So, d. Pln is the answer? With more data, who knows? ! KDD'10 C. Faloutsos 71

CMU SCS Outline • Credit where credit is due • Technical part – Data

CMU SCS Outline • Credit where credit is due • Technical part – Data mining – Can it be automated? NO! – Research challenges • Listen to the data (the more, the better!) • Non-technical part: ‘Listen’ – To the data – To non-experts KDD'10 C. Faloutsos 72

CMU SCS Listen to non-experts • Explain ‘why’, to a non-expert (‘grandpa’) • (and,

CMU SCS Listen to non-experts • Explain ‘why’, to a non-expert (‘grandpa’) • (and, even harder, explain ‘how’ – e. g. : – Frobenious Perron for irreducible MC KDD'10 C. Faloutsos 73

CMU SCS Listen to non-experts • Explain ‘why’, to a non-expert (‘grandpa’) • (and,

CMU SCS Listen to non-experts • Explain ‘why’, to a non-expert (‘grandpa’) • (and, even harder, explain ‘how’ – e. g. : – Frobenious Perron for irreducible MC -> page. Rank -> random surfer KDD'10 C. Faloutsos 74

CMU SCS Summary • Data mining = compression = undecidable = job security •

CMU SCS Summary • Data mining = compression = undecidable = job security • Hence: always room for better models/patterns – Listen to the data (Gb, Tb and Pb of them!) – Listen to domain experts (e. g. , ¾ Kleiberg’s law) • Listen to non-experts (‘explain to grandpa’) KDD'10 C. Faloutsos 75

CMU SCS Compression, fun, recursion • The shortest, recursive joke: • There are 3

CMU SCS Compression, fun, recursion • The shortest, recursive joke: • There are 3 types of data miners KDD'10 C. Faloutsos 76

CMU SCS Compression, fun, recursion • The shortest, recursive joke: • There are 3

CMU SCS Compression, fun, recursion • The shortest, recursive joke: • There are 3 types of data miners – Those who can count KDD'10 C. Faloutsos 77

CMU SCS Compression, fun, recursion • The shortest, recursive joke: • There are 3

CMU SCS Compression, fun, recursion • The shortest, recursive joke: • There are 3 types of data miners – Those who can count – And those who can not KDD'10 C. Faloutsos 78

CMU SCS Thank you! For the honor, and for making this wonderful research community

CMU SCS Thank you! For the honor, and for making this wonderful research community KDD'10 C. Faloutsos 79