Data Mining Techniques
Instructor: Ruoming Jin
Fall 2006

Welcome!
  • Instructor: Ruoming Jin
    • Homepage: www.cs.kent.edu/~jin/
    • Office: 264 MCS Building
    • Email: jin@cs.kent.edu
    • Office hours: Mondays and Wednesdays, 10:00 AM to 11:00 AM, or by appointment

Overview
  • Homepage: www.cs.kent.edu/~jin/datamining.html
  • Time: 11:00-12:15 PM, Monday and Wednesday
  • Place: MSB 276
  • Prerequisite: none
    • Preferred: Database, AI, Machine Learning, Statistics, Algorithms, and Data Structures

Overview
  • Textbook: Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley
  • References
    • Data Mining: Concepts and Techniques, by Han and Kamber, Morgan Kaufmann, 2001. (ISBN: 1-55860-489-8)
    • Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)
    • The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)
    • Mining the Web: Discovering Knowledge from Hypertext Data, by Chakrabarti, Morgan Kaufmann, 2003. (ISBN: 1-55860-754-4)

Overview
  • Grading scheme
    • Paper presentation and discussion: 35%
    • Project: 50%
    • Attendance and participation: 15%
  • No homework
  • No exam

Overview (Presentation)
  • Paper presentation
    • One per student
    • Research paper(s)
      • List of recommendations (will be available by the end of the second week)
      • Your own pick (upon approval)
    • Three parts
      • Review of research ideas in the paper
      • Debate (Pros/Cons)
      • Questions and comments from audience
    • Class participation: one question/comment per student

Overview (Presentation)
  • Order of presentation: assigned by instructor
  • The presentations will start in late October or early November
  • You need to make your choice and send it to me by Sep. 22nd!
  • You need to submit three drafts before the final presentation
    • First draft due on Oct. 14th
    • Second draft due on Oct. 21st
    • Final slides due one day before your presentation
  • I will provide feedback and suggestions for each draft
  • Note that I do expect the complete presentation slides in the first draft

Overview (Project)
  • Project (due Dec 3rd)
    • One project: one or two students
    • Some suggestions will be available shortly
      • The project will focus on visualizing data mining algorithms
    • Checkpoints
      • Proposal: title and goal (due Oct 7th)
      • Outline of approach (due Oct 7th)
      • Implementation (due Dec 3rd)
      • Evaluation (due Dec 3rd)
      • Documentation (due Dec 10th)
    • Each group will have a short presentation and demo (20 minutes)
    • Each group will provide a five-page document on the project

Topics
  • Scope: Data Mining
  • Topics:
    • Association Rules
    • Sequential Patterns
    • Graph Mining
    • Clustering and Outlier Detection
    • Classification and Prediction
    • Regression
    • Pattern Interestingness
    • Dimensionality Reduction
    • …

Topics
  • Applications
    • Bioinformatics
    • Web mining
    • Text mining
    • Visualization
    • Financial data analysis
    • Intrusion detection
    • …

KDD References
  • Data mining and KDD (SIGKDD: CDROM)
    • Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
    • Journals: Data Mining and Knowledge Discovery, KDD Explorations
  • Database systems (SIGMOD: CD ROM)
    • Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
    • Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.
  • AI & Machine Learning
    • Conferences: Machine learning (ICML), AAAI, IJCAI, COLT (Learning Theory), etc.
    • Journals: Machine Learning, Artificial Intelligence, etc.

KDD References
  • Statistics
    • Conferences: Joint Stat. Meeting, etc.
    • Journals: Annals of Statistics, etc.
  • Bioinformatics
    • Conferences: ISMB, RECOMB, PSB, CSB, BIBE, etc.
    • Journals: J. of Computational Biology, Bioinformatics, etc.
  • Visualization
    • Conference proceedings: CHI, ACM-SIGGraph, etc.
    • Journals: IEEE Trans. Visualization and Computer Graphics, etc.