Data Mining: Data Mining Research, by Romi Satria Wahono

Data Mining: Data Mining Research
Romi Satria Wahono
romi@romisatriawahono.net
http://romisatriawahono.net
0815-86220090

Romi Satria Wahono
§ SD Sompok Semarang (1987)
§ SMPN 8 Semarang (1990)
§ SMA Taruna Nusantara, Magelang (1993)
§ Bachelor's, master's, and doctoral (on-leave) studies, Department of Computer Sciences, Saitama University, Japan (1994-2004)
§ Research Interests: Software Engineering and Intelligent Systems
§ Founder of IlmuKomputer.Com
§ Researcher at LIPI (2004-2009)
§ Founder and CEO of PT Brainmatics Cipta Informatika

Course Outline
1. Introduction to Data Mining
2. The Data Mining Process
3. Evaluation and Validation in Data Mining
4. Data Mining Methods and Algorithms
5. Data Mining Research

Data Mining Research

Data Mining Research
1. Standard Research Process in Data Mining
2. Journal Publications on Data Mining
3. Research on Classification
4. Research on Clustering
5. Research on Prediction
6. Research on Association Rule

Standard Research Process in Data Mining

Data Mining Standard Process (CRISP-DM)
§ A cross-industry standard was clearly required that is industry-neutral, tool-neutral, and application-neutral
§ The Cross-Industry Standard Process for Data Mining (CRISP-DM) was developed in 1996 (Chapman, 2000)
§ CRISP-DM provides a nonproprietary and freely available standard process for fitting data mining into the general problem-solving strategy of a business or research unit

CRISP-DM

1. Business Understanding Phase
§ Enunciate the project objectives and requirements clearly in terms of the business or research unit as a whole
§ Translate these goals and restrictions into the formulation of a data mining problem definition
§ Prepare a preliminary strategy for achieving these objectives

2. Data Understanding Phase
§ Collect the data
§ Use exploratory data analysis to familiarize yourself with the data and discover initial insights
§ Evaluate the quality of the data
§ If desired, select interesting subsets that may contain actionable patterns
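
The exploratory analysis and data-quality check described in this phase can be sketched with the Python standard library alone. This is only an illustrative sketch: the `records` list, its column names, and the `profile` helper are hypothetical and do not come from the slides.

```python
from statistics import mean, median

# Hypothetical raw records (an assumption for illustration); None marks a missing value.
records = [
    {"age": 25, "income": 3200.0, "approved": "yes"},
    {"age": 41, "income": None, "approved": "no"},
    {"age": 33, "income": 5100.0, "approved": "yes"},
    {"age": None, "income": 2700.0, "approved": "no"},
    {"age": 58, "income": 8800.0, "approved": "yes"},
]


def profile(rows, column):
    """Summarize one numeric column: row count, missing values, basic statistics."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


age_profile = profile(records, "age")
income_profile = profile(records, "income")
print(age_profile)
print(income_profile)
```

A summary like this gives a first view of value ranges and of how many values are missing per column, which feeds directly into the quality evaluation step.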

3. Data Preparation Phase
§ Prepare from the initial raw data the final data set that is to be used for all subsequent phases. This phase is very labor-intensive
§ Select the cases and variables you want to analyze and that are appropriate for your analysis
§ Perform transformations on certain variables, if needed
§ Clean the raw data so that it is ready for the modeling tools
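
As a rough illustration of the three preparation steps (select variables, clean, transform), here is a minimal sketch in plain Python. The `raw` rows, the `SELECTED` columns, and the helper names are hypothetical. Dropping incomplete rows and min-max scaling are just one possible choice of cleaning and transformation, not something the slides prescribe.

```python
# Hypothetical raw data (an assumption for illustration); None marks a missing value.
raw = [
    {"id": 1, "age": 25, "income": 3200.0, "note": "walk-in"},
    {"id": 2, "age": 41, "income": None, "note": ""},
    {"id": 3, "age": 33, "income": 5100.0, "note": "referral"},
    {"id": 4, "age": 58, "income": 8800.0, "note": ""},
]

SELECTED = ("age", "income")  # variables judged relevant for the analysis


def select_variables(rows, columns):
    """Keep only the chosen variables in every row."""
    return [{c: row[c] for c in columns} for row in rows]


def drop_incomplete(rows):
    """Clean the data by discarding rows that contain a missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def min_max_scale(rows, column):
    """Transform one column linearly onto the range [0, 1]."""
    values = [row[column] for row in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        {**row, column: (row[column] - lo) / span if span else 0.0}
        for row in rows
    ]


prepared = drop_incomplete(select_variables(raw, SELECTED))
for col in SELECTED:
    prepared = min_max_scale(prepared, col)
print(prepared)
```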

4. Modeling Phase
§ Select and apply appropriate modeling techniques
§ Calibrate model settings to optimize results
§ Remember that often, several different techniques may be used for the same data mining problem
§ If necessary, loop back to the data preparation phase to bring the form of the data into line with the specific requirements of a particular data mining technique
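
To make "several techniques for the same problem" and "calibrate model settings" concrete, the sketch below compares a majority-class baseline with a k-nearest-neighbor classifier at several values of k on a held-out validation set. The tiny one-feature data set and all function names are hypothetical. The two techniques were chosen only because they fit in a few lines, not because the slides recommend them.

```python
import math
from collections import Counter

# Hypothetical one-feature data set (an assumption for illustration).
train = [((1.0,), "low"), ((1.5,), "low"), ((2.0,), "low"), ((3.0,), "low"),
         ((7.0,), "high"), ((8.0,), "high"), ((9.0,), "high")]
valid = [((2.5,), "low"), ((7.5,), "high"), ((4.0,), "low"), ((6.5,), "high")]


def majority_baseline(train_rows):
    """Technique 1: always predict the most frequent training label."""
    label = Counter(lbl for _, lbl in train_rows).most_common(1)[0][0]
    return lambda x: label


def knn(train_rows, k):
    """Technique 2: k-nearest neighbors with a majority vote."""
    def predict(x):
        nearest = sorted(train_rows, key=lambda row: math.dist(row[0], x))[:k]
        return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
    return predict


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


baseline_acc = accuracy(majority_baseline(train), valid)

# Calibrate the model setting k by comparing validation accuracy.
knn_acc = {k: accuracy(knn(train, k), valid) for k in (1, 3, 7)}
best_k = max(knn_acc, key=knn_acc.get)

print("baseline:", baseline_acc)
print("k-NN:", knn_acc, "best k:", best_k)
```

With k equal to the size of the training set, k-NN degenerates into the majority baseline, which is one reason the setting needs calibration rather than a fixed default.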

5. Evaluation Phase
§ Evaluate the one or more models delivered in the modeling phase for quality and effectiveness before deploying them for use in the field
§ Determine whether the model in fact achieves the objectives set for it in the first phase
§ Establish whether some important facet of the business or research problem has not been accounted for sufficiently
§ Come to a decision regarding use of the data mining results
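
Checking a model against the objective set in the first phase can be sketched as follows. The label lists, the choice of "bad" as the positive class, and the recall target `REQUIRED_RECALL` are all hypothetical assumptions. A real project would take the target from its own business understanding phase.

```python
# Hypothetical true and predicted labels for ten credit applicants.
actual    = ["bad", "good", "good", "bad", "good", "bad", "good", "good", "bad", "good"]
predicted = ["bad", "good", "bad", "bad", "good", "good", "good", "good", "bad", "good"]

POSITIVE = "bad"        # the class the business cares most about detecting
REQUIRED_RECALL = 0.8   # assumed objective from the business understanding phase


def evaluate(y_true, y_pred, positive):
    """Build confusion-matrix counts and derive accuracy, precision, and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = evaluate(actual, predicted, POSITIVE)
meets_objective = metrics["recall"] >= REQUIRED_RECALL
decision = "deploy" if meets_objective else "revisit earlier phases"

print(metrics)
print("decision:", decision)
```

In this made-up case the accuracy looks acceptable, yet recall on the "bad" class falls short of the assumed target, so the decision is to go back rather than deploy.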

6. Deployment Phase
§ Make use of the models created: model creation does not signify the completion of a project
§ Example of a simple deployment: generate a report
§ Example of a more complex deployment: implement a parallel data mining process in another department
§ For businesses, the customer often carries out the deployment based on your model

Exercises
§ Study and understand Case Studies 1-5 in Larose (2005), Chapter 1
§ Study and understand how CRISP-DM is applied in the thesis by Firmansyah (2011) on using the C4.5 algorithm to determine creditworthiness

Journal Publications on Data Mining

Transactions and Journals
§ Review Paper (survey and state-of-the-art):
  • ACM Computing Surveys (CSUR)
§ Research Paper (technical):
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • ACM Transactions on Information Systems (TOIS)
  • IEEE Transactions on Knowledge and Data Engineering
  • Springer Data Mining and Knowledge Discovery
  • International Journal of Business Intelligence and Data Mining (IJBIDM)

Cognitive Assignment III
1. Read one scientific paper, published in a journal between 2010 and 2012, that relates to the data mining methods we have studied
2. Summarize it as slides with the following structure:
   1. Research Background
   2. Problem Statements
   3. Research Questions
   4. Research Objective
   5. Existing Methods
   6. Proposed Method
   7. Results
   8. Conclusion
3. Present it to the class at the next session

References
1. Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Elsevier, 2011
2. Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005
3. Florin Gorunescu, Data Mining: Concepts, Models and Techniques, Springer, 2011
4. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Elsevier, 2006
5. Oded Maimon and Lior Rokach, Data Mining and Knowledge Discovery Handbook, Second Edition, Springer, 2010
6. Warren Liao and Evangelos Triantaphyllou (eds.), Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications, World Scientific, 2007