Proses ETL ExtractTransformLoad Data Warehousing and Data Mining
- Slides: 13
Proses ETL (Extract-Transform-Load) Data Warehousing and Data Mining
Proses ETL Ekstrak (Extract) – Capture – Scrub atau data cleansing Transform Load dan Index ETL = Extract, transform, and load © Prentice Hall, 2002 2
Gambar 1. Langkah rekonsiliasi data STAGING AREA Capture = extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Incremental extract = mengambil Static extract = mengambil datahanya data-data yang mengalami data dari sumber pada waktu perubahan akibat static extract tertentu, dan biasanya hanya dilakukan sekali di awal proses. © Prentice Hall, 2002 3
Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Scrub = cleanse…uses pattern recognition and AI techniques to upgrade data quality Fixing errors: salah ejaan, tanggal juga: decoding, reformatting, time yang salah, penggunaan kolom yang salah, alamat yang tidak cocok, data yang hilang, data ganda, inkonsistensi stamping, konversi, key generation, penggabungan, deteksi error, pencarian data hilang © Prentice Hall, 2002 4
Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Transform = convert data from format of operational system to format of data warehouse Record-level: Selection – pemisahan data Joining – penggabungan data Aggregation – peringkasan data Field-level: single-field – dari one field ke one field multi-field – dari many fields ke one, atau one field ke many © Prentice Hall, 2002 5
Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Load/Index= place transformed data into the warehouse and create indexes Refresh mode: penulisan berulang Update mode: hanya perubahan- data tujuan secara massal dan berkala perubahan pada data sumber yang dimasukkan ke data warehouse © Prentice Hall, 2002 6
Figure 11 -11: Single-field transformation Secara umum – beberapa fungsi transformasi memindahkan data dari old form ke new form Algorithmic transformation menggunakan sebuah formula atau ekspresi logika Table lookup – pendekatan lain © Prentice Hall, 2002 7
Figure 11 -12: Multifield transformation M: 1 – dari banyak fields ke one target field 1: M – dari one field ke many target fields © Prentice Hall, 2002 8
Karakteristik Data setelah Proses ETL Terperinci Historical Perspektif pada Enterprise Timely 3 rd NF atau lebih Komprehensif Data secara periodik Ternormalisasi Data terperinci, tidak sekedar peringkasan Up-to-date (tidak harus real-time) Quality Controlled Kualitas yang baik
Derived Data Tujuan Mempermudah penggunaan aplikasi pendukung keputusan Respon yang cepat terhadap permintaan pengguna yang telah ditetapkan Data yang telah disesuaikan untuk pihak-pihak tertentu Dukungan untuk permintaan pelaporan Kemampuan untuk di Data mining Sifat Terperinci (data periodic) Ringkas (untuk penyimpulan) Most common data model = star bagian-bagian schema Terdistribusi (untuk layanan tertentu) (also called “dimensional model”) © Prentice Hall, 2002 10
Figure 11 -13: Components of a star schema Fact tables contain factual or quantitative data Dimension tables are denormalized to maximize performance 1: N relationship between dimension tables and fact tables Dimension tables contain descriptions about the subjects of the business Excellent for ad-hoc queries, but bad for online transaction processing © Prentice Hall, 2002 11
Figure 11 -14: Contoh Star schema Fact table provides statistics for sales broken down by product, period and store dimensions © Prentice Hall, 2002 12
Figure 11 -15: Star schema with sample data © Prentice Hall, 2002 13
- Data mining in data warehouse
- Datamart olap
- Olap data mining
- Introduction to data mining and data warehousing
- Mining complex data types
- Mining multimedia databases
- Data integration in data preprocessing
- Difference between strip mining and open pit mining
- Web text mining
- Hadoop hive architecture
- Data warehouse design best practices
- Data services etl
- Omop data model
- Tahapan utama proses data mining