Proses ETL ExtractTransformLoad Data Warehousing and Data Mining

  • Slides: 13
Download presentation
Proses ETL (Extract-Transform-Load) Data Warehousing and Data Mining

Proses ETL (Extract-Transform-Load) Data Warehousing and Data Mining

Proses ETL Ekstrak (Extract) – Capture – Scrub atau data cleansing Transform Load dan

Proses ETL Ekstrak (Extract) – Capture – Scrub atau data cleansing Transform Load dan Index ETL = Extract, transform, and load © Prentice Hall, 2002 2

Gambar 1. Langkah rekonsiliasi data STAGING AREA Capture = extract…obtaining a snapshot of a

Gambar 1. Langkah rekonsiliasi data STAGING AREA Capture = extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse Incremental extract = mengambil Static extract = mengambil datahanya data-data yang mengalami data dari sumber pada waktu perubahan akibat static extract tertentu, dan biasanya hanya dilakukan sekali di awal proses. © Prentice Hall, 2002 3

Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Scrub = cleanse…uses pattern

Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Scrub = cleanse…uses pattern recognition and AI techniques to upgrade data quality Fixing errors: salah ejaan, tanggal juga: decoding, reformatting, time yang salah, penggunaan kolom yang salah, alamat yang tidak cocok, data yang hilang, data ganda, inkonsistensi stamping, konversi, key generation, penggabungan, deteksi error, pencarian data hilang © Prentice Hall, 2002 4

Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Transform = convert data

Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Transform = convert data from format of operational system to format of data warehouse Record-level: Selection – pemisahan data Joining – penggabungan data Aggregation – peringkasan data Field-level: single-field – dari one field ke one field multi-field – dari many fields ke one, atau one field ke many © Prentice Hall, 2002 5

Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Load/Index= place transformed data

Figure 11 -10: Steps in data reconciliation (continued) STAGING AREA Load/Index= place transformed data into the warehouse and create indexes Refresh mode: penulisan berulang Update mode: hanya perubahan- data tujuan secara massal dan berkala perubahan pada data sumber yang dimasukkan ke data warehouse © Prentice Hall, 2002 6

Figure 11 -11: Single-field transformation Secara umum – beberapa fungsi transformasi memindahkan data dari

Figure 11 -11: Single-field transformation Secara umum – beberapa fungsi transformasi memindahkan data dari old form ke new form Algorithmic transformation menggunakan sebuah formula atau ekspresi logika Table lookup – pendekatan lain © Prentice Hall, 2002 7

Figure 11 -12: Multifield transformation M: 1 – dari banyak fields ke one target

Figure 11 -12: Multifield transformation M: 1 – dari banyak fields ke one target field 1: M – dari one field ke many target fields © Prentice Hall, 2002 8

Karakteristik Data setelah Proses ETL Terperinci Historical Perspektif pada Enterprise Timely 3 rd NF

Karakteristik Data setelah Proses ETL Terperinci Historical Perspektif pada Enterprise Timely 3 rd NF atau lebih Komprehensif Data secara periodik Ternormalisasi Data terperinci, tidak sekedar peringkasan Up-to-date (tidak harus real-time) Quality Controlled Kualitas yang baik

Derived Data Tujuan Mempermudah penggunaan aplikasi pendukung keputusan Respon yang cepat terhadap permintaan pengguna

Derived Data Tujuan Mempermudah penggunaan aplikasi pendukung keputusan Respon yang cepat terhadap permintaan pengguna yang telah ditetapkan Data yang telah disesuaikan untuk pihak-pihak tertentu Dukungan untuk permintaan pelaporan Kemampuan untuk di Data mining Sifat Terperinci (data periodic) Ringkas (untuk penyimpulan) Most common data model = star bagian-bagian schema Terdistribusi (untuk layanan tertentu) (also called “dimensional model”) © Prentice Hall, 2002 10

Figure 11 -13: Components of a star schema Fact tables contain factual or quantitative

Figure 11 -13: Components of a star schema Fact tables contain factual or quantitative data Dimension tables are denormalized to maximize performance 1: N relationship between dimension tables and fact tables Dimension tables contain descriptions about the subjects of the business Excellent for ad-hoc queries, but bad for online transaction processing © Prentice Hall, 2002 11

Figure 11 -14: Contoh Star schema Fact table provides statistics for sales broken down

Figure 11 -14: Contoh Star schema Fact table provides statistics for sales broken down by product, period and store dimensions © Prentice Hall, 2002 12

Figure 11 -15: Star schema with sample data © Prentice Hall, 2002 13

Figure 11 -15: Star schema with sample data © Prentice Hall, 2002 13