ETL from the Trenches Using SSIS in the

  • Slides: 27
Download presentation
ETL from the Trenches: Using SSIS in the Real World Dave Fackler, BI Architect

ETL from the Trenches: Using SSIS in the Real World Dave Fackler, BI Architect davef@rollinghillsky. com

Complete the Evaluation Form & Win! • You could win a Dell Mini Netbook

Complete the Evaluation Form & Win! • You could win a Dell Mini Netbook – every day – just for handing in your completed form! Each session form is another chance to win! Pick up your Evaluation Form: • Within each presentation room • At the PASS Booth near registration area Drop off your completed Form: • Near the exit of each presentation room • At the PASS Booth near registration area Sponsored by Dell

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS solution • Design patterns • Package standards • ETL auditing and logging • Team development • Package deployments

ETL Case Study • Department of Veteran’s Affair • Largest healthcare organization in the

ETL Case Study • Department of Veteran’s Affair • Largest healthcare organization in the US • 128 major medical centers; numerous other locations • 5. 5 M patients (veterans) seen annually • 278 K employees across the enterprise • Data stored in “local” Vista systems • Data warehouse environment • Multiple data warehouse tiers • SQL Server 2008, SSIS 2008 • “Top down” and “bottom up” data feeds

REGION 1 REGION 2 VAMC RDC Location CDC Location REGION 4 Guam Philippines REGION

REGION 1 REGION 2 VAMC RDC Location CDC Location REGION 4 Guam Philippines REGION 3 Alaska Hawaii Puerto Rico

VA Data Warehouse Environment National Data Sources CDW Vista Systems FDW RDW VDW VDW

VA Data Warehouse Environment National Data Sources CDW Vista Systems FDW RDW VDW VDW VDW VDW VDW

ETL Case Study • ETL architecture • National data sources (“top down”) • Loaded

ETL Case Study • ETL architecture • National data sources (“top down”) • Loaded into CDW first • (Planned) Pushed down to RDW environments • Regional FDW “feeder” data (“bottom up”) • Loaded into local RDW environment first • Pulled up to CDW environment • VDW environments loaded from “parent” RDW • Provides “top down” and “bottom up” data • Standard package design patterns, practices • Standardized auditing and logging infrastructure

ETL Case Study • ETL challenges • Surrogate keys for dimensions/facts • • Keys

ETL Case Study • ETL challenges • Surrogate keys for dimensions/facts • • Keys assigned when rows created (CDW or FDW) Ranges of valid keys assigned to each environment Identity columns used to generate values dbcc checkident() and “set identity_insert” used to handle CDW loads • SSIS package execution • Packages execute closest to destination • Multiple SSIS servers running various packages • (Planned) Multi-server SQL Agent job management

VA Data Warehouse Environment National Data Sources CDW SSIS RDW FDW SSIS Vista Systems

VA Data Warehouse Environment National Data Sources CDW SSIS RDW FDW SSIS Vista Systems RDW FDW SSIS

ETL Case Study • ETL challenges • National data sources • • Mix of

ETL Case Study • ETL challenges • National data sources • • Mix of “full” loads and “incremental” loads Oracle sources (SSIS Oracle adapter rocks!) Flat-file sources (complex record layouts) (Planned) Retire these whenever possible • (Planned) Real-time ETL • Trying to determine requirements • May or may not be based on SSIS…

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS solution • Design patterns • Package standards • ETL auditing and logging • Team development • Package deployments

SSIS Solution • Design patterns • Dimension table packages • • One package for

SSIS Solution • Design patterns • Dimension table packages • • One package for each source/dimension combination Extract to staging table, transform/load from there Handle transformations in data flow via C# script Use Lookup transforms for related dimensions • (Planned) Create inferred members if needed • Use Table. Diff** component to find inserts, updates • Union inserts, updates into new staging table • Use T-SQL “merge” statement to merge inserts, updates from staging table into dimension table ** From www. sqlbi. eu

SSIS Solution • Design patterns • Fact table packages • One package for each

SSIS Solution • Design patterns • Fact table packages • One package for each source/fact combination • Uses same basic design pattern used for dimensions • Master packages • • • Orchestrate execution of two or more other packages Set variables that are used by child packages Create context of a “batch” for auditing/logging Called by SQL Agent jobs for scheduled execution In some cases, provide logic for looping, determining extraction windows, etc.

Package Design Patterns DEMO

Package Design Patterns DEMO

SSIS Solution • Package standards • Naming conventions • • Packages – <DW>. <Subject>.

SSIS Solution • Package standards • Naming conventions • • Packages – <DW>. <Subject>. <Table>. <Action>. dtsx Control flow tasks, containers – standard prefixes Data flow components – standard prefixes Connection managers -- <Database>. <Provider> • Standard package property settings • Standard package variables • Standard SSIS package logging

SSIS Solution • Package standards • Package configurations • Every package uses an indirect

SSIS Solution • Package standards • Package configurations • Every package uses an indirect XML configuration • Standard %ETLConfigs% environment variable on every SSIS server/developer workstation • Contains common variable values • Parent package configurations in dimension and fact packages get other variable values from master package • Liberal use of property expressions (BIDS Helper!) • Connection strings for connection managers • Variable values for query strings, table names • Task and data flow component properties

Package Standards DEMO

Package Standards DEMO

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS solution • Design patterns • Package standards • ETL auditing and logging • Team development • Package deployments

ETL Auditing and Logging • Defined set of tables for capturing details regarding package

ETL Auditing and Logging • Defined set of tables for capturing details regarding package execution, ETL results, etc. • Stored in ETL database • Tables exist in ETLAudit schema • Stored procedures used (called via Execute SQL tasks) to interact with tables • (Planned) Maybe put some in Event Handlers • All rows in DW tables tagged with ETLBatch. ID • Several SSRS reports used to review data

ETL Auditing and Logging DEMO

ETL Auditing and Logging DEMO

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS solution • Design patterns • Package standards • ETL auditing and logging • Team development • Package deployments

Team Development • Team Foundation Server 2008 • All SSIS projects stored in several

Team Development • Team Foundation Server 2008 • All SSIS projects stored in several team projects • All SSIS developers have TFS tools installed • Team Explorer 2008 • TFS 2008 Power Tools • Standard test/QA process • Developer creates/edits package locally; checks in to TFS • Developer uses his/her “workspace” on test SSIS server to test packages, make corrections; checks in to TFS • Package reviewed by another developer; labeled if ready for production release

Package Deployments • All packages deployed as files • Standard SSIS environments • \<server>ETLDevelopment

Package Deployments • All packages deployed as files • Standard SSIS environments • \<server>ETLDevelopment • . . Developer. A • . . Developer. B • \<server>ETLDeployment • . . Project 1 • . . Project 2 • \<server>ETLFiles • In production, SQL Agent job gets labeled packages from TFS on a nightly basis • (Planned) Embed TFS Change. Set Number in Version. Comments when getting packages

Team Development, Package Deployments DEMO

Team Development, Package Deployments DEMO

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS

Agenda • ETL case study • Data warehouse environment • ETL architecture/challenges • SSIS solution • Design patterns • Package standards • ETL auditing and logging • Team development • Package deployments

Complete the Evaluation Form & Win! • You could win a Dell Mini Netbook

Complete the Evaluation Form & Win! • You could win a Dell Mini Netbook – every day – just for handing in your completed form! Each session form is another chance to win! Pick up your Evaluation Form: • Within each presentation room • At the PASS Booth near registration area Drop off your completed Form: • Near the exit of each presentation room • At the PASS Booth near registration area Sponsored by Dell

Thank you for attending this session and the 2009 PASS Summit in Seattle --

Thank you for attending this session and the 2009 PASS Summit in Seattle -- and don’t forget that you can get the conference sessions DVD for $125!!