Design Patterns for SSIS Performance Darren Green Konesans

  • Slides: 16
Download presentation
Design Patterns for SSIS Performance Darren Green Konesans Ltd

Design Patterns for SSIS Performance Darren Green Konesans Ltd

Performance • Utility packages vs specialised work processes • Less than 10% of packages

Performance • Utility packages vs specialised work processes • Less than 10% of packages require detailed performance design or tuning – It will take at least 90% of your time! – Use simple tweaks and hints ORDER Col for Cl Idx, MICS – More radical design changes? • Which way is faster? – It depends!

Lookups • Two main uses – Check if exists, no values are returned –

Lookups • Two main uses – Check if exists, no values are returned – Lookup data, returns one or more values to augment existing data • Performance Points – Size of reference data set • Number of rows • Size of row (especially with augmentation usage) – Size of input data set • Number of rows – Cache Type

Lookup Cache Types Characteristic Full Cache Partial None Size of reference dataset Small Lots

Lookup Cache Types Characteristic Full Cache Partial None Size of reference dataset Small Lots of rows or wide rows can take a long time to cache and use valuable memory Small Range of Large Cache built based on input values so a small range of values reduces the costly first lookup. Any Reference dataset has no impact on the lookup, but assumes an efficient query. Reference data far larger than (unique) input. Size of input dataset Any / Large Once cached the lookup is available for 1 million rows. Larger datasets justify the initial cost of cache population. Any Only sensible when input values are duplicated in dataset. Small / Sorted Small datasets work best as lookup cost is high, but last value is cached, so larger sorted datasets are also good Use when reference dataset is large but value range is small.

Demo 1 -4 Insert vs Update

Demo 1 -4 Insert vs Update

Insert vs Update Patterns • Insert or Update with Lookup – Use lookup to

Insert vs Update Patterns • Insert or Update with Lookup – Use lookup to test if exists, or return values – Standard and familiar pattern – Changing the Lookup cache type can be very helpful • Insert then Update – – Use constraint failures to direct rows Use multiple inserts to refine batch to single row Good for high Insert to update ratio, more inserts Use lookups without cache on update rows only, more efficient with large reference data sets • Merge Join and Conditional Split for Insert or Update – Must be sorted inputs

Insert vs Updates Performance • Test conditions – Single Virtual Machine • 3 GB

Insert vs Updates Performance • Test conditions – Single Virtual Machine • 3 GB Ram • Virtual Disk – Packages run in pseudo random order • 18 or 23 iterations per package – Input source of 2 million (narrow) rows – Existing rows • 100 K, 500 K, 1 M

Performance Numbers • Performance metrics are for the purpose of illustration only. The organism

Performance Numbers • Performance metrics are for the purpose of illustration only. The organism must be present in every case of the disease. The organism can be isolated from the diseased host and grown in pure culture. The specific disease is reproduced when a pure culture of the organism is inoculated into a healthy host. The organism must be recoverable from the experimentally infected host

Performance Test Results Insert to Update Ratio Lookup Insert then Update Merge Join 20:

Performance Test Results Insert to Update Ratio Lookup Insert then Update Merge Join 20: 1 67 96 69 4: 1 66 124 78 2: 1 80 334 89 Variation in Times 13% 3% • Uses only top 50% of results for each run • Variation means results quite unpredictable, even excluding outliers • Scale of test too small for real conclusions • External factors a big influence!

Demo 5 -7 Updates

Demo 5 -7 Updates

Update Methods • OLE-DB Command – Standard and familiar pattern in SSIS – Poor

Update Methods • OLE-DB Command – Standard and familiar pattern in SSIS – Poor user experience developing SQL and mapping parameters – Poor performance • T-SQL Update Statement – Requires additional staging table and space – Good performance

Update Performance Results OLE-DB Command T-SQL Update Time 123 82 Variation in Times 23%

Update Performance Results OLE-DB Command T-SQL Update Time 123 82 Variation in Times 23% 18% • 6 runs for each case in pseudo random order • 2 million rows, 500 K updates, 4: 1

Demo 8 -9 Distinct

Demo 8 -9 Distinct

Distinct Rows • Distinct 2 Million to 201 Rows – One integer column •

Distinct Rows • Distinct 2 Million to 201 Rows – One integer column • Sort Transformation – Remove Duplicate Values • Aggregate Transformation – Group By

Distinct Performance Results Sort Aggregate Time 54 51 Variation in Times 17% 2% •

Distinct Performance Results Sort Aggregate Time 54 51 Variation in Times 17% 2% • 6 runs for each case in pseudo random order • Variation very different • Is the sort transform more susceptible to outside influences?

Thank you • Questions? • Darren Green – darren@konesans. com – www. sqlis. com

Thank you • Questions? • Darren Green – darren@konesans. com – www. sqlis. com – www. sqldts. com • Feedback Online – http: //www. sqlbits. com/sessionfeedback