Helping Your Data Warehouse Succeed 10 Mistakes to
- Slides: 34
Helping Your Data Warehouse Succeed: 10 Mistakes to Avoid in Data Integration Rafael Salas t: @Raf. Salas w: www. Rafael-Salas. com e: rfsalas@yahoo. es
About Rafael • DW BI – 12 years • SQL Server MVP • Solution Architect - Quaero a CSG Systems Solution • Charlotte, NC • Lots mistakes along the way!
Mistakes are the portals of discovery. James Joyce
Today’s Plan • 10 mistakes to avoid • What, why, how to prevent them • Share real life examples • No magic formulas
1 Ignoring Data Realities …Or finding them too late.
1 The Problem • Relying on common knowledge: – The data is ‘good’ – I know this data well • We don’t have time • Cycle: Code Load Explode! • Research-Recode-Retest = Rework
1 The Fix Requirements Data Profiling Compare Clue: Business want ‘good’ quality data: • Accurate • Timeliness • Relevant • Complete • Understood • Trusted
1 Benefits • Early awareness about data quality issues • Better ETL development estimates • Uncover new business rules • Better understanding of business requirements
1 How? • 3 rd Party tools • Hand crafted SQL queries • SSIS: Data Profiling Task – – – Decent profiling Get up and running quickly SQL Server data sources only Output is XML Results can be loaded in a table – XSLT required
2 Exception Handling …Actually the lack thereof
2 The Problem • Data’s ‘Buts and Ifs’ nobody mentioned • Unreliable data sources • Missed homework: data profiling – Data type mismatches and overflow – Referential integrity • Cycle: Run Fail Patch
2 The Fix • Consider exceptions at different levels – Data/Database – Network – Operative System • Design a system-wide strategy – Design Patterns Templates • Log and notify!
2 How? • Data/Database: – In SSIS: Use dataflow error outputs to redirect offending rows • Network: – Pre-process: test connectivity – In SSIS: Event handlers, precedence constraints with conditional logic • O/S – Pre-process: Validate space available, File available, etc. – In SSIS: Event handlers, precedence constraints with conditional logic
3 Inadequate Logging …What, when, how?
3 The Problem • No/Little Logging • Too Much Logging • Meaningless Logging • • Error troubleshooting Execution monitoring Performance tracking Auditing
3 The Fix • Add logging capabilities – Start with key events, add more as needed – – Start – End date & Times Row Counts On Error On Warning • Create reports on top of logging tables • Don’t forget to clean/prune logs • Logging I/O are expensive
3 The Fix • SSIS logging • SSIS event handler • Be aware of the concept of containers in SSIS events ‘bubble-up’ • Have to be included on each package – Use package templates
4 No Recovery & Restart …Game Over!
4 • • • The Problem Re-starting after failure is not automated It requires manual clean-up of partial results Prone to human error May require to start process from the beginning Risk of ‘skipping’ data Risk of duplicating data
4 The Fix • Create restart-ability points • Consider piggybacking on logging • Use ternary logic at each recovery point: – Skip – Run – Clean-up and re-run • Staging source data is handy • Custom
5 Staging Area Unauthorized Use …could cause injuries.
5 The Problem • Failing to understand staging area is a ‘construction zone’ • Reports and applications accessing staging data • Using staging tables as on-line data archive
5 The Fix • Easy: Keep staging area off-limit • Make all required data in data presentation layer • Keep staging data available only for required time • Use appropriate data aging and archiving policies and processes
6 Performance: Losing the focus …
6 Very Fast, but…
7 Vanity Testing …good for feeling awesome.
8 No Portability …deployment in progress!
9 Forgetting the Owner’s Manual …aka the beloved documentation.
10 Missing the Bigger Picture …the architecture.
10 The problem • Jumping to coding without a blueprint • Break it down into group of tasks • List all tasks and functionality you can’t live without • Place the tasks in the appropriate group
10 The Fix • Create an attack plan • Embrace an architecture • Divide and conquer! • List all tasks and functionality you require • Place the tasks in the appropriate group
10 An example Extract Changed data capture Transform Load Data cleansing Data Load Deduplication Exception Handling Data Staging Data Error tracking Other Data Transformations Load Aggregates OLAP Cube Processing Other Post Load Actions ETL Management Job Scheduler Activity Monitor Backup Recovery Restart Alerting ABC Support Security Compliance
Helping Your Data Warehouse Succeed: 10 Mistakes to Avoid in Data Integration Rafael Salas t: @Raf. Salas w: www. Rafael-Salas. com e: rfsalas@yahoo. es
- Data warehouse optimization mistakes
- 23 helping verbs song
- Laugh at your mistakes but learn from them
- What is kdd process in data mining
- Contoh data warehouse dan data mart
- Data warehouse components
- Perbedaan data warehouse dan data mart
- Introduction to data warehouse
- Perbedaan data warehouse dan data mining
- Perbedaan data warehouse dan data mining
- Data warehousing data mining and olap
- What is data acquisition in data warehouse
- Data warehouse vs data mart
- Explain the three tier architecture of data warehouse
- Data warehouse dan data mining
- Data warehousing and data mining in crm
- Do you help your mother
- Political wish new
- Count read succeed
- How did the french revolution succeed
- Why did roanoke fail and jamestown succeed
- Locate adverb form
- Succeed prefix and suffix
- How to succeed
- The three world order
- Give us your hungry your tired your poor
- Visio sql server
- Collier data warehouse
- Sql server parallel data warehouse
- Populating data warehouse
- Epm data warehouse
- Olam in data warehouse
- Bislr medical groups
- Technical architecture data warehouse
- Data warehouse requirements gathering template