The IT Perspective Data Warehousing Management and Analytical

  • Slides: 33
Download presentation
The IT Perspective: Data Warehousing, Management, and Analytical Structures Rafal Lukawiecki Strategic Consultant, Project

The IT Perspective: Data Warehousing, Management, and Analytical Structures Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli. com 1 1

Objectives Explain the basics of: 1. 2. 3. 4. Master Data Management Data Warehousing

Objectives Explain the basics of: 1. 2. 3. 4. Master Data Management Data Warehousing ETL OLAP/Multidimensional Data This seminar is based on a number of sources including a few dozen of Microsoft-owned presentations, used with permission. Thank you to Chris Dial, Tara Seppa, Aydin Gencler, Ivan Kosyakov, Bryan Bredehoeft, Marin Bezic, and Donald Farmer with his entire team for all the support. The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U. S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE. 2 2

ANALYZE REPORT STEAWARDSHIP MANAGE INTEGRATE 3 3

ANALYZE REPORT STEAWARDSHIP MANAGE INTEGRATE 3 3

SQL Services – Why? Install only the ones you need Which? Integration Services Get

SQL Services – Why? Install only the ones you need Which? Integration Services Get your data from the world outside (ETL) Analysis Services Cubes, Data Mining, support for Power. Pivot on Share. Point Reporting Services DIY Report Builder and traditional “big” reports Master Data Services Quality of critical master data (cities, colours, customers) Database Engine Data warehouse and OLTP relational storage 4 4

Master Data Management 5

Master Data Management 5

MDM Ensures consistency of data across all organisational uses Impacts overall data quality Processes

MDM Ensures consistency of data across all organisational uses Impacts overall data quality Processes and tools for: Collection, aggregation, matching, distribution, and persistence of master data Consistently Related to Federated Data Management Key to MDM: Modelling 6 6

Why MDM? It’s About Evolution of Enterprise Architecture Business Process Architecture • Rapid change

Why MDM? It’s About Evolution of Enterprise Architecture Business Process Architecture • Rapid change Application Architecture • Follows Business 7 Information Architecture Technology Architecture • Transcends • Longest life • Unexpected change 7

MDM Processes • Batched Acquisition from Staging Tables • Members, Attributes, Parent-Child Relationships •

MDM Processes • Batched Acquisition from Staging Tables • Members, Attributes, Parent-Child Relationships • SQL Integration Services Import & Integration 8 Modeling • Versioning Changes • Auditing • Compliance • Tracking of Instances • Subscription Views • Export to: • Operational Systems • Data Warehouses • BI Analytics • Reporting Tools Export & Subscription 8

Microsoft Master Data Services SQL 2008 R 2 Enterprise, Datacenter, Developer Tools: Master Data

Microsoft Master Data Services SQL 2008 R 2 Enterprise, Datacenter, Developer Tools: Master Data Manager Primary tool for managing your master data MDS Configuration Manager IT Pro tool MDS Web Service For developers wanting to extend MDS 9 Concepts: Models Entities Attributes Members Hierarchies Collections Versions Database 9

Modelling Master Data Model organises data at highest level Allowing versioning of changes to

Modelling Master Data Model organises data at highest level Allowing versioning of changes to data There are typically four categories of models: People (Customers, Staff) Places (Geographies, Cities, Countries) Things (Products) Concepts (Accounts, Behaviours, Transactions) 10 10

Example: Product MDM Model Product (model) Product (entity) Name (freeform attr) Code (freeform attr)

Example: Product MDM Model Product (model) Product (entity) Name (freeform attr) Code (freeform attr) Subcategory (domainbased attr) Standard. Cost (free-form attr) Name (freeform attr) Code (freeform attr) Category (domainbased attr) Name (freeform attr) 11 List. Price (free -form attr) Photo (file attr) Code (freeform attr) 11

1. Reviewing a Data Model Using Master Data Services 12

1. Reviewing a Data Model Using Master Data Services 12

Data Warehouse 13

Data Warehouse 13

Rich Connectivity Data Providers ODBC SQL Server SAP Net. Weaver BI SQL Server Integration

Rich Connectivity Data Providers ODBC SQL Server SAP Net. Weaver BI SQL Server Integration Services DB 2 My. SAP SQL Server Report Server Models Teradata XML OLE DB SQL Server Data Mining Models Oracle SQL Server Analysis Services 14 Hyperion Essbase 14

Star Schema 15 15

Star Schema 15 15

Star Schema Benefits Simple, not-so-normalized model High-performance queries Especially with Star Join Query Optimization

Star Schema Benefits Simple, not-so-normalized model High-performance queries Especially with Star Join Query Optimization Mature and widely supported Low-maintenance 16 16

Snowflake Dimension Tables Define hierarchies using multiple dimension tables Support fact tables with varying

Snowflake Dimension Tables Define hierarchies using multiple dimension tables Support fact tables with varying granularity Simplify consolidation of heterogeneous data Potential for slower query performance in relational reporting No difference in performance in Analysis Services database 17 17

Slowly Changing Dimensions Maintain historical context as dimension data changes Three common ways (there

Slowly Changing Dimensions Maintain historical context as dimension data changes Three common ways (there are more): Type 1: Overwrite the existing dimension record Type 2: Insert a new ‘versioned’ dimension record Type 3: Track limited history with attributes 23 23

Integration and ETL 27

Integration and ETL 27

Let’s do ETL with SSIS SQL Server Integration Services (SSIS) service SSIS object model

Let’s do ETL with SSIS SQL Server Integration Services (SSIS) service SSIS object model Two distinct runtime engines: Control flow Data flow 32 -bit and 64 -bit editions 28 28

The Package The basic unit of work, deployment, and execution An organized collection of:

The Package The basic unit of work, deployment, and execution An organized collection of: Connection managers Control flow components Data flow components Variables Event handlers Configurations Can be designed graphically or built programmatically Saved in XML format to the file system or SQL Server 29 29

Control Flow Control flow is a process-oriented workflow engine A package contains a single

Control Flow Control flow is a process-oriented workflow engine A package contains a single control flow Control flow elements Containers Tasks Precedence constraints Variables 30 30

Data Flow The Data Flow Task Performs traditional ETL and more Fast and scalable

Data Flow The Data Flow Task Performs traditional ETL and more Fast and scalable Data Flow Components Extract data from Sources Load data into Destinations Modify data with Transformations Service Paths Connect data flow components Create the pipeline 31 31

1. Using SQL Server Integration Services for Splitting Data 32

1. Using SQL Server Integration Services for Splitting Data 32

OLAP/Multidimensional Data 33

OLAP/Multidimensional Data 33

Cube = Unified Dimensional Model Multidimensional data Combination of measures and dimensions as one

Cube = Unified Dimensional Model Multidimensional data Combination of measures and dimensions as one conceptual model Measures are sourced from fact tables Dimensions are sourced from dimension tables 34 34

Hierarchies Benefits View of data at different levels of summarization Path to drill down

Hierarchies Benefits View of data at different levels of summarization Path to drill down or drill up Implementation Denormalized star schema dimension Normalized snowflake dimension Self-referencing relationship 36 36

Hierarchy Defined in Analysis Services Ordered collection of attributes into levels Navigation path through

Hierarchy Defined in Analysis Services Ordered collection of attributes into levels Navigation path through dimensional space Very important to get right! Customers by Geography Customers by Demographics Country Marital State Gender City Customer 37 37

Measure Group of measures with same dimensionality Analogous to a fact table Cube can

Measure Group of measures with same dimensionality Analogous to a fact table Cube can contain more than one measure group E. g. Sales, Inventory, Finance Defined by dimension relationships 38 38

Measure Group Dimension Sales 39 Inventory Customers X Products X X Time X X

Measure Group Dimension Sales 39 Inventory Customers X Products X X Time X X Promotions X Warehouse Finance X X Department X Account X Scenario X 39

1. Using BIDS to Review Dimension 2. 42 Design Cube Design and Functionality

1. Using BIDS to Review Dimension 2. 42 Design Cube Design and Functionality

Summary As a platform for enterprise Business Intelligence you should consider four services: Data

Summary As a platform for enterprise Business Intelligence you should consider four services: Data Warehouse (can be relational) Process for Data Management (MDS) Process for Data Integration (ETL) Analysis (OLAP, Data Mining, Columnar) = SQL Server 2008 R 2 43 43

© 2010 Microsoft Corporation & Project Botticelli Ltd. All rights reserved. The information herein

© 2010 Microsoft Corporation & Project Botticelli Ltd. All rights reserved. The information herein is for informational purposes only and represents the opinions and views of Project Botticelli and/or Rafal Lukawiecki. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Portions © 2010 Project Botticelli Ltd & entire material © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U. S. and/or other countries. The information herein is for informational purposes only and represents the current view of Project Botticelli Ltd as of the date of this presentation. Because Project Botticelli & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Project Botticelli cannot guarantee the accuracy of any information provided after the date of this presentation. Project Botticelli makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE. 44 44