Metadata room 2 room 3 garage room 1

  • Slides: 55
Download presentation
Metadata room 2 room 3 garage room 1 kitchen The Information Blueprint 1 Chuck

Metadata room 2 room 3 garage room 1 kitchen The Information Blueprint 1 Chuck Kelley Excellence In Data, LLC

Definition of Metadata �Metadata is �Data about Data �The map of the Data Warehouse

Definition of Metadata �Metadata is �Data about Data �The map of the Data Warehouse �Defines the construction, health and descriptive information �Metadata is not �The data itself �Master data �External data (depending on the type of data!) 2 Chuck Kelley Excellence In Data, LLC

Management of Data Metadata: the information “yellow pages” Hmmm, I wonder what this information

Management of Data Metadata: the information “yellow pages” Hmmm, I wonder what this information really What information is available? What does it mean? means? How was it derived? What was its source? How current is it? Who uses it? How often is it used? 3 Chuck Kelley Excellence In Data, LLC

Types of Metadata 4 Chuck Kelley Excellence In Data, LLC

Types of Metadata 4 Chuck Kelley Excellence In Data, LLC

Types of Metadata �Technical �Business �Contextual 5 Chuck Kelley Excellence In Data, LLC

Types of Metadata �Technical �Business �Contextual 5 Chuck Kelley Excellence In Data, LLC

Technical Metadata �Technical metadata is data about data needed by the technology folks to

Technical Metadata �Technical metadata is data about data needed by the technology folks to do their work correctly. This includes the "good ole days" metadata, but adds much more. Technical metadata is used by the IT side to understand how the data warehouse/data mart was constructed. �What is the system of record for a specific piece of data, �What transformations were performed on what source data to produce data in the data warehouse/data mart, �What are the columns in the data warehouse/data mart and what do they mean, �What is used to reconcile the data with the source system, and �When was the last date and time the data was loaded into the data warehouse/data mart. 6 Chuck Kelley Excellence In Data, LLC

Business Metadata �Business metadata is data about data needed by the business community to

Business Metadata �Business metadata is data about data needed by the business community to do their work better. Business metadata is used by the business to understand what is available in the data warehouse/data mart and how, in their terminology, is it built. Business metadata include �Who is the data steward, �What is the confidence level of the data and its quality, �What algorithm is used to create the values, �What is the definition of this data, and �What reports are available. 7 Chuck Kelley Excellence In Data, LLC

Contextual Metadata � Contextual metadata is data that sets "context" of your data. It

Contextual Metadata � Contextual metadata is data that sets "context" of your data. It really isn't metadata in the typical sense of the word, but is classified as metadata nonetheless. Examples of contextual metadata are � Weather reports, � Headlines of the day, and � Social, economic, and political issues. � Contextual metadata is the hardest to collect. � Possible sources are newswire feeds (like AP, Wall Street Journal, Christian Science Monitor), Internet sites (http: //www. weather. com, http: //www. wsj. com), or just plain manual input (which is probably the least desirable). � How does contextual metadata help? � Let's say that your organization is the Department of Energy and you noticed a major jump in spending on security during the late 1990 s. Now, in 2009, the spending seems to be trending downward. How do you know why that might be happening? During the late 1990 s (see I don't remember the year or the name of the person already!), there was believed to be some breach of security and that classified data was being "stolen". If that information was captured, then when the trend is discovered, we could look at the context of what was happening in the late 1990 s to see if it can help understand the trend. 8 Chuck Kelley Excellence In Data, LLC

Importance of Metadata 9 Chuck Kelley Excellence In Data, LLC

Importance of Metadata 9 Chuck Kelley Excellence In Data, LLC

Importance of Metadata room 2 room 3 garage room 1 kitchen Data Warehouse META

Importance of Metadata room 2 room 3 garage room 1 kitchen Data Warehouse META DATA Metadata provides the blueprint of the Data Warehouse 10 Chuck Kelley Excellence In Data, LLC

Importance of Technical Metadata �Serves the IT community with operational detail about information systems.

Importance of Technical Metadata �Serves the IT community with operational detail about information systems. However �Metadata is not the primary focus of IT �Looked upon as a documentation exercise of minimal value �Often relegated to “nice to have” status 11 Chuck Kelley Excellence In Data, LLC

Importance of Business Metadata �Serves the Business Community as a source to discover what

Importance of Business Metadata �Serves the Business Community as a source to discover what and where information exists �Business meaning takes precedence over technical detail (look for commonality) �Looked upon as a key source for knowledge on Operational processes 12 Chuck Kelley Excellence In Data, LLC

Importance of Metadata �Serves the Data Warehouse as a key enabler �Of primary importance

Importance of Metadata �Serves the Data Warehouse as a key enabler �Of primary importance to DSS Analysts �Metadata is critical to tracking the content and validity of data in the Warehouse �Provides context to the data Issue: “Knowledge Gap” between OLTP and DW can affect the success of Data Warehousing Implementations 13 Chuck Kelley Excellence In Data, LLC

Importance of Metadata �What it does �Describes data in operational systems which facilitates �

Importance of Metadata �What it does �Describes data in operational systems which facilitates � mapping data elements to the DW � data conversion � aggregation & summarization logic � coordinating naming conventions � managing anomalies between physical characteristics of common data across information systems 14 Chuck Kelley Excellence In Data, LLC

Importance of Metadata �Metadata provides a new dimension �It allows � � � Data

Importance of Metadata �Metadata provides a new dimension �It allows � � � Data to be managed over time ( 5 - 10 years) Data to be managed by context (business meaning and business value will change over time) Manages structural changes to the DW database (versioning of metadata) �Allows Operational Systems to reinvent themselves by discovering corporate data which exists across systems 15 Chuck Kelley Excellence In Data, LLC

Where Do You Find Metadata 16 Chuck Kelley Excellence In Data, LLC

Where Do You Find Metadata 16 Chuck Kelley Excellence In Data, LLC

Metadata Exists Everywhere! External Sources Operational Data Sources Database Definitions File Definitions COBOL Copybooks

Metadata Exists Everywhere! External Sources Operational Data Sources Database Definitions File Definitions COBOL Copybooks Metadata Manager Data Modeling Tool Data Dictionary Data Warehouse Data Model Physical Database Definitions Summarization DSS Tool Business Catalog Data Extraction Tool Data Dictionary And this is just part of technical metadata… 17 Internal Non-Operational Data Sources DSS Tool Data Definitions Chuck Kelley Excellence In Data, LLC

18 Data Access Business Rules / Derived Measures Data Aggregation Data Warehouse Data Model

18 Data Access Business Rules / Derived Measures Data Aggregation Data Warehouse Data Model Interface, Transformation, and Load Metadata and the Data Warehouse Data Quality Data Warehouse Operations Chuck Kelley Excellence In Data, LLC

Interface, Transform, and Load �Describes the interface location and content. �Describes information about the

Interface, Transform, and Load �Describes the interface location and content. �Describes information about the transformation from source system codes to reference data codes. �Describes information about custom transformation, such as, using subsets of the data. �Describes information about the Data Warehouse destination 19 Chuck Kelley Excellence In Data, LLC

ETL Metadata Points Data Sources Filter Cleanse Extract Data Warehouse Transform Log/QA 20 Chuck

ETL Metadata Points Data Sources Filter Cleanse Extract Data Warehouse Transform Log/QA 20 Chuck Kelley Excellence In Data, LLC

Information Sourcing Activities Activity Description Outcomes Extract Pull Data from Operational Systems Raw Data

Information Sourcing Activities Activity Description Outcomes Extract Pull Data from Operational Systems Raw Data Filter Discard “Noise” data from data set Dirty Data Cleanse Analyze data quality and make corrections Clean Data Transform Rearrange and Summarize Data Useful Data Log/QA Perform Final Check and Build “Yellow Pages” Verified Data Metadata 21 Chuck Kelley Excellence In Data, LLC

DW Data Model �A description of each attribute and entity of the data model.

DW Data Model �A description of each attribute and entity of the data model. �This is an extract from the CASE tool that manages the data model or has been the output of a data dictionary. 22 Chuck Kelley Excellence In Data, LLC

Aggregation �Rules based engine for Aggregation. �States which fields from which DW tables are

Aggregation �Rules based engine for Aggregation. �States which fields from which DW tables are combined and the algorithm that aggregates the data. �Used to create code or to suggest stored procedures for aggregation. 23 Chuck Kelley Excellence In Data, LLC

Data Access - Reporting �Report Generation Metadata �A rules based reporting tool that describes

Data Access - Reporting �Report Generation Metadata �A rules based reporting tool that describes a report format from the header to column and rows. �Report Menu Metadata �Describes the reports that are available. �Describes the Menu that the user is shown for accessing the available reports. 24 Chuck Kelley Excellence In Data, LLC

Data Access - Query �Defines canned queries available. �Defines public and private queries. �Allows

Data Access - Query �Defines canned queries available. �Defines public and private queries. �Allows queries to be combined. 25 Chuck Kelley Excellence In Data, LLC

Data Access - End User Data Warehouse Data Model Interface, Transformation, and Load End

Data Access - End User Data Warehouse Data Model Interface, Transformation, and Load End User Application Help files Derived Business Measures 26 Chuck Kelley Excellence In Data, LLC

Data Quality �DW Load Statistics �The use of control numbers from the source system,

Data Quality �DW Load Statistics �The use of control numbers from the source system, compared to the load data. �DW Quality Rules �Rules that tracked known data trends for report checking 27 Chuck Kelley Excellence In Data, LLC

Metadata Components 28 Chuck Kelley Excellence In Data, LLC

Metadata Components 28 Chuck Kelley Excellence In Data, LLC

Metadata Components �Storage in the Warehouse �Operational Mapping Storage �Extract History �Volumetrics �Algorithms �Relationship

Metadata Components �Storage in the Warehouse �Operational Mapping Storage �Extract History �Volumetrics �Algorithms �Relationship History Ownership �Ownership/Stewardship �External/Reference Data �Business Meaning (Data Models) 29 Mapping History Algorithms Relationships Reference Data Models Chuck Kelley Excellence In Data, LLC

Metadata Components �Storage Requirements �Database Schema �Table Spaces �Database Tables(Dimensions, Facts) �Keys and Indexes

Metadata Components �Storage Requirements �Database Schema �Table Spaces �Database Tables(Dimensions, Facts) �Keys and Indexes �Facts (Attributes) �Information Access (Data Topology) � � PC’s EIS DSS Operational Storage Mappin History Volumetrics Algorithms Relationships Ownership Reference Data Models 30 Chuck Kelley Excellence In Data, LLC

Metadata Components �Operational Mapping �Location of data sources �Data Element conversion Mapping � Physical

Metadata Components �Operational Mapping �Location of data sources �Data Element conversion Mapping � Physical characteristic conversions � naming changes � default values Storage History Volumetrics Algorithm Relationships � encoding �Data Key changes Ownership Reference Data Models �Logic & Algorithms 31 Chuck Kelley Excellence In Data, LLC

Metadata Components �Extract History �Logged history of data extracts and transformations �Audit logs �Job

Metadata Components �Extract History �Logged history of data extracts and transformations �Audit logs �Job Scheduling (Batch, On-line) History Storage Mapping Volumetrics Algorithms Relationships Ownership Reference Data Models 32 Chuck Kelley Excellence In Data, LLC

Metadata Components �Volumetrics �Number of Tables �Number of Rows �Usage Characteristics �Table Indexing �Aging

Metadata Components �Volumetrics �Number of Tables �Number of Rows �Usage Characteristics �Table Indexing �Aging Criteria Storage Mapping History Volumetrics Algorithms Relationships Ownership Reference Data Models 33 Chuck Kelley Excellence In Data, LLC

Metadata Components �Algorithms �Levels of Summarization �Criteria applied to Data Aggregation �Data Derivation Algorithms

Metadata Components �Algorithms �Levels of Summarization �Criteria applied to Data Aggregation �Data Derivation Algorithms Storage Volumetrics Mapping History Relationships Ownership Reference Data Models 34 Chuck Kelley Excellence In Data, LLC

Metadata Components �Relationship History �Relationship Artifacts �Relationship History Relationships � Tables included Storage �

Metadata Components �Relationship History �Relationship Artifacts �Relationship History Relationships � Tables included Storage � Effective Dates � Constraints in Effect Mapping History Volumetrics Algorithms � Cardinality in Effect � Description Ownership Reference Data Models 35 Chuck Kelley Excellence In Data, LLC

Metadata Components �Ownership/Stewardship �Operational Ownership � Updates � Recovery Storage � Accuracy �Data Warehouse

Metadata Components �Ownership/Stewardship �Operational Ownership � Updates � Recovery Storage � Accuracy �Data Warehouse Stewardship � Data consistency � Loading � Access 36 Mapping History Volumetrics Algorithms Relationships Reference Data Models Ownership Chuck Kelley Excellence In Data, LLC

Metadata Components �External/Reference Data �Location, type and content of external data �Encoded values and

Metadata Components �External/Reference Data �Location, type and content of external data �Encoded values and changes �Audit log of changes Storage �Date/Time stamps Mapping History Volumetrics Algorithms Relationships Ownership Data Models Reference 37 Chuck Kelley Excellence In Data, LLC

Metadata Components �Business Meaning (Data Models) �Data Warehouse Data Model (Logical) �Mapping to Data

Metadata Components �Business Meaning (Data Models) �Data Warehouse Data Model (Logical) �Mapping to Data Warehouse Database Design (Physical) Mapping History Storage �Mapping to Operational Systems Data Models (Corporate & Business Area) Volumetrics Algorithms Relationships �Mapping to other DW architecture Metadata Reference Ownership � EIS/DSS � Data Mining/Data Journalism Data Models 38 Chuck Kelley Excellence In Data, LLC

How to use Metadata 39 Chuck Kelley Excellence In Data, LLC

How to use Metadata 39 Chuck Kelley Excellence In Data, LLC

How to Use Metadata �Metadata Manager Requirements �Required features include: � GUI � Data

How to Use Metadata �Metadata Manager Requirements �Required features include: � GUI � Data Model Management Storage Mapping History Volumetrics Algorithms Relationships � Model/Data Versioning � Data Access & Security � Integration with the DW DBMS Ownership Reference Data Models � Integration with DW Architecture � Unstructured Reference Data Management (futures) 40 Chuck Kelley Excellence In Data, LLC

How to Use Metadata However. . . �Most current Repositories are not extensible �Few

How to Use Metadata However. . . �Most current Repositories are not extensible �Few specialized tools are available for � Metadata Mining (Data Re-engineering) �There is no standard way to exchange metadata � between various Meta Manager tools � between EIS/DSS tool sets � between OLTP DBMS and DW DBMS �OLTP CASE repositories which manage business models are not geared for Data Warehousing 41 Chuck Kelley Excellence In Data, LLC

How to use Metadata �How to support it (Metadata Maintenance) �Care and feeding of

How to use Metadata �How to support it (Metadata Maintenance) �Care and feeding of Metadata is just as important as the data itself �Other Considerations. �How to get IT and the Business Client to use metadata Have a single point of contact � Always do it at their terminal (DW or Client) � Always let them do it with your help � 42 Chuck Kelley Excellence In Data, LLC

What To Look For in a Product 43 Chuck Kelley Excellence In Data, LLC

What To Look For in a Product 43 Chuck Kelley Excellence In Data, LLC 43

What to Look for in Products �From David Marco’s book Building and Managing the

What to Look for in Products �From David Marco’s book Building and Managing the Meta Data Repository: A Full Lifecycle Guide 44 Chuck Kelley Excellence In Data, LLC 44

Vendor Background � Full name and business address of vendor. � Parent Company. �

Vendor Background � Full name and business address of vendor. � Parent Company. � Number of years company has been in business. � Company structure. Is it a corporation, partnership, or privately held? List names associated with structure if different from question # 1. � Public or privately held company. If public, which exchange is company traded on, and what is the company's market symbol? � When did the company go public, or when is it expected to go public? � Total number of employees worldwide? � Total number of U. S. employees? � Web site URL � Number of developers supporting proposed product solution? � Company profit/loss for the last three years (if available). 45 Chuck Kelley Excellence In Data, LLC 45

Proposed Solution Overview � Summary of the vendor's proposed solution and explain how it

Proposed Solution Overview � Summary of the vendor's proposed solution and explain how it meets the needs � What are the names and versions of the product(s) component(s) comprising the vendor's proposed solution? � The repository architect and infrastructure architect need to carefully review all the components in the proposed solution and compare them with the target technical environment and support structure. How do the components communicate? What hardware platforms, DBMS's, Web servers and communications protocols do the components require? How is security and migration handled among the various components? � Number of worldwide production installations using precisely this proposed solution configuration. � � Be sure to consider the hardware, DBMS, Web server, etc. How many other companies are using the same confiruration? Is your company going to be the first? � What hardware, operating system, DBMS and web browser limitations do each of the product(s) component(s) have in the proposed solution on client and server platforms? � Be mindful of any requirements to download. Java applets and/or Active. X controls to the client. This might be in conflict with your company's web policy or if deployed externally your clients. � What is the release date and version number history of each of the product(s) component(s) for the past 24 months � What was the anticipated release date and new feature list for each of the product(s) features and component(s) over the next 12 months? 46 Chuck Kelley Excellence In Data, LLC 46

Cost of Solution �Total cost of proposed solution. �Cost of consulting services required for

Cost of Solution �Total cost of proposed solution. �Cost of consulting services required for installation. �Negotiate consulting time up front to complete staff training and get the repository up and running as quickly as possible. �Cost of consulting services for initial project setup �What is the vendor's daily rate for consulting services without expenses? �Annual maintenance cost/fee �This should range any where from 14 percent to 18 percent of solution price? �Are all new product component releases/upgrades provided while under an annual maintenance agreement? If not, please explain in detail. 47 Chuck Kelley Excellence In Data, LLC 47

Technical Requirements � Are there any database schema design requirements for the DSS data

Technical Requirements � Are there any database schema design requirements for the DSS data model in order to function with the repository product? � Does the proposed solution require a change in the existing DSS schema design in order to function? � How does the tool control the various versions of the meta data (development, quality assurance and production) stored in the repository? � How is meta data from multiple DSS projects controlled and separated? How can the various projects share meta data? � The answer to this question will determine how you administer the product and provide security. � Describe how meta data repository contents are migrated from one system engineering phase to the next (development, quality assurance and production)? How does this processing sequence differ when dealing with multiple projects on various time lines? � In particular how is meta data migrated through the various design phases? Can a single project or portion of a project be migrated forward? How? � What DBMS privileges does the product support (e. g. , roles, accounts, and views)? � Can DBMS-specific SQL statements be incorporated into queries? 48 Chuck Kelley Excellence In Data, LLC 48

Technical Requirements �Describe the security model used with the product? �How does the product

Technical Requirements �Describe the security model used with the product? �How does the product use existing infrastructure security systems? �Does the product use any type of single sign-on authentication (e. g. , LDAP)? �Where are user security constraints for the product stored? �Can a user have access to the repository tool for one project but no access for another project? �Can a user view the SQL generated by the product? �Is the product Web-enabled? Describe. Chuck Kelley 49 Excellence In Data, LLC 49

Implementation �Describe sequence of events and level of effort recommended for clients to consider

Implementation �Describe sequence of events and level of effort recommended for clients to consider in planning their implementation strategy. �What is the typical duration of the implementation cycle? �How many DSS database schema dimensions and facts can the proposed product solution handle? �Provide a sample project plan for implementation of your proposed solution for a single DSS project. �What client resource skill sets need to be in place for installation and implementation? 50 Chuck Kelley Excellence In Data, LLC 50

And Lastly, but very important �Obtain from vendor at least three customers references 51

And Lastly, but very important �Obtain from vendor at least three customers references 51 Chuck Kelley Excellence In Data, LLC 51

Top Five Mistakes with Metadata 52 Chuck Kelley Excellence In Data, LLC 52

Top Five Mistakes with Metadata 52 Chuck Kelley Excellence In Data, LLC 52

Top Five Mistakes of Metadata 1. 2. 3. 4. 5. Not defining the Objectives

Top Five Mistakes of Metadata 1. 2. 3. 4. 5. Not defining the Objectives of the Metadata Purchasing the tool before the requirements Choosing the tool before an evaluation Making Metadata to hard to utilize Not understanding the effort of Metadata 53 Chuck Kelley Excellence In Data, LLC 53

Conclusions �Metadata is a critical component of any data warehouse �Information users must learn

Conclusions �Metadata is a critical component of any data warehouse �Information users must learn how to use it. �Learn from others mistakes 54 Chuck Kelley Excellence In Data, LLC

Chuck Kelley � 30+ year professional in dealing with Data �chuckkelley@usa. net � 480

Chuck Kelley � 30+ year professional in dealing with Data �chuckkelley@usa. net � 480 -797 -5850 “I never metadata I didn’t like” 55 Chuck Kelley Excellence In Data, LLC