BIG DATA LONDON
Information Architecture
1 June 2015
[Diagram: information architecture depicted as a jigsaw of Data Capture, Data Quality, Master Data Management, Data Governance, Data Modelling, Data Security, Data Integration, Business Glossary, Metadata Management, Data Migration, Reporting and Data Lifecycle Management]
Big Data London Overview

Purpose of slide deck
• The slide deck will explain the different components of information architecture (depicted as jigsaw pieces in a puzzle) and how they interrelate with each other.
• The deck will summarise the key points of each part of the architecture. More detailed slide decks will be created for each component.
• The overall aim of the deck is to give technical staff sufficient information about the discipline of information architecture for it to be established within each division. The knowledge is also useful when assessing the capabilities of vendor tools.
• The information is presented as recommendations/guidelines. It is rare for all of the elements of information architecture to be present within an organisation, since the cost & time to implement them have to be weighed against the overall benefits to the business.

What is Information Architecture?
[Diagram: information architecture jigsaw]
• For a long time, the discipline of information architecture was seen as a pure IT function and referred to simply as data architecture.
• The elements of information architecture which are usually implemented by IT with little business input are shown in green in the diagram.
• Since the late 90s, some businesses have realised that they could gain a competitive advantage by capturing better quality business data about their customers, and either using it to cross-sell/up-sell products or selling the information to 3rd parties.
• It was gradually realised that, to achieve this, there needed to be more business involvement in standardising names & definitions, managing important (master) data and driving data quality initiatives via stricter data governance.
• This has led to data governance councils being formed, so that MDM, data governance and business glossary content can drive these initiatives. The broadening of the discipline has led to it now being called information architecture.

What is meant by Data Capture/Acquisition?
• Data Capture/Acquisition covers the technology and design patterns required to extract data from a source (database/file) and land it in a staging area for ongoing processing.
• Typically, data is captured either in batch via file transfer or in near real time by extracting changes from the source database logs (change data capture).
• To avoid performance bottlenecks, message queues are often implemented between the source system and the staging area.
• It is also important to ensure that no data is lost from the source and that sensitive data is masked during transit.
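The capture pattern above can be sketched in a few lines. This is a minimal illustration, not a real change-data-capture tool. The change records, the `SENSITIVE_FIELDS` list and all function names are hypothetical, and an in-process `queue.Queue` stands in for a message broker. A truncated hash is used as the masking step purely for illustration.

```python
import hashlib
import queue

# Hypothetical change records, standing in for entries read from a source database log.
SOURCE_LOG = [
    {"op": "INSERT", "id": 1, "name": "Alice", "card_no": "4111111111111111"},
    {"op": "UPDATE", "id": 1, "name": "Alice B", "card_no": "4111111111111111"},
    {"op": "INSERT", "id": 2, "name": "Bob", "card_no": "5500000000000004"},
]

# Assumption: which columns count as sensitive is decided elsewhere (e.g. by data governance).
SENSITIVE_FIELDS = ("card_no",)


def mask(record):
    """Return a copy of the record with sensitive values replaced by a truncated SHA-256 hash."""
    masked = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in masked:
            masked[field] = hashlib.sha256(masked[field].encode()).hexdigest()[:12]
    return masked


def capture(log, buffer):
    """Producer: push a masked copy of each change onto the queue; return how many were sent."""
    for record in log:
        buffer.put(mask(record))
    return len(log)


def load_staging(buffer):
    """Consumer: drain the queue into the staging area (a list stands in for a staging table)."""
    staging = []
    while not buffer.empty():
        staging.append(buffer.get())
    return staging


buffer = queue.Queue()
sent = capture(SOURCE_LOG, buffer)
staging = load_staging(buffer)
# Reconcile record counts so that nothing is silently lost between source and staging.
assert sent == len(staging), "data lost between source and staging"
```

Masking happens before a record reaches the queue, so sensitive values never travel in clear. The final count reconciliation is the simplest form of the "no data lost" check.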

What is meant by Data Quality?
• Data Quality refers to the technical implementation of processes to:
1. Analyse (profile) data and capture metrics identifying how good the quality of the data is.
2. Standardise the structure of data where there are multiple sources for that data.
3. Clean data, if possible. For example, address information can be cleansed by using address verification services.
4. Validate data, rejecting records which do not contain sufficient information for the target system to accept.
Note: When the business is involved in creating data quality rules, this is considered to be part of "data governance".
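The four steps can be sketched as a small pipeline. This is a sketch under stated assumptions: the source layouts, the field mappings and the rule that both fields are mandatory are all hypothetical. Simple trimming and case-fixing stands in for a real cleansing service such as address verification.

```python
# Hypothetical customer rows from two sources with different layouts.
crm_rows = [{"Name": "alice smith", "Postcode": "sw1a 1aa"}, {"Name": "", "Postcode": "EC1A 1BB"}]
web_rows = [{"full_name": "Bob Jones", "post_code": None}]

# Assumed mappings from each source layout onto one standard structure.
CRM_MAPPING = {"name": "Name", "postcode": "Postcode"}
WEB_MAPPING = {"name": "full_name", "postcode": "post_code"}
FIELDS = ("name", "postcode")


def standardise(row, mapping):
    """Map a source-specific layout onto the common structure."""
    return {target: row.get(source) for target, source in mapping.items()}


def profile(rows):
    """Completeness metric: the fraction of rows holding a non-empty value for each field."""
    return {f: sum(1 for r in rows if r[f]) / len(rows) for f in FIELDS}


def clean(row):
    """Trivial cleansing (trim and fix case); a stand-in for e.g. an address verification service."""
    return {
        "name": (row["name"] or "").strip().title(),
        "postcode": (row["postcode"] or "").strip().upper(),
    }


def is_valid(row):
    """Reject rows missing any field the target system requires."""
    return all(row[f] for f in FIELDS)


rows = [standardise(r, CRM_MAPPING) for r in crm_rows] + [standardise(r, WEB_MAPPING) for r in web_rows]
metrics = profile(rows)
cleaned = [clean(r) for r in rows]
accepted = [r for r in cleaned if is_valid(r)]
rejected = [r for r in cleaned if not is_valid(r)]
```

Profiling runs on the standardised but uncleaned rows, so the metrics describe the data as it arrived. Rejected rows are kept separately rather than dropped, so they can be reported back to the source.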

What is meant by Data Integration?
• In the broader use of the term, data integration covers all of the steps from capturing the data from the source through to pushing the data into the target system.
• However, in information architecture terms, data integration specifically refers to the technology & design patterns used to extract data from a source, transform it and load it into the target (ETL). Data capture & data quality steps are excluded from this narrower definition of the term.
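The narrower extract, transform, load definition can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch: the table names, columns and transformation rules are hypothetical. A single in-memory database plays both source and target for brevity.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) source table."""
    return conn.execute("SELECT order_id, amount_pence, country FROM src_orders").fetchall()


def transform(rows):
    """Transform: convert pence to pounds and standardise country codes to upper case."""
    return [(order_id, pence / 100, country.upper()) for order_id, pence, country in rows]


def load(conn, rows):
    """Load: insert the transformed rows into the target table."""
    conn.executemany("INSERT INTO tgt_orders (order_id, amount_gbp, country) VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, amount_pence INTEGER, country TEXT)")
conn.execute("CREATE TABLE tgt_orders (order_id INTEGER, amount_gbp REAL, country TEXT)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)", [(1, 1250, "gb"), (2, 999, "fr")])

load(conn, transform(extract(conn)))
target_rows = conn.execute("SELECT * FROM tgt_orders ORDER BY order_id").fetchall()
```

Keeping the three stages as separate functions mirrors the pattern itself. Each stage can be swapped (a file extract, a different target database) without touching the others.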

What is meant by Data Migration?
• Data migration refers to moving data in bulk from an old system to a new system. Typical considerations during data migration include:
o Where to pull data from: it is not always the original system of record, since it can be more convenient to pull data from a data warehouse which has consolidated data from multiple systems of record, provided it holds all the necessary data at a transactional level.
o How much history to capture. For example, only transactional data for open sales orders might need to be migrated from an old system, as transactional data is only required for operational purposes while the sales order is incomplete. Care needs to be taken, however, that regulatory requirements to hold transactional data, as well as any internal management reporting requirements, can still be met.
o How much change history to capture. This refers to whether you want to migrate only the current state of the data in the source system, or also all of the changes that were made to that data over the years. Change history is important for reporting measures such as like-for-like sales, where you wish to report how revenue and profit would have changed had the organisation remained in the same state as it was last year. This differentiates organic growth from growth via acquisition.
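The history choices above can be expressed as selection functions over versioned source rows. This is a sketch only. The `OrderVersion` record, the status values and the sample history are hypothetical. It shows the difference between migrating the current state and migrating the change history, restricted to open orders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OrderVersion:
    order_id: int
    status: str       # assumed values: "OPEN" or "COMPLETE"
    amount: float
    valid_from: date  # when this version of the row became current


# Hypothetical change history for two sales orders in the old system.
history = [
    OrderVersion(1, "OPEN", 100.0, date(2014, 1, 5)),
    OrderVersion(1, "COMPLETE", 100.0, date(2014, 2, 1)),
    OrderVersion(2, "OPEN", 50.0, date(2015, 3, 10)),
    OrderVersion(2, "OPEN", 75.0, date(2015, 4, 2)),
]


def current_state(versions):
    """Keep only the latest version of each order, discarding change history."""
    latest = {}
    for v in sorted(versions, key=lambda v: v.valid_from):
        latest[v.order_id] = v
    return list(latest.values())


def open_orders_only(versions):
    """Current state of orders that have not yet completed (operational data only)."""
    return [v for v in current_state(versions) if v.status == "OPEN"]


def open_orders_with_history(versions):
    """Every recorded version of the open orders, preserving their change history."""
    open_ids = {v.order_id for v in open_orders_only(versions)}
    return [v for v in versions if v.order_id in open_ids]
```

Completed order 1 is left behind in either case. Any regulatory or reporting need for it would have to be met from an archive of the old system.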

What is meant by Master Data Management?
• In the broader sense of the term, master data management (MDM) can refer to all initiatives to ensure that master data is of good quality, so it can encompass data quality, data governance & the development of a business glossary.
• In the narrower sense of the term, MDM usually refers to a tool and business processes which alert a data owner that there are master data records which need managing, and which allow the data owner to manually clean master data records and map unknown source records to known master data records.
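The narrower, tool-based view of MDM can be sketched as a cross-reference table plus a work queue for the data owner. This is an illustration under stated assumptions. The master records, source keys and matching threshold are hypothetical. Name similarity via the standard library's `difflib.SequenceMatcher` is a simple stand-in for the matching engine a real MDM tool would provide.

```python
from difflib import SequenceMatcher

# Hypothetical master customer records and a source-to-master cross-reference.
MASTER_CUSTOMERS = {"C001": "Acme Limited", "C002": "Globex Corporation"}
xref = {("CRM", "9001"): "C001"}


def normalise(name):
    """Lower-case, drop full stops and collapse whitespace before comparing names."""
    return " ".join(name.lower().replace(".", "").split())


def suggest_match(name, threshold=0.85):
    """Suggest the most similar master record, or None if nothing is close enough."""
    scores = {
        master_id: SequenceMatcher(None, normalise(name), normalise(master_name)).ratio()
        for master_id, master_name in MASTER_CUSTOMERS.items()
    }
    master_id, score = max(scores.items(), key=lambda kv: kv[1])
    return master_id if score >= threshold else None


def resolve(source, source_id, name, work_queue):
    """Return the master id for a known source record; otherwise alert the data owner."""
    key = (source, source_id)
    if key in xref:
        return xref[key]
    work_queue.append({"key": key, "name": name, "suggested": suggest_match(name)})
    return None


def confirm_mapping(key, master_id):
    """Data owner action: map an unknown source record to a known master record."""
    xref[key] = master_id
```

Unknown records are never auto-mapped here. A suggestion is queued and the mapping only takes effect once the data owner confirms it, which matches the manual stewardship step described above.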

What is meant by Data Governance?
• In the broader sense of the term, data governance can refer to all of the tasks controlled by a data governance council, so it can include master data management, data governance in its narrower sense and the development of the business glossary. In this situation, the terms MDM & data governance are often interchangeable.
• In the narrower sense of the term, it refers to involving the business in the creation of policies, standards and rules to ensure that data is of good quality. Naming standards, cleansing & validation rules, and policies on how to deal with poor data quality are all covered by data governance.

What is meant by a Business Glossary?
• A business glossary contains the names of data objects, attributes (describing features) and measures (formulae) as they are understood by the business. These are referred to as terms.
• A business glossary should cover all of the terms the business uses in its everyday work, including reporting.
• Since different parts of the business often use different terms to refer to the same or similar things, a business glossary allows any term to be linked to synonyms and related terms.
• Terms can be grouped into different hierarchical structures (the posh words are taxonomies & ontologies) so that a business glossary user can find a particular term and see where it sits within a classification hierarchy.
• A business glossary can also hold the distinct lists of values for particular data attributes.
• A business glossary should not be confused with a data dictionary. A data dictionary is used by IT to capture information about the tables & columns used to physically store data; its purpose is to aid development. If a reporting architecture is correctly implemented, a business user should not need to know how data is physically stored within a particular system.
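A glossary's core features are synonyms, related terms, a classification hierarchy and value lists. They can be modelled with a small data structure. This is a minimal sketch. The `Term` and `Glossary` classes and the sample terms are hypothetical and are not the schema of any particular glossary tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    name: str
    definition: str
    parent: Optional[str] = None                        # position in the taxonomy
    synonyms: set = field(default_factory=set)          # other names for the same thing
    related: set = field(default_factory=set)           # similar but distinct terms
    allowed_values: list = field(default_factory=list)  # distinct list of values, if any


class Glossary:
    def __init__(self):
        self.terms = {}
        self._alias = {}  # lower-cased term name or synonym -> canonical term name

    def add(self, term):
        self.terms[term.name] = term
        self._alias[term.name.lower()] = term.name
        for synonym in term.synonyms:
            self._alias[synonym.lower()] = term.name

    def find(self, word):
        """Look a term up by its own name or any synonym, ignoring case."""
        name = self._alias.get(word.lower())
        return self.terms[name] if name else None

    def path(self, name):
        """Show where a term sits in the classification hierarchy."""
        parts = []
        while name:
            parts.append(name)
            name = self.terms[name].parent
        return " > ".join(reversed(parts))


g = Glossary()
g.add(Term("Party", "Any person or organisation the business deals with."))
g.add(Term("Customer", "A party that buys products.", parent="Party",
           synonyms={"Client", "Account Holder"}, related={"Prospect"}))
g.add(Term("Customer Status", "Current state of a customer relationship.", parent="Customer",
           allowed_values=["Active", "Dormant", "Closed"]))
```

Because lookups go through the alias map, a user searching for "client" lands on the single agreed term "Customer". `g.path("Customer Status")` then places it in the hierarchy.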

What is meant by Reporting?
Reporting refers to:
1. Statutory reporting decks.
2. Operational reporting, which allows a business to carry out its daily activities.
3. Analytical reporting, which allows management to use aggregated data to monitor business activity on a more macro scale.
4. Data discovery & querying, which allows an operational user to examine data visually.
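The difference between operational and analytical reporting is largely one of grain: row-level detail versus aggregates. A small query pair illustrates it. This is a sketch with a hypothetical `sales` table and sample rows, using Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, store TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2015-05-01", "North", "Leeds", 120.0),
    ("2015-05-01", "North", "York", 80.0),
    ("2015-05-01", "South", "Brighton", 200.0),
    ("2015-05-02", "South", "Brighton", 50.0),
])

# Operational reporting: row-level detail one store needs to run its daily activities.
operational = conn.execute(
    "SELECT sale_date, amount FROM sales WHERE store = ? ORDER BY sale_date", ("Brighton",)
).fetchall()

# Analytical reporting: data aggregated across stores and days for a macro view.
analytical = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```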

What is meant by Data Modelling?
• Data modelling refers to the conceptual, logical & physical modelling of data.
• Conceptual data modelling simply captures data objects (entities) as they are understood by the business. For example, customer, product and supplier are entities which the business refers to. At this stage, relationships between the entities may or may not be included in the model. The purpose of conceptual data models is to make it easier to talk to the business without the complication of detail that the logical & physical data models introduce.
• Logical data modelling adds all of the detailed data fields (attributes/measures) and establishes all of the relationships. A logical data model is useful for front-end development, e.g. web forms or reporting.
• Physical data modelling converts a logical data model into a form suited to a particular database. Database-specific objects such as views, indexes, sequences and storage characteristics are added in this model, and table/column names may well be abbreviated due to restrictions on name lengths or to speed up development.
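The step from logical to physical model can be sketched as a small DDL generator. It abbreviates business-friendly names and adds database-specific objects, here a primary key and an index. The logical entity, the abbreviation list and the type mapping are all hypothetical, and SQLite is used only so the generated DDL can be executed.

```python
import sqlite3

# Hypothetical logical model: business-friendly attribute names with logical types.
LOGICAL_CUSTOMER = {"Customer Identifier": "integer", "Customer Name": "text", "Date Of Birth": "date"}

# Assumed naming standard: approved abbreviations ("" drops the word) and a type mapping.
ABBREVIATIONS = {"Customer": "CUST", "Identifier": "ID", "Name": "NM", "Date": "DT", "Of": ""}
TYPE_MAP = {"integer": "INTEGER", "text": "VARCHAR(100)", "date": "DATE"}


def physical_name(logical_name):
    """Abbreviate each word per the naming standard and join with underscores."""
    words = [ABBREVIATIONS.get(w, w.upper()) for w in logical_name.split()]
    return "_".join(w for w in words if w)


def to_ddl(entity, attributes, key, indexed):
    """Generate physical DDL: a table with typed columns, a primary key and one index."""
    table = physical_name(entity)
    columns = [f"{physical_name(a)} {TYPE_MAP[t]}" for a, t in attributes.items()]
    columns.append(f"PRIMARY KEY ({physical_name(key)})")
    return [
        f"CREATE TABLE {table} ({', '.join(columns)})",
        f"CREATE INDEX {table}_{physical_name(indexed)}_IX ON {table} ({physical_name(indexed)})",
    ]


conn = sqlite3.connect(":memory:")
for statement in to_ddl("Customer", LOGICAL_CUSTOMER, "Customer Identifier", "Customer Name"):
    conn.execute(statement)
```

The logical names stay readable for the business and front-end developers. Only the generated physical layer carries the abbreviated, database-specific form.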

What is meant by Data Security?
• Data security is a broad subject covering the need to ensure that only the right people can see the right data. Data security includes:
o User/role authorisation, i.e. assigning rights to create, read, update or delete (CRUD) data in objects.
o Data auditing: capturing the CRUD operations performed by users. This ensures that, should data be lost or database security be breached, we know who was responsible.
o Data masking: this can refer to "on the fly" masking of sensitive data (often called data redaction), or to the permanent masking of data required if you wish to copy production data into a non-production environment.
o Data encryption: in databases that hold highly sensitive data, as well as users being given passwords, data is encrypted throughout its lifecycle and decrypted on the fly using encryption keys.
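Three of the items above can be shown together in one access check: role-based CRUD rights, an audit trail and on-the-fly redaction. This is a minimal sketch. The roles, users, grants and redaction rule are hypothetical. Encryption is left out because it needs a dedicated cryptography library.

```python
from datetime import datetime, timezone

# Hypothetical grants: role -> object -> permitted CRUD operations.
GRANTS = {
    "analyst": {"customer": {"R"}},
    "clerk": {"customer": {"C", "R", "U"}},
}
USER_ROLES = {"ann": "analyst", "carl": "clerk"}
audit_log = []  # (timestamp, user, operation, object, allowed)


def redact(row, role):
    """On-the-fly masking: analysts see only the last four digits of a card number."""
    if role == "analyst" and "card_no" in row:
        row = dict(row, card_no="*" * 12 + row["card_no"][-4:])
    return row


def access(user, op, obj, row=None):
    """Check the user's role rights, audit the attempt, then return the (possibly redacted) row."""
    role = USER_ROLES[user]
    allowed = op in GRANTS.get(role, {}).get(obj, set())
    audit_log.append((datetime.now(timezone.utc).isoformat(), user, op, obj, allowed))
    if not allowed:
        raise PermissionError(f"{user} ({role}) may not {op} {obj}")
    return redact(row, role) if op == "R" and row is not None else row
```

Refused attempts are audited as well as successful ones, so a breach investigation can see who tried to do what, not only who succeeded.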

What is meant by Metadata Management?
• This refers to capturing information about the structure of data (metadata) held in the source system, the target system and at each stage as it passes through the data capture, data quality and integration phases, right through until it is consumed by an application or report.
• The metadata is held in a metadata repository and allows data lineage (tracing data from target to source or vice versa) to be achieved. Data lineage is useful when, say, a consumer of a report wishes to verify which source the data in a particular field originates from.
Note: Metadata management remains relatively elusive. The main reason is that most tools use proprietary metadata which varies from version to version. Other tools typically only support metadata integration for popular versions of the most popular tools in any particular area of information management. Even buying all your tools from a single vendor does not overcome this issue, as many tools supplied by a single vendor were not originally developed in-house, and it can take several years for their metadata to become fully integrated.
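Data lineage amounts to walking a graph of "derived from" links held in the metadata repository. This is a sketch under stated assumptions. The `LINEAGE` mapping and the field names are hypothetical. It traces upstream from a report field to its original sources, and downstream from a source field to everything derived from it.

```python
from collections import defaultdict

# Hypothetical repository content: each field maps to the fields it was derived from.
LINEAGE = {
    "report.revenue": ["dw.fact_sales.amount_gbp"],
    "dw.fact_sales.amount_gbp": ["staging.orders.amount_pence"],
    "staging.orders.amount_pence": ["crm.orders.amount", "web.basket.total"],
}


def upstream(field, lineage=LINEAGE):
    """Trace from target back to the original sources (fields with no further parents)."""
    sources, seen, stack = set(), set(), [field]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = lineage.get(current, [])
        if not parents and current != field:
            sources.add(current)
        stack.extend(parents)
    return sources


def downstream(field, lineage=LINEAGE):
    """Trace from a source forward to every field derived from it."""
    children = defaultdict(list)
    for target, parents in lineage.items():
        for parent in parents:
            children[parent].append(target)
    derived, stack = set(), [field]
    while stack:
        for child in children[stack.pop()]:
            if child not in derived:
                derived.add(child)
                stack.append(child)
    return derived
```

The hard part in practice is not this traversal but populating `LINEAGE` from many tools' proprietary metadata, which is the difficulty the note above describes.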

What is meant by Data Lifecycle Management?
• Data lifecycle management is typically considered after a database has gone live. In the early days, all data can easily be stored in a new database without impacting performance. However, as more data is added, performance gets worse and the need for data lifecycle management surfaces.
• In the past, the only option was to archive data off to tape and hold it in a tape archive, with requests to retrieve data taking days or weeks.
• Nowadays, data that is kept in the original database and is needed for operational purposes is referred to as "hot" data.
• Once data is no longer needed to support the majority of daily activities, it can be moved to cheaper online storage outside the original database, where it is still accessible. This is known as "warm" data. Current solutions can make the distinction between "hot" and "warm" data invisible to the end user, apart from a relative degradation in performance when accessing "warm" data.
• Once data is no longer needed other than for retention purposes, such as regulatory requirements, it can be moved to traditional tape archive solutions. Data in a tape archive is called "cold" data.
• Data lifecycle management refers to implementing data retention & archiving policies to move data from hot to warm to cold environments.
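A retention and archiving policy can be sketched as a rule that assigns each record a tier. The tiers are then grouped into bulk moves. This is illustrative only. The age thresholds are hypothetical, and "days since last operationally needed" is an assumed proxy for how actively data is used. Real thresholds would come from business and regulatory policy.

```python
from datetime import date

# Assumed policy thresholds in days; real values come from business and regulatory policy.
HOT_MAX_AGE = 90
WARM_MAX_AGE = 2 * 365


def tier_for(last_needed, today):
    """Classify a record as hot, warm or cold by the days since it was last needed operationally."""
    age = (today - last_needed).days
    if age <= HOT_MAX_AGE:
        return "hot"   # stays in the original database
    if age <= WARM_MAX_AGE:
        return "warm"  # cheaper online storage, still accessible
    return "cold"      # tape archive, kept only for retention purposes


def plan_moves(records, today):
    """Group record ids by target tier so an archiving job can move them in bulk."""
    plan = {"hot": [], "warm": [], "cold": []}
    for record_id, last_needed in records.items():
        plan[tier_for(last_needed, today)].append(record_id)
    return plan
```

Running the plan periodically gives the gradual hot-to-warm-to-cold movement described above, rather than a one-off archive exercise.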