Data Capture Master Data Management Data Quality Data

Big Data London Background • Data models need to be created whenever a new

Big Data London What is data modeling • Data modelling usually refers to the

Big Data London Subject Areas • A data model covering all of the information

Big Data London Conceptual Data Model – Sales Order Placement example Can buy Customer

Big Data London Logical Data Model – Product Group Identifier Product Group Name Product

Big Data London Physical Data Model – Product PG 1 A PRD_GRP_ID PRD_GRP_NM PRD_GRP_DESC

Big Data London How do you start to produce data models for a business?

Big Data London What modelling techniques can you use? • There are 2 common

Big Data London FAQ Why do we need to model data when we’ve bought

Big Data London Tool Selection Criteria Weighting Score Is there a ready supply of

Big Data London Tool Selection • Data Modelling tools are relatively mature, and research

Slides: 12

Download presentation

Data Capture Master Data Management Data Quality Data Governance Data Security Data Modelling Data Integration Business Glossary Metadata Management Data Migration Reporting Data Lifecycle Management BIG DATA LONDON Information Architecture 1 June 2015 Big Data London Data Modelling Recommendations

Big Data London Background • Data models need to be created whenever a new database is created to store data associated with a particular system, and also in order that business and IT can understand the data that is needed for a particular business process, data integration or reporting • This slide deck explains the practical purpose of each type of data model and modelling technique in order to encourage adoption within delivery projects and architecture • The slide deck does not aim to provide detailed information as to how to produce logical or physical data models, for which information and training courses are widely available. INFORMATION ARCHITECTURE 1 JUNE 2015 2

Big Data London What is data modeling • Data modelling usually refers to the production of a diagram which presents a set of data objects (entities) and the relationships between them. • There are 3 levels of data models that are usually produced, typically grouped by subject area (see next slide): - o Conceptual data model – This contains just the names of the entities. Relationships between the entities are often omitted at this level. The purpose of a conceptual model is that it’s easy to produce on a sheet of A 3/A 4 for discussion with the business, whereas the next level down includes attributes making diagrams more complex to read. o Logical data model – This takes the conceptual data model and adds fields (attributes) which contain all of the data that is associated with each entity. A logical data model is system-agnostic so is useful where you want to produce a detailed design for data which is sourced from multiple systems and/or for reporting. The entity attributes in a logical data model are usually defined in a business glossary. o Physical data model – This is the translation of the logical data model in to a model which will be used to create the tables which physically store data within a particular database in the most efficient manner. Another purpose is to allow a technical team to see relationships between physical tables, so that they can integrate or query data. The table columns in a physical data model are usually defined in a data dictionary. INFORMATION ARCHITECTURE 1 JUNE 2015 3

Big Data London Subject Areas • A data model covering all of the information stored by an organisation would be extremely complex to read even at the conceptual data model level. For this reason, we need to split data models in to subject areas. • A practical choice for a subject area is at a reasonable business capability/process level. For example, if a company has level 1 business capabilities of Product Development, Marketing, Sales, Finance etc. then splitting the data model by these subject areas allows teams who work in those areas to more easily validate a data model • A Level 1 business capability is typically still at too high a level, however, as there a lot of business processes which are incorporated within each of these areas. Working with your business users it is best to split a data model in to subject areas where a business process owner can easily verify that the data that has been modelled for 1 or more business processes is accurate. A conceptual data model is not of practical use, if a business user, working in the subject area that the data model is designed for, cannot verify that the entities are complete and the relationships correct. INFORMATION ARCHITECTURE 1 JUNE 2015 4

Big Data London Conceptual Data Model – Sales Order Placement example Can buy Customer Can request Date Product Can appear on Can be recorded for Sales Order Can list Sales Order Items Note: Data model for illustration purposes only INFORMATION ARCHITECTURE 1 JUNE 2015 Entity names should be meaningful to most business users. Where there are multiple names in use within a company, then synonyms/related terms can be listed in the business glossary Arrow heads can indicate whether it’s 1 or many Solid lines can indicate a relationship must exists Dashed lines can indicate a relationship may exist 5

Big Data London Logical Data Model – Product Group Identifier Product Group Name Product Group Description Integer PK Varchar (20) Varchar (100) More entities may be added at this stage as more information Is supplied about what needs to be recorded & reported upon Attributes are added to each of the entities described in the conceptual data model with their datatypes e. g. character, integer Primary keys (unique identifying attributes) are identified as well as Foreign keys (attributes which join the entity to another entity) Data Governance concepts e. g. Domains (standard attribute datatypes), default values and validation rules can be added to the design. Attribute descriptions are recorded in a Business Glossary Product Identifier Product Name Product Description Product Group Identifier Integer PK Varchar(20) Varchar (100) Integer FK Note: Data model section for illustration purposes only INFORMATION ARCHITECTURE 1 JUNE 2015 6

Big Data London Physical Data Model – Product PG 1 A PRD_GRP_ID PRD_GRP_NM PRD_GRP_DESC Integer PK Varchar (20) Varchar (100) Entities are converted in to physical tables and attributes in to Columns There may be more or less physical tables than corresponding entities, if there are storage or performance benefits Table and Column Names are often abbreviated based on Naming Standards provided via Data Governance. Tablespace: TS_MSTRS Tables are allocated to physical storage e. g. tablespaces Indexes can be added. Database specific objects e. g. physical storage, data security, Sequences, views can be added P 1 B PRD_ID PRD_NM PRD_DESC PRD_GRP_ID Integer PK Varchar(20) Varchar (100) Integer FK Tablespace: TS_MSTRS Once a physical data model is complete, a database-specific script (known as a DDL) can be created which a DBA runs in order to create the database objects (tables, indexes, sequences, views etc) Column descriptions are recorded in a data dictionary Note: Data model section for illustration purposes only INFORMATION ARCHITECTURE 1 JUNE 2015 7

Big Data London How do you start to produce data models for a business? • There are 2 basic approaches: o Top-Down – This approach is suitable where you’re adding new functionality to the business for which a database does not already exist, or the existing database is suboptimal. In this approach, you can start either with a blank sheet of paper and work with the business on identifying entities & relationships for a particular subject area or if the subject area is well defined, for example, payments, then you can use a reference data model that’s been created by somebody else o Bottom-Up (Reverse engineering). This approach is suitable where you have a database which already has entities & relationships which you can rely upon. In this method, you import the database objects in to your modelling tool as a physical data model & then reverse engineer a logical data model & a conceptual data model from it. You then modify the data models e. g. to add functionality or standardise naming conventions. INFORMATION ARCHITECTURE 1 JUNE 2015 8

Big Data London What modelling techniques can you use? • There are 2 common modelling techniques: - o Entity-Relationship Modelling (For example, slide 5 is doing this). In this technique, you identify all of the relationships between all of the entities. This technique is typically used to model data in transactional systems or for operational reporting. It ensures that each data record is recorded only once, so you can ensure that data is always consistent. o Dimensional Modelling. This technique is used for analytical reporting purposes. You have a central fact table e. g. sales_order_facts which contains your business metrics and all entities which you wish to slice and dice those metrics by (known as dimensions) are joined to the fact table. The benefit of this technique is that there are substantially less joins in the data model, allowing data to be queried more efficiently than with E-R modelling. The downside is that you are reliant on a fact event taking place in order to establish relationships between dimensions. E. g. in the model below, there is no relationship between customer and a product until a sale has taken place. Note: Data model for illustration purposes only INFORMATION ARCHITECTURE 1 JUNE 2015 9

Big Data London FAQ Why do we need to model data when we’ve bought a pre-built system which we trust and our system experts know the data? • Business processes generally need to be tailored for each organisation. Since business processes need to at least be configured, the data that is associated to that bespoke business process should be modelled so that a business user wishing to produce an operational report can be sure that the data and relationships have been modelled correctly. • A business user wouldn’t typically want to directly query the physical tables in a pre-built system as it would be difficult to understand. By creating a semantic layer (based on a logical data model) you can provide a means for a business user to query the data in a more digestible fashion. • A business user may wish to produce reports which contain data from multiple systems. It makes sense again to create a semantic layer (based on a logical data model) which uses system agnostic terms which the business user can understand, rather than terms that a particular software vendor may wish to impose. • When integrating data from other systems, the integration architect has to have an understanding as to what attributes are available at source and where they should be populated in the target system. Mapping data to a standardised data object (based on a logical data model entity) makes data integration much easier. • When integrating data, the integration architect needs to know the order in which tables should be populated in order that a child table is only populated after a parent table. Both a logical & physical data model would assist in determining the correct sequencing. Conclusion: For integration with multiple systems and for reporting, a logical data model should definitely be produced. It is not necessary to produce a physical data model, where the database is provided by a vendor, but you should produce a document which maps logical entities & attributes to the underlying physical tables/columns. INFORMATION ARCHITECTURE 1 JUNE 2015 10

Big Data London Tool Selection Criteria Weighting Score Is there a ready supply of data modellers skilled with the tool Is the user interface well designed & intuitive? What is the total cost of ownership? Does it allow logical entity attributes to be integrated in to a business glossary and physical table columns to be integrated in to a data dictionary? Does it allow technical metadata to be stored in a metadata repository or to be shared with data quality, data governance & data integration tools? Can you store data models in a repository which would allow multiple data modellers to view/work with different subject areas at the same time? Are you able to create database specific objects e. g. views, indexes, sequences, data security etc. Can you impose policies provided by data governance e. g. naming standards, validation rules, domains, default values? Can you forward/reverse engineer a physical data model from/to database technologies used in Informa? Can you easily generate a report so that the information contained within the data model can be viewed offline/by users who don’t have direct access to the modelling tool? Note: Weightings and Scores should be in range from 0 to 10 with 10 representing a very important criteria (Weighting) and a tool which fully met that criteria (Score) INFORMATION ARCHITECTURE 1 JUNE 2015 11

Big Data London Tool Selection • Data Modelling tools are relatively mature, and research as to the best products are no longer conducted. Popular data modelling tools are: - o CA Erwin o ER/Studio o SAP Sybase Power Designer • As there are benefits in sharing key entities & their relationships, it is worth selecting a data modelling tool which is the most popular amongst the data modellers across your organisation. This should not take a long time, however as there a limited number of users and prices are relatively cheap. INFORMATION ARCHITECTURE 1 JUNE 2015 12