Data Warehouse IS 403 Chapter 2 Data warehouse

  • Slides: 26
Download presentation
Data Warehouse IS 403 Chapter 2 Data warehouse Architecture Dr Taleb Obaid 1

Data Warehouse IS 403 Chapter 2 Data warehouse Architecture Dr Taleb Obaid 1

Business Analysis Framework • The business analyst get the information from the data warehouses

Business Analysis Framework • The business analyst get the information from the data warehouses to measure the performance and make critical adjustments in order to win over other business holders in the market. • Having a data warehouse offers the following advantages: Dr Taleb Obaid 2

advantages 1. Since a data warehouse can gather information quickly and efficiently, it can

advantages 1. Since a data warehouse can gather information quickly and efficiently, it can enhance business productivity. 2. A data warehouse provides us a consistent view of customers and items, hence it helps us manage customer relationship. 3. A data warehouse also helps in bringing down the costs by tracking trends, patterns over a long period in a consistent and reliable manner. Dr Taleb Obaid 3

Business Analysis Framework • To design an effective and efficient data warehouse, we need

Business Analysis Framework • To design an effective and efficient data warehouse, we need to understand analyze the business needs and construct a business analysis framework. Different views regarding the design of a data warehouse as follows: The top-down view - This view allows the selection of relevant information needed for a data warehouse. The data source view - This view presents the information being captured, stored, and managed by the operational system. The data warehouse view - This view includes the fact tables and dimension tables. It represents the information stored inside the data warehouse. The business query view - It is the view of the data from the viewpoint of the end-user. Dr Taleb Obaid 4

Three-Tier Data Warehouse Architecture Data warehouses adopts three-tier architecture. 1. Bottom Tier - bottom

Three-Tier Data Warehouse Architecture Data warehouses adopts three-tier architecture. 1. Bottom Tier - bottom tier is the data warehouse database server. We use the back-end tools and utilities to feed data into the bottom tier. These backend tools and utilities perform the Extract, Clean, Load, and refresh functions. 2. Middle Tier - In the middle tier, we have the OLAP Server that can be implemented in either of the following ways. o By Relational OLAP (ROLAP), which is an extended relational database management system? The ROLAP maps the operations on multidimensional data to standard relational operations. o By Multidimensional OLAP (MOLAP) model, which directly implements the multidimensional data and operations? 3. Top-Tier - This tier is the front-end client layer. This layer holds the query tools and reporting tools, analysis tools and data mining tools. Dr Taleb Obaid 5

The following diagram depicts the three-tier architecture of a data warehouse Data Warehouse Architecture

The following diagram depicts the three-tier architecture of a data warehouse Data Warehouse Architecture Dr Taleb Obaid 6

The following diagram depicts the three-tier architecture of a data warehouse Data Warehouse Architecture

The following diagram depicts the three-tier architecture of a data warehouse Data Warehouse Architecture Dr Taleb Obaid 7

Three-Tier Data Warehouse Architecture Operational Source Systems Operational systems are used to process everyday

Three-Tier Data Warehouse Architecture Operational Source Systems Operational systems are used to process everyday transactions of an organization The operational systems are designed in such a way that the transactions occur smoothly and the data-integrity is maintained efficiently The operational systems have very fast insert/update since minimal data is affected each time a transaction occurs In order to improve performance the old data is purged (removed) systematically Dr Taleb Obaid 8

ETL - Extraction, Transformation and Loading Extraction • The extraction methods depend on the

ETL - Extraction, Transformation and Loading Extraction • The extraction methods depend on the performance of the source system and the demands of the business. Full extraction is applied when the data is required to be retrieved and loaded the first time. Hence, represents the current data available in the source system Incremental extraction is a process where the differences in the source data since the last extraction are captured. Only the changes will be loaded based on the last changed timestamp Online extraction is a process where the data is extracted from the source system directly Offline extraction is a process of extraction where the source system is emptied into a flat file outside of the source. This flat file is used to extract the data Dr Taleb Obaid 9

Transformation and Loading Transformation The data is transformed based on the transformation rules provided

Transformation and Loading Transformation The data is transformed based on the transformation rules provided by the business. The data is converted to a standard format and common semantics Data cleansing is the process of distinguishing and correcting the discrepant data from a database or table. Data cleansing also involves the synchronization of data. For example, the compliance of Male/Female to M/F Loading Once the data is cleansed and transformed into a structure persistent with the data warehouse requisites, the data is then qualified to be loaded into a data warehouse Populating (filling) the data into the tables present in a data warehouse and verifying if the data is ready for use is the first step of loading After loading the facts and dimensions a DBA should check for referential integrity i. e. each record from the fact table should be related to a dimension record Dr Taleb Obaid 10

Data Staging Area Data Presentation Area The presentation area represents a collection of data

Data Staging Area Data Presentation Area The presentation area represents a collection of data marts. A data mart is a sub set of a data warehouse Data marts are preferred for smaller data volumes and fewer data sources. It enables easier data cleaning process Dependent data marts retrieve data from a central data warehouse whereas the independent data marts are standalone systems that extract data directly from the operational systems or external sources Dr Taleb Obaid 11

Data Staging Area Data Access Tools Business Intelligence tools are used for accessing the

Data Staging Area Data Access Tools Business Intelligence tools are used for accessing the data for strategic, operational, and analytical purposes Senior executives and managers access the data warehouse for taking critical decisions. They devise strategies and observe the business performance E. g. Balance Scorecards Operational managers execute the details of the strategies against the targets. E. g. Sales Forecasts Analytical operations are performed by analysts to evaluate the outcomes of a business process and understand the functioning of the business E. g. Financial and Sales Analysis Dr Taleb Obaid 12

Technical Architecture • Two-tier Architecture Two-tier architecture consists of a front-end client component and

Technical Architecture • Two-tier Architecture Two-tier architecture consists of a front-end client component and a back-end server component. Dr Taleb Obaid 13

Technical Architecture Two-tier Architecture Here the direct communication happens between client and server, there

Technical Architecture Two-tier Architecture Here the direct communication happens between client and server, there is no intermediate layer between client and server. The Two-tier architecture is divided into two parts: • Client Application (Client Tier) • Database (Data Tier) Advantages: 1. Easy to maintain and modification is bit easy. 2. Communication is faster. Disadvantages: 1. In two tier architecture application performance will be degrade upon increasing the users. 2. Cost-ineffective Dr Taleb Obaid 14

Technical Architecture • Three-tier architecture observes the presence of the three layers of software

Technical Architecture • Three-tier architecture observes the presence of the three layers of software – presentation, core application logic, and data and they exist in their own processors. This architecture is extensively used for data warehousing Dr Taleb Obaid 15

Technical Architecture Three-tier Architecture Typically comprise a presentation tier, a business or data access

Technical Architecture Three-tier Architecture Typically comprise a presentation tier, a business or data access tier, and a data tier, as follows: 1. Client layer: Represents Web browser, a Java or other application, Applet etc. The client tier makes requests to the Web server who will be serving the request by either returning static content if it is present in the Web server. Dr Taleb Obaid 16

Technical Architecture Three-tier Architecture Business layer: This layer provides the business services. This tier

Technical Architecture Three-tier Architecture Business layer: This layer provides the business services. This tier contains the business logic and the business data. All the business logic like validation of data, calculations, data insertion etc. Data layer: This layer is the external resource such as a database. This tier is also known as Data Tier. Data Access Layer contains methods to connect with database or other data source and to perform insert, update, delete, get data from data source based on our input data. Dr Taleb Obaid 17

Technical Architecture Client This refers to the data that reaches the end-users. This can

Technical Architecture Client This refers to the data that reaches the end-users. This can be in the form of tables or graphics or an email that is automatically generated everyday Application Server The application server consists of the following: Summarized data at different levels. For example, you can either retrieve data as independent transactions or as summaries by week, by year, by month, or by year Filtered data is used to limit the data in the report. For example, we can limit the values of a report based on a particular product by adding name of the product in the querying filter. The data is not modified but hidden at the report-level Metadata (show) is defined as the consolidated data that relates to the detailed data. Dr Taleb Obaid 18

Database Server The database server is mostly a relational database system. The data is

Database Server The database server is mostly a relational database system. The data is populated into the database using the ETL tools and utilities Inmon’s Top-Down Approach Dr Taleb Obaid 19

Database Server A centralized data warehouse acts as a enterprise-wide data warehouse from which

Database Server A centralized data warehouse acts as a enterprise-wide data warehouse from which data marts are built as per the requirements of the specific departments The data model is based on Entity Relationship Persistent dimensional views of data across data marts can be viewed since all data marts are loaded from a data warehouse This data warehouse design is efficient against all business changes. Creation of a data mart from a data warehouse is very simple The analytic systems can access data in a data warehouse via the data marts Dr Taleb Obaid 20

Database Server When it comes to designing a data warehouse for your business, the

Database Server When it comes to designing a data warehouse for your business, the two most commonly discussed methods are the approaches introduced by Bill Inmon and Ralph Kimball. Debates on which one is better and more effective have lasted for years. But a clear-cut answer has never been arrived upon, as both philosophies have their own advantages and differentiating factors, and enterprises continue to use either of these. In a nutshell, here are the two approaches: in Bill Inmon’s enterprise data warehouse approach (the top-down design), a normalized data model is designed first, then the dimensional data marts, which contain data required for specific business processes or specific departments, are created from the data warehouse. Dr Taleb Obaid 21

Database Server Kimball’s Bottom-Up Approach Dr Taleb Obaid 22

Database Server Kimball’s Bottom-Up Approach Dr Taleb Obaid 22

Database Server Kimball’s Bottom-Up Approach Kimball is a proponent of an approach to data

Database Server Kimball’s Bottom-Up Approach Kimball is a proponent of an approach to data warehouse design described as bottom-up in which dimensional data marts are first created to provide reporting and analytical capabilities for specific business areas such as “Sales” or “Production” Keeping in mind the most important business aspects or departments, data marts are created first. These provide a thin view into the organizational data and, as and when required, these can be combined into a larger data warehouse. Kimball defines data warehouse as “a copy of transaction data specifically structured for query and analysis”. Dr Taleb Obaid 23

Database Server A business process is built using data marts which are joined together

Database Server A business process is built using data marts which are joined together using common dimensions A dimensional data model with facts and dimensions is implemented here The reports can be generated quickly since the data marts are built first The data warehouse can be easily expandable to accommodate new units. It involves the creation of new data marts and then integrating with the other data marts The analytic systems access data via data marts Dr Taleb Obaid 24

Database Server Federated Data Warehouse A federated data warehouse integrates all the legacy data

Database Server Federated Data Warehouse A federated data warehouse integrates all the legacy data warehouses, business intelligence systems into a newer system that provides analytical functionalities The implementation time is of a shorter period compared to building a enterprise data warehouse Dr Taleb Obaid 25

Database Server Hub (center) and Spokes Architecture The architecture is based on an extensive

Database Server Hub (center) and Spokes Architecture The architecture is based on an extensive enterprise-level analysis of data requirements. The infrastructure that is built is expandable and maintainable The architecture can be developed in an iterative manner based on the subject area The dependent data marts that are developed later are based on the functional areas and may consist of normalized or deformalized data Dr Taleb Obaid 26