Unit4 Reporting Datawarehouse layers Different data warehousing systems

  • Slides: 40
Download presentation
Unit-4 (Reporting) Data-warehouse layers

Unit-4 (Reporting) Data-warehouse layers

 • Different data warehousing systems have different structures. Some may have an ODS

• Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture rather than discussing the specifics of any one system.

Layers of Data warehouse • In general, all data warehouse systems have the following

Layers of Data warehouse • In general, all data warehouse systems have the following layers: • Data Source Layer, Data Extraction Layer, Staging Area, ETL Layer, Data Storage Layer, Data Logic Layer, Data Presentation Layer, Metadata Layer, System Operations Layer. The picture below shows the relationships among the different components of the data warehouse architecture:

Data-warehouse reporting architecture

Data-warehouse reporting architecture

Data Source Layer This represents the different data sources that feed data into the

Data Source Layer This represents the different data sources that feed data into the data warehouse. The data source can be of any format -- plain text file, relational database, other types of database, Excel file, etc. , can all act as a data source. Many different types of data can be a data source: Operations -- such as sales data, HR data, product data, inventory data, marketing data, systems data. Web server logs with user browsing data. Internal market research data. Third-party data, such as census data, demographics data, or survey data. All these data sources together form the Data Source Layer.

Staging Area This is where data sits prior to being scrubbed and transformed into

Staging Area This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration.

ETL Layer This is where data gains its "intelligence", as logic is applied to

ETL Layer This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming phase in a data warehousing project, and an ETL tool is often used in this layer.

Data Extraction Layer Data gets pulled from the data source into the data warehouse

Data Extraction Layer Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, but there is unlikely any major data transformation.

Data Storage Layer This is where the transformed and cleansed data sit. Based on

Data Storage Layer This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.

Data Logic Layer This is where business rules are stored. Business rules stored here

Data Logic Layer This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but do affect what the report looks like.

Data Presentation Layer This refers to the information that reaches the users. This can

Data Presentation Layer This refers to the information that reaches the users. This can be in a form of a tabular / graphical report in a browser, an emailed report that gets automatically generated and sent everyday, or an alert that warns users of exceptions, among others. Usually an OLAP tool and/or a reporting tool is used in this layer.

Metadata Layer This is where information about the data stored in the data warehouse

Metadata Layer This is where information about the data stored in the data warehouse system is stored. A logical data model would be an example of something that's in the metadata layer. A metadata tool is often used to manage metadata.

System Operations Layer This layer includes information on how the data warehouse system operates,

System Operations Layer This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history.

CHART • An outline map on which specific information, such as scientific data, can

CHART • An outline map on which specific information, such as scientific data, can be plotted is called chart. • A chart, also called a graph, is a graphical representation of data, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart".

CHART

CHART

Tables • In relational databases and flat file databases, a table is a set

Tables • In relational databases and flat file databases, a table is a set of data elements (values) using a model of vertical columns (identifiable by name) and horizontal rows, the cell being the unit where a row and column intersect. A table has a specified number of columns, but can have any number of rows.

Tables

Tables

What is Data Aggregation? • In Data Aggregation, value is derived from the aggregation

What is Data Aggregation? • In Data Aggregation, value is derived from the aggregation of two or more contributing data characteristics. • Aggregation can be made from different data occurrences within the same data subject, business transactions and a de-normalized database and between the real world and detailed data resource design within the common data architecture.

 • Data aggregation helps company data warehouses try to piece together different kinds

• Data aggregation helps company data warehouses try to piece together different kinds of data within the data warehouse so that they can have meaning that will be useful as statistical basis for company reporting and analysis. • But data aggregation, when not implemented well using good algorithm and tools can lead data reporting inaccuracy. Ineffective way of data aggregation is one of the major components that can limit performance of database queries.

Materialized Views • The most common situations where you would find materialized views useful

Materialized Views • The most common situations where you would find materialized views useful are in data warehousing applications and distributed systems. • In warehousing applications, large amounts of data are processed and similar queries are frequently repeated. • If these queries are pre-computed and the results stored in the data warehouse as a materialized view, using materialized views significantly improves performance by providing fast lookups into the set of results.

 • A materialized view definition can include any number of aggregates, as well

• A materialized view definition can include any number of aggregates, as well as any number of joins. In several ways, a materialized view behaves like an index. • The purpose of a materialized view is to increase request execution performance. • The existence of a materialized view is transparent to SQL applications, so a DBA can create or drop materialized views at any time without affecting the validity of SQL applications. • A materialized view consumes storage space. • The contents of the materialized view must be maintained when the underlying detail tables are modified.

Dashboards • In information technology, a dashboard is a user interface that, somewhat resembling

Dashboards • In information technology, a dashboard is a user interface that, somewhat resembling an automobile's dashboard, organizes and presents information in a way that is easy to read. • However, a computer dashboard is more likely to be interactive than an automobile dashboard (unless it is also computer-based). • To some extent, most graphical user interfaces (GUIs) resemble a dashboard. However, some product developers consciously employ this metaphor (and sometimes the term) so that the user instantly recognizes the similarity.

What Is OLAP • OLAP stands for On-Line Analytical Processing. The first attempt to

What Is OLAP • OLAP stands for On-Line Analytical Processing. The first attempt to provide a definition to OLAP was by Dr. Codd, who proposed 12 rules for OLAP. • Online Analytical Processing Server (OLAP) is based on the multidimensional data model. It allows managers, and analysts to get an insight of the information through fast, consistent, and interactive access to information. • Depending on the underlying technology used, OLAP can be broadly divided into two different camps: MOLAP and ROLAP.

MOLAP, ROLAP, And HOLAP

MOLAP, ROLAP, And HOLAP

MOLAP This is the more traditional way of OLAP analysis. In MOLAP, data is

MOLAP This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats.

Advantages: • Excellent performance: MOLAP cubes are built for fast data retrieval, and are

Advantages: • Excellent performance: MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations. • Can perform complex calculations: All calculations have been pre-generated when the cube is created. Disadvantages: • Limited in the amount of data it can handle • Requires additional investment

ROLAP • This methodology relies on manipulating the data stored in the relational database

ROLAP • This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.

Advantages: • Can handle large amounts of data • Can leverage functionalities inherent in

Advantages: • Can handle large amounts of data • Can leverage functionalities inherent in the relational database Disadvantages: • Performance can be slow • Limited by SQL functionalities

HOLAP • HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For

HOLAP • HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can "drill through" from the cube into the underlying relational data.

QUERY REWRITE • When base tables contain large amount of data, it is an

QUERY REWRITE • When base tables contain large amount of data, it is an expensive and time consuming process to compute the required aggregates or to compute joins between these tables. • In such cases, queries can take minutes or even hours to return the answer. • Because materialized views contain already precomputed aggregates and joins, Oracle employs an extremely powerful process called query rewrite to quickly answer the query using materialized views.

Interactivity in analysis • Drill up • Drill down

Interactivity in analysis • Drill up • Drill down

Data cube example for OLAP operations

Data cube example for OLAP operations

Slice operation The slice operation selects one particular dimension from a given cube and

Slice operation The slice operation selects one particular dimension from a given cube and provides a new sub-cube. Consider the following diagram that shows how slice works.

Dice operation Dice selects two or more dimensions from a given cube and provides

Dice operation Dice selects two or more dimensions from a given cube and provides a new sub-cube. Consider the following diagram that shows the dice operation.

Roll-up operation Roll-up performs aggregation on a data cube in any of the following

Roll-up operation Roll-up performs aggregation on a data cube in any of the following ways: • By climbing up a concept hierarchy for a dimension • By dimension reduction

Drill Down operation Drill-down is the reverse operation of roll-up. It is performed by

Drill Down operation Drill-down is the reverse operation of roll-up. It is performed by either of the following ways: • By stepping down a concept hierarchy for a dimension • By introducing a new dimension.

Data Warehousing - Security • The objective of a data warehouse is to make

Data Warehousing - Security • The objective of a data warehouse is to make large amounts of data easily accessible to the users, hence allowing the users to extract information about the business as a whole. • But we know that there could be some security restrictions applied on the data that can be an obstacle for accessing the information. • If the analyst has a restricted view of data, then it is impossible to capture a complete picture of the trends within the business. • Adding security features affect the performance of the data warehouse, therefore it is important to determine the security requirements as early as possible.

Security Requirements It is difficult to add security features after the data warehouse has

Security Requirements It is difficult to add security features after the data warehouse has gone live. During the design phase of the data warehouse, we should keep in mind what data sources may be added later and what would be the impact of adding those data sources. We should consider the following possibilities during the design phase: • Whether the new data sources will require new security and/or audit restrictions to be implemented? • Whether the new users added who have restricted access to data that is already generally available? • This situation arises when the future users and the data sources are not well known. In such a situation, we need to use the knowledge of business and the objective of data warehouse to know likely requirements.

The following activities get affected by security measures: • User access • Data load

The following activities get affected by security measures: • User access • Data load • Data movement • Query generation

Popular Tools • • • SAP Crystal Reports Micro. Strategy IBM Cognos Actuate Jaspersoft

Popular Tools • • • SAP Crystal Reports Micro. Strategy IBM Cognos Actuate Jaspersoft Pentaho