Angelina Njegu Full Professor at Singidunum University Course

  • Slides: 41
Download presentation
Angelina Njeguš | Full Professor at Singidunum University

Angelina Njeguš | Full Professor at Singidunum University

Course Modules Data Warehouse for Business Intelligence 01 | Introduction 05 | Data Warehouse

Course Modules Data Warehouse for Business Intelligence 01 | Introduction 05 | Data Warehouse Feed Project 02 | Data Warehouse Development Lifecycle 06 | Real Time Data Warehouse 03 | Dimensional Modeling 07 | Data Warehouse Appliance 04 | Physical Design of Data Warehouse 08 | Access to Data Labs: Design and implementation of Business Intellligence BI Suite

05 | DW Feed Project

05 | DW Feed Project

Module 5 Overview • Main data and functions of the data feed system •

Module 5 Overview • Main data and functions of the data feed system • Logical loading flows

Data Warehouses: Transformation Flow • From an architectural perspective, you can transform your data

Data Warehouses: Transformation Flow • From an architectural perspective, you can transform your data in the following ways: – Multistage Data Transformation in Data Warehouses – Pipelined Data Transformation in Data Warehouses – Staging Area in Data Warehouses

Multistage Data Transformation in Data Warehouses • The data transformation logic for most data

Multistage Data Transformation in Data Warehouses • The data transformation logic for most data warehouses consists of multiple steps. – For example, in transforming new records to be inserted into a sales table, there may be separate logical transformation steps to validate each dimension key.

Pipelined Data Transformation in Data Warehouses

Pipelined Data Transformation in Data Warehouses

Staging Area in Data Warehouses • It is highly recommended that you stage your

Staging Area in Data Warehouses • It is highly recommended that you stage your raw data across as many physical disks as possible to ensure the reading of the raw data is not a bottleneck during the load. • For example, to stage the data the Oracle Database File System (DBFS) can be used. DBFS is similar to NFS in that it provides a shared network file system that looks like a local file system and has both a server component and a client component. • NFS (Network File System) is a distributed file system protocol originally developed by Sun Microsystems in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed.

06 | Real Time Data Warehouse

06 | Real Time Data Warehouse

Module 6 Overview • General aspects of the real time data warehouse • Quick

Module 6 Overview • General aspects of the real time data warehouse • Quick description of the main DWH real-time architectures

Architectures for Loading Data Warehouses

Architectures for Loading Data Warehouses

Era of Big data Analytics system

Era of Big data Analytics system

Data Lake vs. Data Warehouse • Data warehouses and data lakes are two different

Data Lake vs. Data Warehouse • Data warehouses and data lakes are two different types of data storage repositories. • Data warehouses have a long history as an enterprise technology used to store structured data, cleaned up and organized for specific business purposes, and serve it to reporting or BI tools. • Data lake is a newer technology, made popular by Hadoop and its open source ecosystem. • A data lake enables storing both structured and unstructured data in its original form, and processing later when analysis is needed.

Data warehousing in the cloud • There are many robust data warehouse tools offered

Data warehousing in the cloud • There are many robust data warehouse tools offered today on cloudbased infrastructure, including: – Amazon Redshift - a fully-managed, analytical data warehouse that can handle petabyte-scale data, and enable querying it in seconds. – Google Big. Query - an enterprise-grade cloud-native data warehouse, which runs fast interactive and ad-hoc queries on datasets of petabyte-scale. – Panoply - the world’s first smart data warehouse, which is cloud-based, scalable and performant, and also able to automatically transform data to analytics in minutes. – Azure Synapse Analytics - a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources—at scale.

Data lakes in the cloud • There also a number of options for running

Data lakes in the cloud • There also a number of options for running data lakes in the cloud, including: – Amazon S 3 - an object storage platform built to store and retrieve any amount of data from any data source, and designed for 99. 99% durability. – Azure Blob Storage - stores billions of objects in hot, cool, or archive tiers, depending on how often data is accessed. • Data ranges from structured (converted to object form) to any unstructured format - images, videos, audio, documents.

Data lake architecture overview

Data lake architecture overview

What is data ingestion? • Data ingestion is the process of obtaining and importing

What is data ingestion? • Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. • To ingest something is to "take something in or absorb something. " • Data can be streamed in real time or ingested in batches. • When data is ingested in real time, each data item is imported as it is emitted by the source. • When data is ingested in batches, data items are imported in discrete chunks at periodic intervals of time. • An effective data ingestion process begins by prioritizing data sources, validating individual files and routing data items to the correct destination.

Data Ingestion Framework

Data Ingestion Framework

Modern Data Architecture Reference Architecture

Modern Data Architecture Reference Architecture

New real time big data technologies

New real time big data technologies

07 | Data Warehouse Appliance

07 | Data Warehouse Appliance

Module 7 Overview • General aspects of the DWH Appliance • Highlighting aspects of

Module 7 Overview • General aspects of the DWH Appliance • Highlighting aspects of a DWH Appliance

Enterprise Information Integration (EII)

Enterprise Information Integration (EII)

Service oriented information integration

Service oriented information integration

SOI realization • Enterprise Service Bus (ESB) offers mediation services that do: – Protocol

SOI realization • Enterprise Service Bus (ESB) offers mediation services that do: – Protocol conversion - Transparently translate between communication protocols (e. g. , HTTP, FTP, REST, SOAP, JSON, DCOM, CORBA, SAP RFC etc. ) – Mapping - Transfer between tabular data formats – Translation and transformation - Change data content based on rules – Queuing and buffering - Handle differing data processing speeds between sender and receiver – Event handling - Guarantee event processing

About ESB

About ESB

ESB enables SOI

ESB enables SOI

Before ESB Legacy Application Enterprise resource planning Cobol Application ERP J 2 EE New

Before ESB Legacy Application Enterprise resource planning Cobol Application ERP J 2 EE New Application Call Center Application CRM Application Java Application . Net Application

After ESB Legacy Application Enterprise resource planning New Application Cobol Application ERP J 2

After ESB Legacy Application Enterprise resource planning New Application Cobol Application ERP J 2 EE New Application Enterprise Service Bus Routing Messaging Security Connectivity Transformation Monitoring Management Call Center Application CRM Application Java Application . Net Application

ESB example

ESB example

ESB as a proxy

ESB as a proxy

ESB Main tasks – Routs messages – Converts protocols – Transforms data formats –

ESB Main tasks – Routs messages – Converts protocols – Transforms data formats – Handles business events

Mediation services • Protocol switch (e. g. XML/HTTP -> RMI/IIOP, SOAP/JMS -> IIOP) •

Mediation services • Protocol switch (e. g. XML/HTTP -> RMI/IIOP, SOAP/JMS -> IIOP) • Transform (e. g. XML -> COBOL, Object -> XML) • Enrich (e. g. add information from external sources, query database) • Route • Distribute • Monitor • Correlate

Mediation services

Mediation services

Data Warehouse appliance by Teradata

Data Warehouse appliance by Teradata

Data Warehouse appliance • Based on two-socket server nodes • using eight-core Xeon E

Data Warehouse appliance • Based on two-socket server nodes • using eight-core Xeon E 5 -2670 • running at 2. 6 GHz • SUSE Linux Enterprise Server 10 SP 3.

Platform family

Platform family

Teradata Aster Big Analytics • Embeds Map. Reduce processing for deeper insights on new

Teradata Aster Big Analytics • Embeds Map. Reduce processing for deeper insights on new data sources and multi-structured data types to deliver analytic capabilities with breakthrough performance and scalability. • The platform utilizes Aster's patented SQL-Map. Reduce® to parallelize the processing of data and applications and deliver rich analytic insights through the simplicity of structured query language (SQL) and business intelligence tools.

Lab Exercises Exercise 7 – Cubing Services and Summary Tables (75 min) – 10

Lab Exercises Exercise 7 – Cubing Services and Summary Tables (75 min) – 10 p. Exercise 8 – Cognos and ISW Packs (60 min) – for IBM academic certificate