Data Warehousing Data Mining A review of Data

  • Slides: 19
Download presentation
Data Warehousing & Data Mining A review of Data Warehousing and Data Mining v.

Data Warehousing & Data Mining A review of Data Warehousing and Data Mining v. Aqeel Al-Jishi v. Nick Farley v. Masaki Osada

Data Warehousing Defined: ¡ Data Warehouse – l l ¡ The main repository of

Data Warehousing Defined: ¡ Data Warehouse – l l ¡ The main repository of an organization historical data, it corporate memory It contains the raw material for management’s decision support system Data Mining – (Knowledge Discovery) l The process of analyzing data from different perspectives and summarizing it into useful information

History ¡ ¡ ¡ In the late 1980 s to early 1990 s distinct

History ¡ ¡ ¡ In the late 1980 s to early 1990 s distinct computer databases were created These databases were designed to meet the data analysis needs that Operational Systems were failing to support Operational Systems failed for many reasons: l l Long report generation time Inability to handle loads and not optimized Many organizations had multiple Operation Systems which was a reporting nightmare Custom applications were required for reporting which slowed reporting and increased costs

Goals of Data Warehousing ¡ ¡ ¡ Make information easily accessible Provide endless views

Goals of Data Warehousing ¡ ¡ ¡ Make information easily accessible Provide endless views and combinations of data (Slicing & Dicing) Query results returned with minimal wait time Be adaptive and resilient to change Designed with the correct users in mind (business users and management) Keep information secure but allow access to insiders

Common Issues ¡ ¡ ¡ Problems with Data Acquisition may arise 80% of the

Common Issues ¡ ¡ ¡ Problems with Data Acquisition may arise 80% of the time building a data warehouse will be spent on extracting, cleaning, and loading data Errors with data can be rampant: l l l Incomplete Data (missing fields) Incorrect Data (wrong calculations) Readability Issues (strange formatting)

Differing Design Views, Relational ¡ Relational Model - Bill Inmon l l l The

Differing Design Views, Relational ¡ Relational Model - Bill Inmon l l l The data warehouse is but one part of the Business Intelligence system An enterprise has one data warehouse and data marts source their information from it. Uses 3 rd Normal Form to store information in the database

Differing Design Views, Dimensional ¡ Dimensional model - Ralph Kimball l Data warehouse is

Differing Design Views, Dimensional ¡ Dimensional model - Ralph Kimball l Data warehouse is the conglomerate of all data marts within the enterprise Uses Star or Snowflake schema to emulate a multi-dimensional database Overall information is NOT normalized in the database

Terminology ¡ Data Mart – l l A data mart is a subset of

Terminology ¡ Data Mart – l l A data mart is a subset of data from the data warehouse, typically used when the broad scope of the data warehouse isn’t needed Business departments commonly create, use, and alter their own data marts. ¡ Level of Granularity – l The amount and level of data brought in to the data warehouse during acquisition

Dimensional Model ¡ Dimension Table – l l A table with a single-part primary

Dimensional Model ¡ Dimension Table – l l A table with a single-part primary key and descriptive attribute columns. Describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products

Dimensional Model ¡ Fact Table – l l A table with numeric performance measures

Dimensional Model ¡ Fact Table – l l A table with numeric performance measures (metrics) characterized by a composite key The elements for the composite key come from the foreign keys from a dimensional table

What is Data Mining? ¡ Basic Concept l ¡ Getting “Useful Information” out of

What is Data Mining? ¡ Basic Concept l ¡ Getting “Useful Information” out of a large amount of “Data” In Business World l Getting “Business Intelligence” out of a large amount of “Information” ¡ What is the difference between “Business Intelligence” and “Information”?

Evolutionary Step Business Question Enabling Technologies Characteristics Data Collection (1960 s) “What was my

Evolutionary Step Business Question Enabling Technologies Characteristics Data Collection (1960 s) “What was my total revenue in the last five years? ” Computers, tapes, disks Retrospective, static data delivery Data Access (1980 s) “What were unit sales in New England last March? ” Relational Databases, Structured Query Language Retrospective, dynamic data delivery at record level Data Warehousing & Decision Support (1990 s) “What were unit sales in New England last March? Drill down to Boston. ” On-line analytic processing, multidimensional databases Retrospective, dynamic data delivery at multiple levels Data Mining (Emerging Today) “What’s likely to happen to Boston unit sales next month? Why? ” Advanced Prospective, algorithms, proactive multiprocessor information delivery computers, massive databases

Focus of Data Mining ¡ Predict the future trends and behaviors using the past

Focus of Data Mining ¡ Predict the future trends and behaviors using the past information l Prospective analysis

Uniqueness of Data Mining ¡ Heuristic in nature l Capable of finding hidden patterns

Uniqueness of Data Mining ¡ Heuristic in nature l Capable of finding hidden patterns users would never think of ¡ ¡ Modeling is the key technology to find patterns Self-guiding

An example of Modeling Yesterday Tomorrow Static Information Known Dynamic Information Known Target Build

An example of Modeling Yesterday Tomorrow Static Information Known Dynamic Information Known Target Build a model based on known combinations of static and dynamic information to see if it indicates the result of a dynamic information in the future.

Real world examples of Data Mining “Diaper and Beer” ¡ VISA ¡ l ¡

Real world examples of Data Mining “Diaper and Beer” ¡ VISA ¡ l ¡ Blockbuster Entertainment l ¡ Personalized Recommendation Suppliers to Wal-mart l ¡ Fraud Detection Sales prediction NBA Teams l Strategy analysis

Data Warehouse Data Mart ERP DM ERP SCM DM CRM DM

Data Warehouse Data Mart ERP DM ERP SCM DM CRM DM

Questions?

Questions?

Sources • • • • • • • • BOOKS: Mastering Data Warehouse Design,

Sources • • • • • • • • BOOKS: Mastering Data Warehouse Design, Relational and Dimensional Techniques Claudia Imhoff Nicholas Galemmo Jonathan G Geiger © 2003, Wiley Publishing, Inc. Indianapolis, Indiana The Data Warehouse Toolkit second edition, The Complete Guide to Dimensional Modeling Ralph Kimball Margy Ross © 2002, Wiley and Songs, Inc. New York, NY Statistical Data Mining and Knowledge Discovery Edited by Hamparsum Bozdogan © 2004 CRC Press LLC, Boca Raton, Florida Data Mining and Business Intelligence: A Guide to Productivity Stephan Kudyba and Richard Hoptroff © 2001 Idea Group Publishing, London, England Modern Systems Analysis and Design, fourth edition Jeffrey A. Hoffer, et al © 2005, 2002 by Pearson Education, Inc. Upper Saddle River, New Jersey Database Systems An Application-Oriented Approach Michael Kifer, Arthur Bernstein, Philip Lewis © 2005 Pearson Education, Inc. , New York ELECTRONIC: http: //www. 1 keydata. com/datawarehousing/inmon-kimball. html http: //www. fortunecity. com/skyscraper/oracle/699/orahtml/dbmsmag/9807 d 05. html http: //en. wikipedia. org/wiki/Data_warehouse http: //en. wikipedia. org/wiki/Dimension_(data_warehouse ) http: //en. wikipedia. org/wiki/Data_mining http: //www. anderson. ucla. edu/faculty/jason. frand/teacher/technologies/palace/datamining. htm http: //www. dwinfocenter. org http: //www. thearling. com/text/dmwhite. htm http: //www. learndatamodeling. com/dimension. htm http: //www. learndatamodeling. com/fact. htm