CRM Data Warehouse Data Mart OLAP Data Mining

  • Slides: 85

Contents: CRM Analysis Techniques and Solutions
  • Data Warehouse
  • Data Mart
  • OLAP
  • Data Mining: techniques, solution cases

Definition of DW
  • A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions (W. H. Inmon).
  • One or more tools to extract fields from any kind of data structure (flat, hierarchical, relational, or object; open or proprietary), including external data.
  • The synthesis of the data into a nonvolatile, integrated, subject-oriented database with a metadata "catalog".
  • DW is a process, not a product.
  • DW has no size limitations.

Attributes of Data Warehouse
  • It is a database designed for analytical tasks, using data from multiple applications.
  • It supports a relatively small number of users with relatively long interactions.
  • Its usage is read-intensive.
  • Its content is periodically updated (mostly additions).
  • It contains current and historical data to provide a historical perspective of information.
  • It contains a few large tables.
  • Each query frequently results in a large result set and involves frequent full table scans and multi-table joins.

Related terms to the DW
  • Current detail data
  • Old detail data
  • Data mart
  • Summarized data
  • Drill-down, Drill-up
  • Metadata

Metadata
  • Data about data
  • Describes the data warehouse
  • Used for building, maintaining, and using the data warehouse
  • Can be classified into:
      • technical metadata
      • business metadata
      • warehouse operational information

Synonyms for Data Warehouse
  • Management Information System (MIS)
  • Executive Information System (EIS)
  • Decision Support System (DSS)
  • Data Mart
      • A Data Mart is a small Data Warehouse (departmental DW).

Feature of Data Warehouse

Cost Structure of a Data Warehouse Project

Data Warehouse Architecture
[Architecture diagram. Labels in reading order: Information Delivery System; Operational & External Data; Management Platform; Metadata; MRDG; Data Extract, Data Cleanup, Data Load; Data Warehouse DBMS; MDDB; Report, Query, EIS Tools; OLAP Tools; Data Marts; Data Mining Tools; Admin Platform; Repository; Applications & Tools]

Data Warehouse Database
  • The central data warehouse database is almost always implemented on RDBMS technology.
  • Traditional RDBMS implementations are optimized for transactional database processing.
  • Very large database size, ad hoc query processing, and the need for flexible user view creation (aggregates, multi-table joins, and drill-downs) have become drivers for different approaches.
  • Different technological approaches:
      • Parallel relational database design
      • An innovative approach to speed up a traditional RDBMS by using new index structures to bypass relational table scans
      • Multidimensional databases (MDDBs)

Sourcing, Acquisition, Cleanup and Transformation Tools (1)
  • Perform all of the conversions, summarizations, key changes, structural changes, and condensation
  • Produce programs and control statements, including COBOL programs, MVS job control language, UNIX scripts, and SQL data definition language
  • Maintain metadata

Sourcing, Acquisition, Cleanup and Transformation Tools (2)
  • Functionality
      • removing unwanted data from operational databases
      • converting to common data names and definitions
      • calculating summaries and derived data
      • establishing defaults for missing data
      • accommodating source data definition changes
  • Some significant issues
      • database heterogeneity
      • data heterogeneity
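The functionality listed above can be illustrated with a small sketch. This is only an assumption-laden toy, not how any of the tools work internally: the source rows, the field names, the `COMMON_NAMES` mapping, and the `DEFAULTS` table are all hypothetical.

```python
# Hypothetical rows from two operational systems that name fields differently.
RAW_ROWS = [
    {"cust_no": "C1", "amt": "120.50", "region": "east", "debug_flag": "x"},
    {"CustomerID": "C2", "amount": "80", "region": None},
    {"cust_no": "C1", "amt": "30.00", "region": "east"},
]

# Assumed mapping from source-specific names to common warehouse names.
COMMON_NAMES = {
    "cust_no": "customer_id",
    "CustomerID": "customer_id",
    "amt": "amount",
    "amount": "amount",
    "region": "region",
}
# Assumed defaults for missing data.
DEFAULTS = {"region": "unknown"}


def transform(row):
    """Drop unwanted fields, convert to common names, and fill defaults."""
    clean = {}
    for name, value in row.items():
        if name not in COMMON_NAMES:  # remove unwanted data
            continue
        clean[COMMON_NAMES[name]] = value  # convert to common data names
    for name, default in DEFAULTS.items():  # establish defaults for missing data
        if clean.get(name) is None:
            clean[name] = default
    clean["amount"] = float(clean["amount"])
    return clean


def summarize(rows):
    """Calculate a derived summary: total amount per customer."""
    totals = {}
    for row in rows:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


CLEAN_ROWS = [transform(r) for r in RAW_ROWS]
TOTALS = summarize(CLEAN_ROWS)
```

Accommodating source data definition changes would, in this sketch, amount to editing `COMMON_NAMES` rather than the transformation logic.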

Sourcing, Acquisition, Cleanup and Transformation Tools (3)
  • Merits
      • save a considerable amount of time and effort
  • Demerits
      • generally useful only for simpler data extracts
      • customized extract routines need to be developed for more complicated data-extraction procedures
  • Vendors prominent in this arena
      • Ardent/Prism Solutions
      • Evolutionary Technologies Inc. (ETI)
      • Vality
      • Informatica
      • Praxis
      • Carleton

A Data Warehouse Project is a consulting Project

Architectural Debate Rages
  • Bill Inmon: There is only one way to build a data mart. Build your central corporate data warehouse first, and then string your data marts off of it.
  • Doug Hackney: The incremental data mart approach to building an enterprise data warehouse is fast becoming the only reliable way to get it done fast and affordably.

Top down: Centralized Data Warehouse
  • Right architecture
      • Centralized control
      • Enterprise view
      • Consistent metadata
      • High data integrity
  • Wrong strategy
      • Lengthy implementation
      • Too expensive
      • High failure rate

Bottom up: Independent Data Marts
  • Wrong architecture
      • Islands of data
      • No enterprise view
      • Incomplete metadata
      • Difficult to manage
  • Right strategy
      • Fast implementation
      • Cost effective
      • Immediate ROI
      • Repeatable process

Enterprise Data Mart Architecture
  • Right architecture
      • Centralized control
      • Enterprise view
      • Consistent metadata
      • High data integrity
  • Right strategy
      • Fast implementation
      • Cost effective
      • Immediate ROI
      • Repeatable process

Customer Problems Addressed during Extraction and Transformation

What is Data Mart Centric?
http://www-db.stanford.edu/dbseminar/Archive/FallY97/slides/ncr/

Allure of Data Mart
  • Quicker to Implement
  • Easier to Manage
  • Cheaper to Build
  • High Query Performance

Quicker to Implement
  • The Promise
      • "Load and Go"
      • Known reports
      • Business unit focused
  • The Reality
      • Removes cross-functional capability
      • Provides no new insight
      • Performs little, if any, data transformation
      • Doesn't enforce business integrity

Easier to Manage
  • The Promise
      • Smaller data volumes
      • Smaller workloads
      • Known environment
      • SMP versus MPP management issues
  • The Reality
      • Actually harder as more data marts are added
      • Lack of standards leads to increased confusion
      • Data redundancy as marts are added
      • Managed on a node-by-node basis

Cheaper to Build
  • The Promise
      • Smaller platforms and DASD
      • Fewer DBA resources necessary
  • The Reality
      • HW/SW is less than 20% of a solution
      • Real cost is administration and application
      • More DBAs as marts are added
      • In order to reduce implementation cost, many critical steps are ignored

Higher Query Performance
  • The Promise
      • "Sub-second" response
      • Answering any questions users want to ask
      • Drill down and drill across
  • The Reality
      • Usually requires a star schema, which limits growth
      • Only answers "known" questions
      • No exploratory capability beyond planned queries
      • Fast answers are not necessarily better answers
      • Response time is from thought to action

Why Data Marts Fail: Technical
  • In order to build, you need to know the questions and the answers
  • Little to no data transformation
  • Dirty data is the biggest challenge
      • How do you know how dirty your data is?
      • Usually rely on tools to hide the problems, until it's too late
  • Architecture does not support long-term goals
  • Limit risk by ignoring the future

Why Data Marts Fail: Business
  • Data marts treat warehousing as a technical problem rather than a business solution
  • Diverts resources to "point" solutions, not the foundation for information
  • ROI is too low to justify expenditure
      • How much return do you need for a $1 million expenditure?
      • In how long?

Top 10 Complaints on Data Marts
  • Performance
  • Too many data marts
  • Users want more access
  • Hard to find skilled personnel
  • Reconciling inconsistencies
  • Tools too difficult
  • Must customize tools
  • Incompatible tools
  • Expectations too high
  • Demand doubles in first year

What is OLAP? (1)
  • A Data Warehouse stores tactical information that answers "who?" and "what?" questions about past events.
  • A typical query submitted to a Data Warehouse is: "What was the total revenue for the eastern region in the third quarter?"
  • Distinction between Data Warehouse and OLAP
      • A Data Warehouse is usually based on relational technology.
      • OLAP uses a multidimensional view of aggregate data to provide quick access to strategic information for further analysis.
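The typical warehouse query quoted above can be sketched against a relational store. This is a minimal illustration using Python's built-in `sqlite3` module; the `sales` table, its columns, and all revenue figures are hypothetical.

```python
import sqlite3

# In-memory relational store standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 3, 100.0),
        ("east", 3, 250.0),
        ("east", 2, 75.0),
        ("west", 3, 300.0),
    ],
)

# "What was the total revenue for the eastern region in the third quarter?"
(total_revenue,) = conn.execute(
    "SELECT SUM(revenue) FROM sales WHERE region = ? AND quarter = ?",
    ("east", 3),
).fetchone()
```

The query answers a "what?" question about past events; it does not model any change to the inputs.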

What is OLAP? (2)
  • OLAP enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information.
  • OLAP transforms raw data so that it reflects the real dimensionality of the enterprise as understood by the user.
  • OLAP systems can answer "who?" and "what?" questions, but it is their ability to answer "what if?" and "why?" questions that sets them apart from Data Warehouses.
  • OLAP enables decision-making about future actions.
  • A typical OLAP calculation: "What would be the effect on soft drink costs to distributors if syrup prices went up by $.10/gallon and transportation costs went down by $.05/mile?"
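The "what if?" calculation above can be worked through with a toy cost model. Only the two price changes ($.10/gallon and $.05/mile) come from the slide; the baseline prices, the quantities, and the linear cost formula are assumptions made for illustration.

```python
def cost_per_case(syrup_price, transport_rate, gallons=1.5, miles=20.0):
    """Assumed linear model: syrup cost plus transportation cost for one case."""
    return syrup_price * gallons + transport_rate * miles


# Hypothetical baseline prices.
BASE_SYRUP = 2.00      # dollars per gallon
BASE_TRANSPORT = 0.50  # dollars per mile

baseline = cost_per_case(BASE_SYRUP, BASE_TRANSPORT)
scenario = cost_per_case(BASE_SYRUP + 0.10, BASE_TRANSPORT - 0.05)

# Negative delta means the distributor's cost per case goes down.
delta = scenario - baseline
```

Under these assumed quantities the transportation saving outweighs the syrup increase; with a different mix of gallons and miles the sign could flip, which is the point of running the scenario.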

What is OLAP? (3)
  • OLAP and Data Warehouses are complementary.
  • A Data Warehouse stores and manages data. OLAP transforms Data Warehouse data into strategic information.
  • OLAP ranges from basic navigation and browsing (often known as "slice and dice"), to calculations, to more serious analyses such as time series and complex modeling.
  • As decision-makers exercise more advanced OLAP capabilities, they move from data access to information to knowledge.

What is OLAP? (4-1)
Fast Analysis of Shared Multidimensional Information
  • FAST means that the system is targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds.
  • ANALYSIS means that the system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user.
  • SHARED means that the system implements all the security requirements for confidentiality (possibly down to cell level) and, if multiple write access is needed, concurrent update locking at an appropriate level.

What is OLAP? (4-2)
  • MULTIDIMENSIONAL is the key requirement. The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies.
  • INFORMATION is all of the data and derived information needed, wherever it is and however much is relevant for the application.
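A multidimensional view with a hierarchy can be sketched in a few lines. The following is an illustrative toy, not a description of any OLAP product: the fact rows, the dimension names, and the region-to-city hierarchy are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, quarter, product, revenue).
# City rolls up into region, which gives a two-level geography hierarchy.
FACTS = [
    ("east", "Boston", "Q3", "cola", 100.0),
    ("east", "New York", "Q3", "cola", 250.0),
    ("east", "Boston", "Q2", "juice", 75.0),
    ("west", "Seattle", "Q3", "cola", 300.0),
]
DIMENSIONS = ("region", "city", "quarter", "product")


def aggregate(facts, group_by, where=None):
    """Sum revenue by the given dimensions, optionally restricted to a slice."""
    where = where or {}
    totals = defaultdict(float)
    for *coords, revenue in facts:
        cell = dict(zip(DIMENSIONS, coords))
        if all(cell[dim] == value for dim, value in where.items()):
            totals[tuple(cell[dim] for dim in group_by)] += revenue
    return dict(totals)


# Roll-up: view revenue at the top level of the geography hierarchy.
by_region = aggregate(FACTS, ["region"])
# Drill-down: open the "east" region into its cities.
east_by_city = aggregate(FACTS, ["city"], {"region": "east"})
# Slice and dice: fix quarter and product, then view by region.
q3_cola = aggregate(FACTS, ["region"], {"quarter": "Q3", "product": "cola"})
```

A real multidimensional engine would typically precompute or index such aggregates to meet the response-time targets; this sketch recomputes them on every call.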

Who Uses OLAP?
  • OLAP applications span a variety of organizational functions.
  • Finance departments use OLAP for applications such as budgeting, activity-based costing (allocations), financial performance analysis, and financial modeling.
  • Sales analysis and forecasting are two of the OLAP applications found in sales departments.
  • Marketing departments use OLAP for market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation.
  • Typical manufacturing OLAP applications: production planning and defect analysis.

Why Use OLAP?
  • The ability to provide managers with the information they need to make effective decisions about an organization's strategic directions.
  • The ability to provide "just-in-time" information for effective decision-making. This requires more than a base level of detailed data.
  • Just-in-time information is computed data that usually reflects complex relationships and is often calculated on the fly.
  • Analyzing and modeling complex relationships are practical only if response times are consistently short.

OLAP Tools (1)
  • Can be classified as
      • MOLAP (multidimensional)
      • ROLAP (relational)
      • HOLAP (hybrid)
  • Some of the more popular OLAP tools
      • Essbase from Arbor/Hyperion
      • Oracle Express
      • Cognos PowerPlay
      • MicroStrategy DSS Server
      • Microsoft Decision Support Service
      • Prodea from Platinum Technologies
      • MetaCube from Informix
      • Brio Technologies

OLAP Tools (2)
  • Amulet, consulting for the database software development industry.
  • Applix, providing iTM1: real-time enterprise planning, analysis and reporting for e-businesses.
  • Broadbase Information Systems, which designs, develops and markets the next-generation data mart solution.
  • Codework, providing HELM stand-alone OLAP for the Windows platforms.
  • Dimensional Insight, Inc., a leading developer of multidimensional data visualization, analysis, and reporting software.
  • Federal Data Corporation, a leading systems integrator for the federal government sector.
  • Gentia Software, offering Business Intelligence and a new data mining product (Nasdaq: GNTI).
  • Hyperion, providing Essbase OLAP Server and other enterprise OLAP solutions.
  • Information Advantage, offering relational OLAP and Web decision-support tools.
  • InterSoft Lab, developers of Contour, a desktop OLAP system (Windows).
  • INsight FORMATION, Inc., a Minneapolis, MN based consulting company providing solutions in Business Intelligence, data mining, OLAP, DSS and data warehousing.
  • Knosys, providing ProClarity data visualization and component-based OLAP solutions based on Microsoft SQL Server(tm) 7.0.
  • Microstrategy, providing an intelligent e-business platform and OLAP tools.

OLAP Tools (3)
  • MIS AG, a German company, developer of the ALEA multidimensional database and the Delta Miner tool designed for analysing data in OLAP format.
  • Oracle, a leader in data warehousing products and services.
  • Platinum Technology, Inc., provider of data warehousing solutions.
  • QueryObject Systems, a data mart software company providing business intelligence solutions for BIG data problems using fractal mathematics.
  • SAS Institute, provider of a suite of tools for data warehousing and data mining.
  • Secor Consulting Limited, OLAP/DSS solutions provider based in the UK.
  • StatSoft, the developer of the STATISTICA line of products for data mining, analysis, and visualization; provides consulting, training, and data warehousing services.
  • Stone, Timber, River, providing MatryxAccess and Matryx 98 OLAP applications for Access and Excel.
  • Tektonic Software, providing the InfoCharger Engine for enabling OLAP or data mining tools to work on large volumes of data.
  • TransQuest Technologies, providing technical employment services to computer professionals who specialize in data warehousing.
  • WhiteLight, providing data warehouse design, generation and metadata management, with tools for the generation of relational OLAP data marts.

Data Mining Applications (1)
  • Card fraud prevention (Fraud Detection)
  • Risk management (Risk Management)
  • Customer complaint management (Claim Prevention)
  • Customer retention (Churn Management, Customer Retention)
  • Customer acquisition (Customer Acquisition)
  • Customer segmentation and profiling (Customer Segmentation & Profiling)
  • Demand and sales forecasting (Forecasting), price calculation (Pricing)
  • Managing the effectiveness of various marketing activities
      • Target Marketing, Tele Marketing, Direct Mailing
  • Cross-selling (Cross Selling/Up Selling)
  • etc.

Example of a Data Mining Application Architecture
[Three-tier architecture diagram. Labels in reading order: HTTP server; Client; HTML file; Java Program; Miner server; Miner Client; Tier 1; Mining Engine; DB Connector; Domain Knowledge; Tier 2; Database Server; Tier 3]

Data Mining Process
[Process diagram. Labels in reading order: Domain Knowledge; Data Collection; Data Preparation; Feature Extraction; Selection of Mining Model; Adaptation; NN; Decision Tree; GA; Browser; Train; Validation; Visualization; Explanation; Test; Reject; Accept]

Data Mining Techniques (2)
  • Decision Tree
      • Sorts an instance by passing it down from the root to a leaf.
      • C4.5
      • CART (Classification and Regression Trees)
      • CHAID (Chi-squared Automatic Interaction Detection)
      • [Example tree diagram. Labels: Outlook (Sunny, Rain); Humidity (High, Normal); Wind (Strong, Weak)]
  • Case Based Reasoning
      • Predicts a new case by using a database of existing cases (Case Base).
      • Retrieves similar cases using k-NN (New Case).
      • Produces the output for the new case based on the retrieved similar cases (Solution).
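The case-based reasoning steps above (retrieve similar cases with k-NN, then derive the output from them) can be sketched as follows. The case base, the two numeric features, the labels, Euclidean distance, and majority voting are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical case base: (feature vector, known outcome).
CASE_BASE = [
    ((1.0, 1.0), "stay"),
    ((1.2, 0.8), "stay"),
    ((0.9, 1.3), "stay"),
    ((5.0, 5.0), "churn"),
    ((5.5, 4.5), "churn"),
]


def retrieve(case_base, new_case, k):
    """Return the k stored cases closest to the new case (Euclidean distance)."""
    return sorted(case_base, key=lambda case: math.dist(case[0], new_case))[:k]


def solve(case_base, new_case, k=3):
    """Produce the output for a new case by majority vote over similar cases."""
    neighbours = retrieve(case_base, new_case, k)
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


prediction_a = solve(CASE_BASE, (1.1, 1.0))
prediction_b = solve(CASE_BASE, (5.2, 4.8))
```

With k=3 the second query still has one "stay" case among its neighbours, so the vote is 2 to 1; the choice of k and of the distance measure materially affects the result.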

Data Mining Techniques (3)
  • Discriminant Analysis: a statistical classification technique
  • Regression: a statistical estimation technique
  • Association
  • Link Analysis (based on graph theory)
  • K-Means, FCM: clustering methods
  • etc.
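Of the clustering methods named above, K-Means is simple enough to sketch directly. This is a minimal version of the standard iterate-assign-update loop with fixed starting centroids; the points, the starting centroids, and the fixed iteration count are assumptions.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    # clusters holds the assignment made in the final iteration.
    return centroids, clusters


# Hypothetical two-dimensional customer features.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
CENTROIDS, CLUSTERS = k_means(POINTS, [(0.0, 0.0), (10.0, 10.0)])
```

Production implementations usually stop when assignments no longer change and pick starting centroids more carefully; the fixed start here just keeps the example deterministic.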

Data Mining Techniques (4)

Data Mining Products (1)
  • Classification by mining task: tools are grouped by their characteristic data mining function
      • Classification
      • Clustering
      • Estimation
      • Link Analysis
      • Visualization
      • Statistics
      • Other
  • [Diagram labels: Classification; Multi Task; Estimation; Statistics; Link Analysis; Visualization]

Data Mining Products (2)
  • Multi-task tools: provide a variety of functions
      • pd: MLC++, MOBAL, TOOLDIAG
      • rp: DBMiner, Emerald, Kepler, Weka 2.2
      • com: Clementine, DataEngine 2.1, DataMind Data Cruncher, Datasage, DecisionCentre, IDIS Data Mining Suite, Darwin, Delta Miner, Hyperparallel//Discovery, IBM Intelligent Miner, INSPECT, Neo Vista, Nuggets, ORCHESTRATE, Partek, Pilot Discovery Server, Polyanalyst 3.0, PRW and Model 1 family, SAS Data Mining Software, SGI MineSet v2.0, SPSS, SRA KDD Toolset

Data Mining Products (3)
  • Classification
      • Multiple approaches
          • pd: MLC++, SIPINA-W 2.0
          • rp: JAM
          • com: Clementine, DecisionHouse, ModelQuest, Gain, Xpertrule Analyser
      • Decision-tree approach
          • pd: LMDT, OC1, PC4.5, SE-Learn
          • com: AC2, Alice 4.3, Business Miner, C4.5, C5.0, CART, Cognos Scenario, Decisionhouse, IND v2.0, KATE-tools, KnowledgeSEEKER, Preclass, SPSS CHAID, Xpertrule Profiler
      • Rule discovery approach
          • pd: Brute, CN2, FOIL, MLC++
          • rp: DBMiner, RIPPER
          • com: Datamite, Data Surveyor, SuperQuery, WINROSA, WizWhy
      • Neural network approach
          • pd: NN FAQ free software, NEuroNet site

Data Mining Products (4)
  • Classification (continued)
      • Neural network approach (continued)
          • com: NN FAQ commercial software, 4Thought, BrainMaker, INSPECT, MATLAB NN Toolbox, ModelQuest, NeuralWorks Predict, NeuralWorks Professional II/PLUS, Proforma, PRW, SPSS Neural Connection 2
      • Rough set approach
          • pd: Rough Enough
          • rp: Rosetta, Grobian
          • com: Datalogic, K-DYS
      • Genetic programming approach
          • com: OMEGA
      • Nearest neighbour approach
          • pd: MLC++
          • rp: PEBLS
  • Clustering
      • pd: Autoclass C, ECOBWEB, Snob
      • com: Autoclass III, COBWEB/3

Data Mining Products (5)
  • Estimation
      • com: Cubist
  • Link Analysis
      • A catalog of software for belief networks
      • rp: Bayesian Knowledge Discoverer, Belief Network Constructor, Claudien, FDEP, Microsoft MSBN
      • com: AT-Sigma Data Chopper, BMR, Hugin, Strategist, TETRAD II
  • Visualization for Discovery
      • pd: Graf-FX
      • rp: IRIS, VisDB, Xmdv
      • com: Daisy, Sphinx, Spotfire, NETMAP, VDI Discovery for Developers, VisualMine, WinViz
  • Statistical and Scientific Visualization
      • pd: MLC++
      • com: CrossGraphs, Data Desk, DX: IBM Visualization Data Explorer, IDL, Mathematica, PV-Wave, PVE, SPSS Diamond, STATlab
  • Summarization
      • rp: Claudien, DBMiner, Emerald

SAS Data Mining Solution (1)
  • SEMMA (Sample, Explore, Modify, Model, Assess) process
  • Sampling
      • random sampling, nth-observation sampling, stratified sampling, first-n sampling, and cluster sampling of an input data set
  • Exploration and Modification provide several ways
      • Graphical displays (multidimensional bar charts, simple graphs)
      • Outlier filtering
      • Transformations (log, square-root, inverse, square, exponential)
  • Model
      • Regression: for linear and logistic regression
      • Neural Networks: for nonlinear or linear modeling
      • Tree-based method
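Four of the sampling methods listed above can be sketched in plain Python. This is not SAS code and does not reproduce the SAS implementation; the function names, the interpretation of nth-observation sampling as "every nth row", the rounding rule in the stratified sampler, and the data set are assumptions.

```python
import random


def first_n(rows, n):
    """First-n sampling: take the first n observations."""
    return rows[:n]


def nth_observation(rows, step):
    """Nth-observation sampling: take every step-th observation."""
    return rows[::step]


def simple_random(rows, n, seed=0):
    """Random sampling without replacement (seeded for repeatability)."""
    return random.Random(seed).sample(rows, n)


def stratified(rows, key, fraction, seed=0):
    """Stratified sampling: draw the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    sample = []
    for group in strata.values():
        size = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, size))
    return sample


# Hypothetical input data set: 4 "gold" customers and 16 "basic" customers.
ROWS = [{"id": i, "segment": "gold" if i % 5 == 0 else "basic"} for i in range(20)]

head = first_n(ROWS, 3)
every_fifth = nth_observation(ROWS, 5)
random_four = simple_random(ROWS, 4)
by_segment = stratified(ROWS, key=lambda r: r["segment"], fraction=0.25)
```

Stratifying keeps the small "gold" segment represented in the sample, which a simple random draw of the same size does not guarantee.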

SAS Data Mining Solution (2)
  • Regression
      • logit
      • probit
      • complementary log-log
      • identity
  • Neural Network
      • Generalized linear model (GLIM)
      • Multilayer perceptron (MLP)
      • Radial basis function (RBF)
      • Equal-width RBF
      • Normalized equal-width RBF
  • Tree-based Method
      • Statistical decision tree
      • Criterion for evaluating a splitting rule
          • statistical significance test (F-test, Chi-square test)
          • reduction in variance, entropy, or Gini impurity measure
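The impurity-reduction criteria named above can be made concrete with a short calculation. The sketch below uses the standard definitions of entropy and Gini impurity; the labels and the candidate split are hypothetical, and this is not the SAS implementation.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )


def impurity_reduction(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * measure(child) for child in children)
    return measure(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" cases, and one candidate split.
PARENT = ["yes"] * 5 + ["no"] * 5
SPLIT = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

gini_gain = impurity_reduction(PARENT, SPLIT, gini)
entropy_gain = impurity_reduction(PARENT, SPLIT, entropy)
```

A splitting rule is preferred when its reduction is larger; a split that separates the classes perfectly would reduce Gini impurity here by the full 0.5.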

SAS Data Mining Solution (4)
  • Process Flow Diagram
  • A graph of the parameters of a regression model estimated by the Regression

SAS Data Mining Solution (5)
  • Neural network diagram
  • Tree diagram from the DataSplits Trees Browser window

SGI MINESET V2.1
  • Analytic Data Mining Tools
      • Decision Tree Classifiers
      • Evidence Classifiers
      • Association Rules
  • Intuitive Visual Data Mining Tools
      • 3-dimensional, animated, interactive visualizations for geographical, multi-dimensional, and hierarchical data
      • 3-dimensional, animated, interactive visualizations for decision tree, naive-Bayes, and association rule representation and analysis
      • Splat Visualizer and Scatter Visualizer
      • Map Visualizer
      • Tree Visualizer
      • Record Viewer
  • URL: http://www.sgi.com/Products/software/MineSet
  • Com: Silicon Graphics (www.sgi.com)

SGI MINESET V2.1: Decision Tree Classifiers

SGI MINESET V2.1: Association Rules and Rule Visualizer

SGI MINESET V2.1: Map Visualizer (The Geographical Point of View)

SGI MINESET V2.1: Tree Visualizer (Flying through Hierarchical Structure)

SGI MINESET V2.1: Stat Visualizer (Statistical Reporting)

Case Studies (1)
  • Churning customer detection (Deviation Detection)
  • 3. Data Mining Process
      • Model Selection
          • Multi-Layer Perceptron with BP
      • Model Assessment
          • [Chart: churn rate (%); marked values 60 and 10]
          • Among the 10% of all customers with the highest churn scores, 60% are actual churners.
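The assessment figure above (60% actual churners among the top-scoring 10% of customers) can be computed as follows. The scored customer list is synthetic and constructed so that the top decile reproduces the 60% figure; the overall churn rate and the lift ratio derived from it are assumptions added for illustration and do not appear in the case study.

```python
def churn_rate_in_top(scored, fraction):
    """Share of actual churners among the highest-scoring fraction of customers."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top = ranked[: max(1, round(len(ranked) * fraction))]
    return sum(churned for _, churned in top) / len(top)


# Synthetic data: 100 customers as (churn score, actually churned) pairs.
# Customer i gets score 1 - i/100, so a lower index means a higher score.
ACTUAL_CHURNERS = {0, 1, 2, 4, 6, 8, 15, 30, 50, 70}
SCORED = [(1 - i / 100, 1 if i in ACTUAL_CHURNERS else 0) for i in range(100)]

top_decile_rate = churn_rate_in_top(SCORED, 0.10)
overall_rate = churn_rate_in_top(SCORED, 1.0)

# Lift: how much more concentrated churners are in the top decile
# than in the customer base as a whole (assumed overall rate of 10%).
lift = top_decile_rate / overall_rate
```

The higher the top-decile rate is relative to the overall rate, the more useful the model's score is for targeting retention efforts at a small group.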