CRM Data Warehouse Data Mart OLAP Data Mining
- Slides: 85
Contents: CRM Analysis Techniques and Solutions
- Data Warehouse
- Data Mart
- OLAP
- Data Mining: techniques, solutions, case studies
Definition of DW
- A subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions (W. H. Inmon)
- One or more tools to extract fields from any kind of data structure (flat, hierarchical, relational, or object; open or proprietary), including external data
- The synthesis of the data into a nonvolatile, integrated, subject-oriented database with a metadata "catalog"
- A DW is a process, not a product
- A DW has no size limitations
Attributes of a Data Warehouse
- It is a database designed for analytical tasks, using data from multiple applications
- It supports a relatively small number of users with relatively long interactions
- Its usage is read-intensive
- Its content is periodically updated (mostly additions)
- It contains current and historical data to provide a historical perspective of information
- It contains a few large tables
- Each query frequently produces a large result set and involves frequent full table scans and multi-table joins
Terms Related to the DW
- Current detail data
- Old detail data
- Data mart
- Summarized data
- Drill-down, drill-up
- Metadata
Metadata
- Data about data that describes the data warehouse
- Used for building, maintaining, and using the data warehouse
- Can be classified into:
  - Technical metadata
  - Business metadata
  - Warehouse operational information
Synonyms for Data Warehouse
- Management Information System (MIS)
- Executive Information System (EIS)
- Decision Support System (DSS)
- Data Mart: a small, departmental Data Warehouse
Features of a Data Warehouse
Cost Structure of a Data Warehouse Project
Data Warehouse Architecture (diagram; component labels): Information Delivery System; Operational & External Data; Management Platform; Metadata; MRDG; Data Extract, Data Cleanup, Data Load; Data Warehouse DBMS; MDDB; Report, Query, EIS Tools; OLAP Tools; Data Marts; Data Mining Tools; Admin Platform; Repository; Applications & Tools
Data Warehouse Database
- The central data warehouse database is almost always implemented on RDBMS technology
- Traditional RDBMS implementations are optimized for transactional database processing
- Very large database size, ad hoc query processing, and the need for flexible user view creation (aggregates, multi-table joins, and drill-downs) have become drivers for different approaches
- Different technological approaches:
  - Parallel relational database designs
  - An innovative approach that speeds up a traditional RDBMS by using new index structures to bypass relational table scans
  - Multidimensional databases (MDDBs)
Sourcing, Acquisition, Cleanup and Transformation Tools (1)
- Perform all of the conversions, summarizations, key changes, structural changes, and condensations
- Produce programs and control statements, including COBOL programs, MVS job control language, UNIX scripts, and SQL data definition language
- Maintain metadata
Sourcing, Acquisition, Cleanup and Transformation Tools (2)
- Functionality:
  - Removing unwanted data from operational databases
  - Converting to common data names and definitions
  - Calculating summaries and derived data
  - Establishing defaults for missing data
  - Accommodating source data definition changes
- Some significant issues:
  - Database heterogeneity
  - Data heterogeneity
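The functions listed above can be illustrated with a minimal Python sketch. Everything specific in it is hypothetical and chosen only for illustration: the field names (`cust_no`, `CUSTOMER_ID`, `amt`, `rgn`), the rename map, the defaults, and the sample records. It does not reproduce the behavior of any tool named in this deck.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific field names to common names.
COMMON_NAMES = {"cust_no": "customer_id", "CUSTOMER_ID": "customer_id",
                "amt": "amount", "rgn": "region"}
# Hypothetical defaults for missing data.
DEFAULTS = {"region": "UNKNOWN", "amount": 0.0}
# Fields kept in the warehouse; everything else counts as unwanted data.
KEEP = {"customer_id", "amount", "region"}


def transform(record):
    """Rename fields to common names, drop unwanted fields, fill defaults."""
    renamed = {COMMON_NAMES.get(k, k): v for k, v in record.items()}
    cleaned = {k: v for k, v in renamed.items() if k in KEEP and v is not None}
    for field, default in DEFAULTS.items():
        cleaned.setdefault(field, default)
    return cleaned


def summarize(records):
    """Calculate a derived summary: total amount per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)


# Two hypothetical operational sources with different naming conventions.
source_a = [{"cust_no": 1, "amt": 120.0, "rgn": "EAST", "session_id": "x9"}]
source_b = [{"CUSTOMER_ID": 2, "amount": 80.0, "region": None}]

warehouse_rows = [transform(r) for r in source_a + source_b]
region_totals = summarize(warehouse_rows)
```

The rename map is one simple way to handle data heterogeneity across sources; a real extract would also need to track source definition changes over time, which this sketch does not attempt.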
Sourcing, Acquisition, Cleanup and Transformation Tools (3)
- Merits:
  - Save a considerable amount of time and effort
- Demerits:
  - Generally useful only for simpler data extracts
  - Customized extract routines need to be developed for more complicated data-extraction procedures
- Vendors prominent in this arena:
  - Ardent/Prism Solutions
  - Evolutionary Technologies Inc. (ETI)
  - Vality
  - Informatica
  - Praxis
  - Carleton
A Data Warehouse Project is a consulting Project
Architectural Debate Rages
- Bill Inmon: There is only one way to build a data mart. Build your central corporate data warehouse first, and then string your data marts off of it.
- Doug Hackney: The incremental data mart approach to building an enterprise data warehouse is fast becoming the only reliable way to get it done fast and affordably.
Top Down: Centralized Data Warehouse
- Right architecture:
  - Centralized control
  - Enterprise view
  - Consistent metadata
  - High data integrity
- Wrong strategy:
  - Lengthy implementation
  - Too expensive
  - High failure rate
Bottom Up: Independent Data Marts
- Wrong architecture:
  - Islands of data
  - No enterprise view
  - Incomplete metadata
  - Difficult to manage
- Right strategy:
  - Fast implementation
  - Cost effective
  - Immediate ROI
  - Repeatable process
Enterprise Data Mart Architecture
- Right architecture:
  - Centralized control
  - Enterprise view
  - Consistent metadata
  - High data integrity
- Right strategy:
  - Fast implementation
  - Cost effective
  - Immediate ROI
  - Repeatable process
Customer Problems Addressed during Extraction and Transformation
What is Data Mart Centric? http://www-db.stanford.edu/dbseminar/Archive/FallY97/slides/ncr/
Allure of the Data Mart
- Quicker to implement
- Easier to manage
- Cheaper to build
- Higher query performance
Quicker to Implement
- The promise:
  - "Load and go"
  - Known reports
  - Business-unit focused
- The reality:
  - Removes cross-functional capability
  - Provides no new insight
  - Performs little, if any, data transformation
  - Doesn't enforce business integrity
Easier to Manage
- The promise:
  - Smaller data volumes
  - Smaller workloads
  - Known environment
  - SMP versus MPP management issues
- The reality:
  - Actually harder as more data marts are added
  - A lack of standards leads to increased confusion
  - Data redundancy grows as marts are added
  - Managed on a node-by-node basis
Cheaper to Build
- The promise:
  - Smaller platforms and DASD
  - Fewer DBA resources necessary
- The reality:
  - HW/SW is less than 20% of a solution
  - The real cost is administration and application
  - More DBAs are needed as marts are added
  - In order to reduce implementation cost, many critical steps are ignored
Higher Query Performance
- The promise:
  - "Sub-second" response
  - Answering any questions users want to ask
  - Drill down and drill across
- The reality:
  - Usually requires a star schema, which limits growth
  - Only answers "known" questions
  - No exploratory capability beyond planned queries
  - Fast answers are not necessarily better answers
  - Response time is from thought to action
Why Data Marts Fail: Technical
- In order to build, you need to know the questions and the answers
- Little to no data transformation
- Dirty data is the biggest challenge
- How do you know how dirty your data is?
- Usually rely on tools to hide the problems, until it's too late
- Architecture does not support long-term goals
- Limit risk by ignoring the future
Why Data Marts Fail: Business
- Data marts treat warehousing as a technical problem rather than a business solution
- They divert resources to solving "point" solutions, not the foundation for information
- ROI is too low to justify the expenditure
- How much return do you need for a $1 million expenditure? In how long?
Top 10 Complaints on Data Marts
- Performance
- Too many data marts
- Users want more access
- Hard to find skilled personnel
- Reconciling inconsistencies
- Tools too difficult
- Must customize tools
- Incompatible tools
- Expectations too high
- Demand doubles in first year
What is OLAP? (1)
- A Data Warehouse stores tactical information that answers "who?" and "what?" questions about past events.
- A typical query submitted to a Data Warehouse is: "What was the total revenue for the eastern region in the third quarter?"
- Distinction between Data Warehouse and OLAP:
  - A Data Warehouse is usually based on relational technology.
  - OLAP uses a multidimensional view of aggregate data to provide quick access to strategic information for further analysis.
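The typical warehouse query quoted above can be sketched against a relational table. The following is a minimal Python example using the standard `sqlite3` module; the `sales` table, its columns, and the sample rows are hypothetical and stand in for a real warehouse schema.

```python
import sqlite3

# In-memory database with a hypothetical, simplified sales fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 3, 1200.0), ("east", 3, 800.0), ("east", 2, 500.0), ("west", 3, 900.0)],
)

# "What was the total revenue for the eastern region in the third quarter?"
(total_east_q3,) = conn.execute(
    "SELECT SUM(revenue) FROM sales WHERE region = ? AND quarter = ?",
    ("east", 3),
).fetchone()
conn.close()

print(total_east_q3)
```

The query answers a "what?" question about past events by aggregating stored rows, which is the kind of question the slide attributes to the warehouse itself.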
What is OLAP? (2)
- OLAP enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information.
- OLAP transforms raw data so that it reflects the real dimensionality of the enterprise as understood by the user.
- OLAP systems can answer "who?" and "what?" questions, but it is their ability to answer "what if?" and "why?" questions that sets them apart from a DW. OLAP supports decision-making about future actions.
- A typical OLAP calculation: "What would be the effect on soft drink costs to distributors if syrup prices went up by $.10/gallon and transportation costs went down by $.05/mile?"
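The "what if?" calculation quoted above can be sketched as a comparison between a baseline and a scenario. The cost model below (syrup volume times price, plus distance times transport rate) and all baseline figures are assumptions made for illustration; only the two price changes come from the slide.

```python
def distributor_cost(syrup_gallons, syrup_price, miles, cost_per_mile):
    """Hypothetical cost model: syrup cost plus transportation cost."""
    return syrup_gallons * syrup_price + miles * cost_per_mile


# Hypothetical baseline figures (not from the slide).
SYRUP_GALLONS = 1000
SYRUP_PRICE = 2.00      # dollars per gallon
MILES = 400
COST_PER_MILE = 1.50    # dollars per mile

baseline = distributor_cost(SYRUP_GALLONS, SYRUP_PRICE, MILES, COST_PER_MILE)

# Scenario from the slide: syrup up $.10/gallon, transportation down $.05/mile.
scenario = distributor_cost(
    SYRUP_GALLONS, SYRUP_PRICE + 0.10, MILES, COST_PER_MILE - 0.05
)

effect = scenario - baseline
print(f"Change in distributor cost: {effect:+.2f}")
```

With these made-up volumes the syrup increase outweighs the transport saving; different volumes could reverse the sign, which is why such questions are answered by recomputation and not by looking up stored values.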
What is OLAP? (3)
- OLAP and Data Warehouses are complementary. A Data Warehouse stores and manages data. OLAP transforms Data Warehouse data into strategic information.
- OLAP ranges from basic navigation and browsing (often known as "slice and dice"), to calculations, to more serious analyses such as time series and complex modeling.
- As decision-makers exercise more advanced OLAP capabilities, they move from data access to information to knowledge.
What is OLAP? (4-1): Fast Analysis of Shared Multidimensional Information
- FAST means that the system is targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds.
- ANALYSIS means that the system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user.
- SHARED means that the system implements all the security requirements for confidentiality (possibly down to cell level) and, if multiple write access is needed, concurrent update locking at an appropriate level.
What is OLAP? (4-2)
- MULTIDIMENSIONAL is the key requirement. The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies.
- INFORMATION is all of the data and derived information needed, wherever it is and however much is relevant for the application.
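A multidimensional view with a hierarchy can be sketched with plain dictionaries. In the Python example below, the three dimensions (product, city, month), the city-to-region hierarchy, and all values are hypothetical; the sketch only shows the idea of slicing and rolling up, not how any OLAP server stores its data.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, city, month) -> sales amount.
CELLS = {
    ("cola", "Boston", "Jan"): 100,
    ("cola", "Boston", "Feb"): 120,
    ("cola", "Seattle", "Jan"): 90,
    ("juice", "Boston", "Jan"): 40,
    ("juice", "Seattle", "Feb"): 60,
}
# Hypothetical hierarchy on the geography dimension: city -> region.
CITY_TO_REGION = {"Boston": "east", "Seattle": "west"}


def slice_cells(cells, axis, value):
    """Slice: fix one dimension (by position) to a single member."""
    return {k: v for k, v in cells.items() if k[axis] == value}


def roll_up_city_to_region(cells):
    """Roll up the geography dimension one level along its hierarchy."""
    out = defaultdict(int)
    for (product, city, month), amount in cells.items():
        out[(product, CITY_TO_REGION[city], month)] += amount
    return dict(out)


def aggregate(cells, axes):
    """Sum the measure over every dimension except those listed in axes."""
    out = defaultdict(int)
    for key, amount in cells.items():
        out[tuple(key[a] for a in axes)] += amount
    return dict(out)


january = slice_cells(CELLS, axis=2, value="Jan")
by_region = aggregate(roll_up_city_to_region(CELLS), axes=(1,))
by_product = aggregate(CELLS, axes=(0,))
```

Drilling down is the reverse of the roll-up shown here: moving from the region totals back to the city-level cells they were computed from.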
Who Uses OLAP?
- OLAP applications span a variety of organizational functions.
- Finance departments use OLAP for applications such as budgeting, activity-based costing (allocations), financial performance analysis, and financial modeling.
- Sales analysis and forecasting are two of the OLAP applications found in sales departments.
- Marketing departments use OLAP for market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation.
- Typical manufacturing OLAP applications include production planning and defect analysis.
Why Use OLAP?
- The ability to provide managers with the information they need to make effective decisions about an organization's strategic directions.
- The ability to provide "just-in-time" information for effective decision-making. This requires more than a base level of detailed data.
- Just-in-time information is computed data that usually reflects complex relationships and is often calculated on the fly.
- Analyzing and modeling complex relationships are practical only if response times are consistently short.
OLAP Tools (1)
- Can be classified as:
  - MOLAP (multidimensional)
  - ROLAP (relational)
  - HOLAP (hybrid)
- Some of the more popular OLAP tools:
  - Essbase from Arbor/Hyperion
  - Oracle Express
  - Cognos PowerPlay
  - MicroStrategy DSS Server
  - Microsoft Decision Support Service
  - Prodea from Platinum Technologies
  - MetaCube from Informix
  - Brio Technologies
OLAP Tools (2)
- Amulet Consulting, for the database software development industry.
- Applix, providing iTM1: real-time enterprise planning, analysis, and reporting for e-businesses.
- Broadbase Information Systems, which designs, develops, and markets the next-generation data mart solution.
- Codework, providing HELM stand-alone OLAP for the Windows platforms.
- Dimensional Insight, Inc., a leading developer of multidimensional data visualization, analysis, and reporting software.
- Federal Data Corporation, a leading systems integrator for the federal government sector.
- Gentia Software, offering Business Intelligence and a new data mining product (Nasdaq: GNTI).
- Hyperion, providing Essbase OLAP Server and other enterprise OLAP solutions.
- Information Advantage, offering relational OLAP and Web decision-support tools.
- InterSoft Lab, developers of Contour, a desktop OLAP system (Windows).
- INsight FORMATION, Inc., a Minneapolis, MN based consulting company providing solutions in Business Intelligence, data mining, OLAP, DSS, and data warehousing.
- Knosys, providing ProClarity data visualization and component-based OLAP solutions based on Microsoft SQL Server 7.0.
- MicroStrategy, providing an intelligent e-business platform and OLAP tools.
OLAP Tools (3)
- MIS AG, a German company, developer of the ALEA multidimensional database and the Delta Miner tool designed for analysing data in OLAP format.
- Oracle, a leader in data warehousing products and services.
- Platinum Technology, Inc., provider of data warehousing solutions.
- QueryObject Systems, a data mart software company providing business intelligence solutions for BIG data problems using fractal mathematics.
- SAS Institute, provider of a suite of tools for data warehousing and data mining.
- Secor Consulting Limited, an OLAP/DSS solutions provider based in the UK.
- StatSoft, the developer of the STATISTICA line of products for data mining, analysis, and visualization; provides consulting, training, and data warehousing services.
- Stone, Timber, River, providing MatryxAccess and Matryx 98 OLAP applications for Access and Excel.
- Tektonic Software, providing the InfoCharger Engine for enabling OLAP or data mining tools to work on large volumes of data.
- TransQuest Technologies, providing technical employment services to computer professionals who specialize in data warehousing.
- WhiteLight, providing data warehouse design, generation, and metadata management, with tools for the generation of relational OLAP data marts.
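Data Mining Applications (1)
- Card fraud prevention (Fraud Detection)
- Risk management (Risk Management)
- Customer complaint management (Claim Prevention)
- Customer retention (Churn Management, Customer Retention)
- Customer acquisition (Customer Acquisition)
- Customer segmentation and profiling (Customer Segmentation & Profiling)
- Demand and sales forecasting (Forecasting), pricing (Pricing)
- Managing the effectiveness of marketing activities: Target Marketing, Tele Marketing, Direct Mailing
- Cross selling (Cross Selling/Up Selling)
- etc.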
Example of a Data Mining Application Architecture (three-tier diagram; labels): HTTP server, client, HTML files, Java program, Miner server, Miner client, mining engine, DB connector, domain knowledge, database server; Tier 1, Tier 2, Tier 3
Data Mining Process (flow diagram; labels): Domain Knowledge; Data Collection; Data Preparation; Feature Extraction; Selection of Mining Model; Adaptation; NN, Decision Tree, GA; Browser; Train; Validation; Visualization; Explanation; Test; Reject / Accept
Data Mining Techniques (2)
- Decision Tree
  - Sorts an instance down the tree from the root to a leaf
  - C4.5
  - CART (Classification and Regression Trees)
  - CHAID (Chi-squared Automatic Interaction Detection)
  - Example tree (diagram; labels): Outlook (Sunny, Rain); Humidity (High, Normal); Wind (Strong, Weak)
- Case Based Reasoning
  - Case base: predicts new cases using a database of existing cases
  - Retrieves similar cases using k-NN
  - New case: produces the output (solution) for the new case based on the retrieved similar cases
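Both techniques above can be sketched in a few lines of Python. The attributes and branch values of the tree come from the slide's diagram; the "Overcast" branch and the Yes/No leaf labels are assumptions borrowed from the well-known "play tennis" textbook example. The case base, its two numeric features, and the majority vote are likewise hypothetical choices for illustration.

```python
from collections import Counter

# Tree as nested dicts: {attribute: {value: subtree_or_leaf_label}}.
# Leaf labels and the "Overcast" branch are assumed, not from the slide.
TREE = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}


def classify(tree, instance):
    """Sort an instance down the tree from the root until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[instance[attribute]]
    return node


def knn_predict(case_base, new_case, k=3):
    """Case-based reasoning: retrieve the k most similar cases, then vote."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(case_base, key=lambda case: distance(case[0], new_case))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


label = classify(TREE, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"})

# Hypothetical case base: (feature vector, solution) pairs.
CASE_BASE = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.9, 1.1), "A"),
             ((5.0, 5.0), "B"), ((5.1, 4.8), "B")]
solution = knn_predict(CASE_BASE, (1.1, 1.0), k=3)
```

Algorithms such as C4.5, CART, and CHAID differ in how they learn the tree from data; the sketch covers only how a finished tree is applied to an instance.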
Data Mining Techniques (3)
- Discriminant Analysis: a statistical classification technique
- Regression: a statistical estimation technique
- Association
- Link Analysis (based on graph theory)
- K-Means, FCM: clustering methods
- etc.
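Of the clustering methods named above, K-Means is the simplest to sketch. The Python example below is a minimal version on 2-D points with explicitly supplied starting centroids; the points and starting centroids are hypothetical, and FCM (fuzzy c-means), which assigns graded memberships instead of hard ones, is not shown.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two visibly separated groups of points.
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(POINTS, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Passing the starting centroids explicitly keeps the sketch deterministic; practical implementations usually pick them at random and rerun several times.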
Data Mining Techniques (4)
Data Mining Products (1)
- Classification by mining task: tools are grouped by the data mining function that characterizes each tool
- Categories: Classification, Clustering, Estimation, Link Analysis, Multi Task, Visualization, Statistics, other
Data Mining Products (2)
- Multi-task tools: offer a variety of functions
  - pd: MLC++, MOBAL, TOOLDIAG
  - rp: DBMiner, Emerald, Kepler, Weka 2.2
  - com: Clementine, DataEngine 2.1, DataMind Data Cruncher, Datasage, DecisionCentre, IDIS Data Mining Suite, Darwin, Delta Miner, Hyperparallel//Discovery, IBM Intelligent Miner, INSPECT, Neo Vista, Nuggets, ORCHESTRATE, Partek, Pilot Discovery Server, Polyanalyst 3.0, PRW and Model 1 family, SAS Data Mining Software, SGI MineSet v2.0, SPSS, SRA KDD Toolset
Data Mining Products (3)
- Classification, multiple approaches:
  - pd: MLC++, SIPINA-W 2.0
  - rp: JAM
  - com: Clementine, DecisionHouse, ModelQuest, Gain, Xpertrule Analyser
- Decision-tree approach:
  - pd: LMDT, OC1, PC4.5, SE-Learn
  - com: AC2, Alice 4.3, Business Miner, C4.5, C5.0, CART, Cognos Scenario, Decisionhouse, IND v2.0, KATE-tools, KnowledgeSEEKER, Preclass, SPSS CHAID, Xpertrule Profiler
- Rule discovery approach:
  - pd: Brute, CN2, FOIL, MLC++
  - rp: DBMiner, RIPPER
  - com: Datamite, Data Surveyor, SuperQuery, WINROSA, WizWhy
- Neural network approach:
  - pd: NN FAQ free software, NEuroNet site
Data Mining Products (4)
- Neural network approach (continued):
  - com: NN FAQ commercial software, 4Thought, BrainMaker, INSPECT, MATLAB NN Toolbox, ModelQuest, NeuralWorks Predict, NeuralWorks Professional II/PLUS, Proforma, PRW, SPSS Neural Connection 2
- Rough set approach:
  - pd: Rough Enough
  - rp: Rosetta, Grobian
  - com: Datalogic, K-DYS
- Genetic programming approach:
  - com: OMEGA
- Nearest neighbour approach:
  - pd: MLC++
  - rp: PEBLS
- Clustering:
  - pd: Autoclass C, ECOBWEB, Snob
  - com: Autoclass III, COBWEB/3
Data Mining Products (5)
- Estimation:
  - com: Cubist
- Link analysis:
  - A catalog of software for belief networks
  - rp: Bayesian Knowledge Discoverer, Belief Network Constructor, Claudien, FDEP, Microsoft MSBN
  - com: AT-Sigma Data Chopper, BMR, Hugin, Strategist, TETRAD II
- Visualization for discovery:
  - pd: Graf-FX
  - rp: IRIS, VisDB, Xmdv
  - com: Daisy, Sphinx, Spotfire, NETMAP, VDI Discovery for Developers, VisualMine, WinViz
- Statistical and scientific visualization:
  - pd: MLC++
  - com: CrossGraphs, Data Desk, DX: IBM Visualization Data Explorer, IDL, Mathematica, PV-Wave, PVE, SPSS Diamond, STATlab
- Summarization:
  - rp: Claudien, DBMiner, Emerald
SAS Data Mining Solution (1)
- SEMMA (Sample, Explore, Modify, Model, Assess) process
- Sampling: random sampling, nth-observation sampling, stratified sampling, first-n sampling, and cluster sampling of an input data set
- Exploration and modification provide several facilities:
  - Graphical displays (multidimensional bar charts, simple graphs)
  - Outlier filtering
  - Transformations (log, square root, inverse, square, exponential)
- Model:
  - Regression: for linear and logistic regression
  - Neural networks: for nonlinear or linear modeling
  - Tree-based methods
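Four of the sampling methods listed above can be sketched in plain Python. This is not SAS code and makes no claim about how the SAS product implements sampling; the function names, the `segment` field, and the data are hypothetical, and cluster sampling is left out for brevity.

```python
import random
from collections import defaultdict


def random_sample(rows, n, seed=0):
    """Simple random sample of n rows without replacement."""
    return random.Random(seed).sample(rows, n)


def nth_observation_sample(rows, step):
    """Every step-th observation, starting with the first."""
    return rows[::step]


def first_n_sample(rows, n):
    """The first n observations of the input data set."""
    return rows[:n]


def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from each stratum defined by key(row)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical input: 100 customers, 20 "gold" and 80 "basic".
ROWS = [{"id": i, "segment": "gold" if i % 5 == 0 else "basic"} for i in range(100)]

simple = random_sample(ROWS, 10)
every_25th = nth_observation_sample(ROWS, 25)
head = first_n_sample(ROWS, 3)
stratified = stratified_sample(ROWS, key=lambda r: r["segment"], fraction=0.1)
```

Stratified sampling preserves the 20/80 split of the segments in the sample, which a small simple random sample does not guarantee.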
SAS Data Mining Solution (2)
- Regression:
  - logit
  - probit
  - complementary log-log
  - identity
- Neural network:
  - Generalized linear model (GLIM)
  - Multilayer perceptron (MLP)
  - Radial basis function (RBF)
  - Equal-width RBF
  - Normalized equal-width RBF
- Tree-based method:
  - Statistical decision tree
  - Criteria for evaluating a splitting rule:
    - Statistical significance test (F-test, chi-square test)
    - Reduction in variance, entropy, or Gini impurity measure
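The impurity-based splitting criteria named above can be sketched as follows. The Python example computes entropy and Gini impurity for a node and the reduction achieved by a candidate split; the labels and the split are hypothetical, and the code illustrates the standard textbook definitions, not SAS's implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of the class labels in a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * measure(child) for child in children)
    return measure(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" cases, split into two children.
PARENT = ["yes"] * 5 + ["no"] * 5
LEFT = ["yes"] * 4 + ["no"]
RIGHT = ["yes"] + ["no"] * 4

gini_gain = impurity_reduction(PARENT, [LEFT, RIGHT], gini)
entropy_gain = impurity_reduction(PARENT, [LEFT, RIGHT], entropy)
```

A tree builder would evaluate many candidate splits this way and keep the one with the largest reduction.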
SAS Data Mining Solution (4)
- Process Flow Diagram
- A graph of the parameters of a regression model estimated by the Regression
SAS Data Mining Solution (5)
- Neural network diagram
- Tree diagram from the DataSplits Trees Browser window
SGI MINESET V2.1
- Analytic data mining tools:
  - Decision tree classifiers
  - Evidence classifiers
  - Association rules
- Intuitive visual data mining tools:
  - 3-dimensional, animated, interactive visualizations for geographical, multi-dimensional, and hierarchical data
  - 3-dimensional, animated, interactive visualizations for decision tree, naive-Bayes, and association rule representation and analysis
  - Splat Visualizer and Scatter Visualizer
  - Map Visualizer
  - Tree Visualizer
  - Record Viewer
- URL: http://www.sgi.com/Products/software/MineSet
- Com: Silicon Graphics (www.sgi.com)
SGI MINESET V2.1: Decision Tree Classifiers
SGI MINESET V2.1: Association Rules and Rule Visualizer
SGI MINESET V2.1: Map Visualizer (The Geographical Point of View)
SGI MINESET V2.1: Tree Visualizer (Flying through Hierarchical Structure)
SGI MINESET V2.1: Stat Visualizer (Statistical Reporting)
Case Studies (1): Detecting Churning Customers (Deviation Detection)
- Data mining process, step 3
- Model selection: multilayer perceptron with BP
- Model assessment (chart; axis: churn rate (%)): among the 10% of all customers with the highest churn scores, 60% are actual churners
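The model assessment above (the share of actual churners among the highest-scoring 10% of customers) can be computed as in the following Python sketch. The scores and churn flags are hypothetical and do not reproduce the 60% figure from the slide; the lift function is an added, commonly used companion measure, not something the slide reports.

```python
def top_fraction_churn_rate(scored_customers, fraction=0.10):
    """Share of actual churners among the highest-scoring fraction."""
    ranked = sorted(scored_customers, key=lambda c: c[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top = ranked[:top_n]
    return sum(churned for _, churned in top) / top_n


def lift(scored_customers, fraction=0.10):
    """Churn rate in the top fraction divided by the overall churn rate."""
    overall = sum(churned for _, churned in scored_customers) / len(scored_customers)
    return top_fraction_churn_rate(scored_customers, fraction) / overall


# Hypothetical data: 20 customers as (churn score, actually churned 0/1).
CUSTOMERS = [(score / 100, int(score in (95, 85, 80, 15)))
             for score in range(0, 100, 5)]

top_decile_rate = top_fraction_churn_rate(CUSTOMERS)  # top 2 of 20 customers
top_decile_lift = lift(CUSTOMERS)
```

On the slide's figure, a top-decile rate of 60% would be compared against the overall churn rate in the same way to judge how much better the model ranks customers than random selection.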