1 Data Warehouse DSS EIS DSS Data Warehouse

  • Slides: 109
Download presentation
1

1

¢ Data Warehouse 의 출현 배경 ¤ DSS, EIS ¤ 재래시스템과 DSS ¢ Data

¢ Data Warehouse 의 출현 배경 ¤ DSS, EIS ¤ 재래시스템과 DSS ¢ Data Warehouse의 정의 및 특징 ¤ Data Warehouse의 정의, 특정 ¤ Data Warehouse의 구조 ¤ Data Warehouse의 구축효과 ¤ Data Warehouse환경내 정보흐름 ¢ Data Warehouse Architecture ¢ Data Warehouse의 주요 개념 2

DSS EIS 변환/추출 기존 응용 시스템 DW Reports Data Processing (Transaction) Information Processing OLTP성

DSS EIS 변환/추출 기존 응용 시스템 DW Reports Data Processing (Transaction) Information Processing OLTP성 DB와 OLAP성 DB(DW)의 분리 17

19

19

¢ 일반 Data Model (OLTP) Ship Type District Credit Shipper Order Item Ship To

¢ 일반 Data Model (OLTP) Ship Type District Credit Shipper Order Item Ship To Product Contact Loc Product Line Cust Loc Sales Order Product Group Contact Contrct Sales Rep Contrct Type Sales District Customer Sales Region Sales Division 20

¢ Dimensional Model (Star schema) Time Dimension time_key day_of_week month quarter year holiday_flag Sales

¢ Dimensional Model (Star schema) Time Dimension time_key day_of_week month quarter year holiday_flag Sales Fact time_ key product_key store_key dollars_sold units_solds dollars_cost Product Dimension product_key description brand category Store Dimension store_key store_name address floor_plan_type Q: 지난 상반기 중 공휴일에 가장 잘 팔린 상표는? 22

24

24

¤주제 중심 (subject oriented) ¤통합성 (integrated) ¤시간성 (역사성; time-variant, historical) ¤갱신되지 않음 (nonvolatile) 25

¤주제 중심 (subject oriented) ¤통합성 (integrated) ¤시간성 (역사성; time-variant, historical) ¤갱신되지 않음 (nonvolatile) 25

Data Warehouse의 구조 highly summarized M e t a D a t a OLTP

Data Warehouse의 구조 highly summarized M e t a D a t a OLTP 로부터 load 10년간의 품목군별 월별 판매량 lightly summarized 5년간의 품목별 주간 판매량 Current Detail 올해 판매 구체적 데이타 (current detail) 지난해까지의 판매 구체적 데이타 (older detail) 30

Up-Flow In-Flow DW Out-Flow Meta-Flow Down-Flow 36

Up-Flow In-Flow DW Out-Flow Meta-Flow Down-Flow 36

¢ In-flow email capture repair validate transform OLTP 시스템 Current Detail 외부자료 - 들여올

¢ In-flow email capture repair validate transform OLTP 시스템 Current Detail 외부자료 - 들여올 데이타 결정 DW - 무결성 (consistency/integrity) - 수정 보완 - 변환과 적용 37

highly summarized ¢ Up-flow lightly summarized Current Detail - 오래된 정보는 요약하여 위로 보냄

highly summarized ¢ Up-flow lightly summarized Current Detail - 오래된 정보는 요약하여 위로 보냄 - 자주 쓰이는 정보는 미리 정제 - 사용자 가까이로 분배 (departmental processing) 38

정보화 (Informating) - Better information for analytic use - Integrate data from different systems

정보화 (Informating) - Better information for analytic use - Integrate data from different systems - Better access to historical data 자동화 (Automating) DW 정보기술 투자액 • Data Warehouse의 목적 - Better access to current data OLTP 1960 데이터 수집 1970 데이터 접근 1980 1990 데이터 네비게이션 2000 데이터 마이닝 42

¢ Multi-dimensional model / Star schema (계 속) Ÿ 이러한 현상을 schema에 그대로 반영

¢ Multi-dimensional model / Star schema (계 속) Ÿ 이러한 현상을 schema에 그대로 반영 product key description size. . . market key description region. . . Fact table : product key market key time key sales amount sales number. . . time key day of wk month quarter year holiday flag Dimension tables: 49

¢ 데이터 추출, 변환 ¢ Operational Data Store(ODS) ¢ Data Mart ¢ Meta. Data

¢ 데이터 추출, 변환 ¢ Operational Data Store(ODS) ¢ Data Mart ¢ Meta. Data ¢ Multidimensional Database(MDB) 57

58

58

Extraction & Transforming Cleansing Merging & Supplementing OLTP Mapping ETT 구성 요소. Load controller

Extraction & Transforming Cleansing Merging & Supplementing OLTP Mapping ETT 구성 요소. Load controller ( 스케쥴링). Validator ( 정제). Converter (코드 변환). Logger (결과 저장). Meta. Data Manager. Aggregator. Exception Handler Transportation Maintain Data Warehouse 61

¢ 유형 ¤ type checking ¤ 정수, 실수, 날짜 타입 cf. ) 컴파일러의 type

¢ 유형 ¤ type checking ¤ 정수, 실수, 날짜 타입 cf. ) 컴파일러의 type checking ¤ range checking ¤ 예) age 필드, 판매가 필드 Time Dimension key ¤ based on business rule ¤ dimension key checking 1988 김 1, 000 1989 이 2, 500 1991 박 3, 000 1992 심 2, 200 1993 김 2, 300 19993 김 2, 500 1996 서 3, 300 1991 1992 1993 1994 1995 1996 * * * * * 1991 박 3, 000 1992 심 2, 200 1993 김 2, 300 * * * 1996 서 3, 300 64

ETT tool (Micro. Soft) 66

ETT tool (Micro. Soft) 66

¢ data warehouse의 한계 ¤DSS환경에만 운용가능, operational 환경에는 부적합 ¢ Operational Data Store ¤DW를

¢ data warehouse의 한계 ¤DSS환경에만 운용가능, operational 환경에는 부적합 ¢ Operational Data Store ¤DW를 operational system 환경으로 확장 ¤Day-to-day operational arena ¤not tightly-coupled 70

A DSS C ODS DW EIS PC B 71

A DSS C ODS DW EIS PC B 71

¢ Subject-oriented, integrated, volatile ¢ used for operational information processing ¢ current and near-current

¢ Subject-oriented, integrated, volatile ¢ used for operational information processing ¢ current and near-current collection of data ¢ Transformation ¤ Tapping delta data ¤ collapsing ¤ large data -> small data ¤ selective sample, selective subset ¤ Moving ¤ move one component at a time 72

ODS - current, near current - detailed data - updates - generally small -

ODS - current, near current - detailed data - updates - generally small - homogeneous data - full-function - update-record-oriented - clerical day-to-day decision making - up-to the second decision DW - historical data - summary and detail - nonvolatile snapshots - large - heterogeneous data - load-and-access tech. - DSS analyst or management oriented community - long-term analysis & trend analysis 73

¢ 장점 ¤customize the data as the data flows into the data mart from

¢ 장점 ¤customize the data as the data flows into the data mart from the data warehouse. ¤The amount of historical data that is needed is a function of the department, not the corporation ¤resource utilization ¤The department can select software for their data mart that is very elegant and is tailored to fit their needs. 77

¢ 정의 ¤Computer software system designed to allow for the efficient and convenient storage

¢ 정의 ¤Computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that is ¤ 1) intimately related ¤ 2) stored, viewed analyzed from different perspectives ¢ 목적 ¤flexible, high performance access and analysis of large volumes of data 85

¢ Different perspectives ¤Sales volumes by model ¤Sales volumes by color ¤Sales volumes by

¢ Different perspectives ¤Sales volumes by model ¤Sales volumes by color ¤Sales volumes by dealership ¤Sales volumes over time ¢ What is the trend in sales volumes over a period of time for a specific model and color across a specific group of dealer ship? 86

모델 MINIBAN SPORT CAR SEDAN 색깔 BLUE RED WHITE 양 5 7 9 4

모델 MINIBAN SPORT CAR SEDAN 색깔 BLUE RED WHITE 양 5 7 9 4 7 2 1 0 9 87

모 델 MINI VAN 7 5 9 SPORTS CAR 4 7 2 1 0

모 델 MINI VAN 7 5 9 SPORTS CAR 4 7 2 1 0 9 RED WHITE Sedan BLUE 색깔 88

89

89

¢ 10 x 10 arrays vs. 1000 records table ¤ 30 positions search vs.

¢ 10 x 10 arrays vs. 1000 records table ¤ 30 positions search vs. 1000 records scanning ¤ In average, 15 vs. 500 searches 90

91

91

¢ Ease of Data Presentation and Navigation ¤SQL의 한계, User의 관점 ¢ Ease of

¢ Ease of Data Presentation and Navigation ¤SQL의 한계, User의 관점 ¢ Ease of Maintenance ¤No additional overhead is required to transalate user queries into requests of data ¢ Performance ¤benchmark에서 증명됨 92

94

94

¢ Viewpoint의 변화 ¤ in RDB ¤require complex query or sort operation ¤ in

¢ Viewpoint의 변화 ¤ in RDB ¤require complex query or sort operation ¤ in MDB ¤just rotate without rearrangement of data ¢ Data Slicing 95

96

96

¢ 3 Dimension ¤ 6개의 view #1. #2. #3. #4. #5. #6. Model by

¢ 3 Dimension ¤ 6개의 view #1. #2. #3. #4. #5. #6. Model by Color (with Dealership Color by Model (with Dealership Color by Dealership (with Model Dealership by Color (with Model by Dealership (with Color Dealership by Model (with Color in in in the the the background) background) 97

98

98

¢ Select the desired positions along each dimesnion ¤ For the model dimension SPORTS

¢ Select the desired positions along each dimesnion ¤ For the model dimension SPORTS COUPE and MINI VAN ¤ For the dealership dimension CARR and CLYDE ¤ For the color dimension METAL BLUE and NORMAL BLUE ¢ Data Dicing 99

100

100

¢ Different views of data ¤ Sales by Model and Dealership ¤ Sales by

¢ Different views of data ¤ Sales by Model and Dealership ¤ Sales by Model and District ¢ Separate, independent dimension? ¤ Dealership dimension and District dimension ¢ Define hierarchy within the same dimension ¤ Organization Dimension ¤ multiples level within a hierarchy ¤ Dealership, District, Region 101

102

102

¢ Drill-down ¤ Moving down ¤ More detailed analysis along the different levels ¢

¢ Drill-down ¤ Moving down ¤ More detailed analysis along the different levels ¢ Roll-up ¤ Moving up Vice President | Senior Manager | Sales Team | Sales Person Nation | Region | District | Dealership Product Family | Product Line | Product Personnel Organization Products Year | Quarter | Month | Week | Day Time 103

104

104

Organization Region Import Point District Distribution Point Dealership 105

Organization Region Import Point District Distribution Point Dealership 105

¢ PRINT TOTAL. (SALES_VOLUME KEEP MODEL DEALERSHIP) 106

¢ PRINT TOTAL. (SALES_VOLUME KEEP MODEL DEALERSHIP) 106

107

107

109

109