Data Warehouse 2 1 Drill across process of

  • Slides: 31
Download presentation
Data Warehouse 구축 (2) 1

Data Warehouse 구축 (2) 1

Drill across • process of linking tow or more fact tables at the same

Drill across • process of linking tow or more fact tables at the same granularity – dimension table공유 2

Star Schema : Time dimension • 시간차원의 중요성 – 시간차원이 없는 질의는 meaningless! •

Star Schema : Time dimension • 시간차원의 중요성 – 시간차원이 없는 질의는 meaningless! • What are the sales volumes for all products available in store 52? • “for what period of time” – Time의 속성 • day of week, week of month, work day, weekend, holiday, season, fiscal period 3

Star Schema : Space dimension • Space차원이 있는 질의 – Marketing 정책 수립에 유용한

Star Schema : Space dimension • Space차원이 있는 질의 – Marketing 정책 수립에 유용한 분석 보고서 작성 – “What is the average driving distance for all customers who made a purchase at store 52 during the Christmas holiday season? ” – “What is the average income and family size of the neighborhoods where my customers exists? ” • effectively expand analysis domain • Space 차원의 속성 – 위/경도, street address, street block, city quadrant, zip code, street, city, country, state, – demographic information • household income, education, family size, home value, etc • customer address ~ store(school)간 driving time 4

Star Schema : Space dimension • Spatial industry – petroleum, telecommunications, government agencies •

Star Schema : Space dimension • Spatial industry – petroleum, telecommunications, government agencies • 질의 예 – “What is all those customers who bought from stores 1 & 2 during the Christmas holiday season? ” • (result) All customers who drove 3 minutes or less to a store in Tampa during holiday season – “See the market for all stores in the surrounding area presented in concentric rings of 1, 2, & 3 miles” – “See total potential customers who drive time to each of your store in the Tampa area is 4, 8, or 12 minutes 5

Spatial dimension이 추가된 Star Schema Time Dimension customer_id customer_code customer_name address 1 address 2

Spatial dimension이 추가된 Star Schema Time Dimension customer_id customer_code customer_name address 1 address 2 city state zip telephone life_stamp * customer_geo_code date_id customer_id product_id store_id customer_geo_id item_price item_qty item_cost customer_geo_id customer_geo_code zip_code census_track block_group country state avg_income avg_house_size avg_age. . . product_id product_code description unit_price unit_cost selling_unit stocking_unit life_stamp. . . 7

Star Schema : Space dimension • Target Mailing 에 응용 예 – for tuning

Star Schema : Space dimension • Target Mailing 에 응용 예 – for tuning AD. expenditure • targeting upper-middle class, middle-aged males • “What was our total sales revenue from customers who bought from store 52 during Christmas season and who live in neighborhood earning an average of $50, 000 per year and are more than 30 years old? ” 8

Star Schema : Space dimension • For the success of spatial data in DW

Star Schema : Space dimension • For the success of spatial data in DW – the effectiveness of the database extensions for spatial data types – providing specific applications that make it easy to include spatial data as a natural part of user application – Spatial data의 활용 9

Star Schema : Extensibility • 확장성 – add new dimension at any time, as

Star Schema : Extensibility • 확장성 – add new dimension at any time, as long as you are careful to preserve the original grain – 기존 프로그램 그대로 운용 가능 • Causal dimension – advisory dimension that should not change the fundamental grain of a table • 어떤 event가 일어나게 된 이유를 설명 • promotion, store condition – “Was my promotion profitable? ” 10

Star Schema : Extensibility • Building the fact table at a granular level –

Star Schema : Extensibility • Building the fact table at a granular level – 만약, 주단위로 요약된 fact table의 경우, monthly data로 확장할 수는 없다. 11

New causal dimension Causal_key Condition_name Price_treatment_type Price_discount Ad_type Ad_media_name Ad_size Display_type Display_provider Display_size 12

New causal dimension Causal_key Condition_name Price_treatment_type Price_discount Ad_type Ad_media_name Ad_size Display_type Display_provider Display_size 12

1. Adding new unanticipated facts • as long as they are consistent with the

1. Adding new unanticipated facts • as long as they are consistent with the fundamental grain of the existing fact table 2. Adding completely new dimensions (ex. causal dimension) • as long as there is a single value of that dimension defined for each existing fact record 3. Adding new, unanticipated dimensional attributes 4. Breaking existing dimension records down to a lower level of granularity from a certain point in time forward. (minidimension)13

Star Schema : Helper • In Health. Care billing – grain : individual line

Star Schema : Helper • In Health. Care billing – grain : individual line item on a doctor bill – 한 환자가 여러 개의 진단명을 가질 수 있음. – choose one value (“primary” diagnosis) and omit the other values • Diagnosis data의 유용성이 떨어짐 – create a fixed number of additional Diagnosis dimension slots in the fact table key • Helper Table – weighting factor : 한 grain을 형성하는 각 요소의 중요도 할당 • 예) 3개의 진단명이 있는 경우, 각 weighting factor는 1/3 14

Star Schema : Helper 15

Star Schema : Helper 15

Star Schema : Factless Fact Tables • 단지, dimension table과 link되는 multi-part key만을 가지고

Star Schema : Factless Fact Tables • 단지, dimension table과 link되는 multi-part key만을 가지고 있음 • 유형 I : table that record event (event-tracking table) – 예) fact table for recording student attendance on a daily basis at a college • • • grain : individual student attendance event “Which classes were the most heavily attended? ” “Which classes were the most consistently attended? ” “Which teachers taught the most students? ” “Which teachers taught classes in facilities belonging to other departments? ” “What was the average total walking distance of a student in a given day? ” 16

Star Schema : Factless Fact Tables 17

Star Schema : Factless Fact Tables 17

Star Schema : Factless Fact Tables • 유형 II : coverage table – 일어나지

Star Schema : Factless Fact Tables • 유형 II : coverage table – 일어나지 않은 사건에 대한 질의를 수행 • “Which products were on promotion that didn’t sell? ” – fact table의 내용이 sparse할 때 사용 – 예) coverage table for each product in each store that is on promotion in each time period (9609) 18

Star Schema : Factless Fact Tables 19

Star Schema : Factless Fact Tables 19

Star Schema : Slowly Changing Dimension • Example – 품목 설명이 종종 바뀌는 ‘품목’

Star Schema : Slowly Changing Dimension • Example – 품목 설명이 종종 바뀌는 ‘품목’ 차원 테이블 • 처리 방법 – Overwriting • 변경된 dimension 속성의 과거 이력을 보존하지 못함. • 그래서, 이전 value가 중요하지 않은 경우에 사용 – Creating Another Dimension Record – Creating Current Value Field 20

Star Schema : Slowly Changing Dimension • Creating Another Dimension Record – generalize the

Star Schema : Slowly Changing Dimension • Creating Another Dimension Record – generalize the key to changed dimension • 예) primary key + version digit • product dimension의 경우, SKU#+01, SKU#+02 – fact table의 레코드들을 dimension attribute의 history에 따라 partition이 가능 – 과거 이력을 수직적으로 보존 – 단점 • dimension key의 generalization • dimension table의 크기(레코드 수) 증가 • Creating a Current Value Field – “current value” 필드를 추가 – 과거 이력을 수평적으로 보존 21

Star Schema : Slowly Changing Dimension *예) Pkg_type변동 glued box => pasted box SKU#-01

Star Schema : Slowly Changing Dimension *예) Pkg_type변동 glued box => pasted box SKU#-01 => SKU#-02 22

Star Schema : Big Dimension • “Big” dimension – millions or tens of millions

Star Schema : Big Dimension • “Big” dimension – millions or tens of millions records • 예) customer dimension – 개개인(고객)의 정보를 기록 – 압축, 요약이 힘들고, 다른 차원 테이블에 비해 변동이 심함. • Mini. Dimension의 생성 – 어느 정도 static한 dimension과 계속적으로 변하는 dimension을 분리 – 예) customer dimension의 경우에 demographic minidimension • 계속 변화하는 demographic measure (income, purchase_behavior)가 일정 간 격을 가지는 구획값으로 변형 • demographic dimension attribute의 모든 가능한 값의 조합을 결정하여 키값 을 생성 23

Demographic Dimension Sales Fact time_key demographics_key customer_key product_key promotion_key sales_units sales_dollars demographics_key age_level income_level

Demographic Dimension Sales Fact time_key demographics_key customer_key product_key promotion_key sales_units sales_dollars demographics_key age_level income_level marital_status sex purchase_behavior Customer Dimension customer_key first_name last_name street_address city state zip demographics_key 24

Star Schema : Mini. Dimension • Mini. Dimension – minimize the joins necessary thought

Star Schema : Mini. Dimension • Mini. Dimension – minimize the joins necessary thought the fact table when you want to retrieve spatial data for customers without fact aggregation Big Dimension 25

Star Schema : Mini. Dimension • Advantages – support frequent snapshotting of customer profiles

Star Schema : Mini. Dimension • Advantages – support frequent snapshotting of customer profiles with no increase in data storage or data complexity as you increase the number of snapshots – demographic dimension itself cannot be allowed to grow too large • Disadvantages – browsing performance 저하 • demographics data can only be browsed along with the more constant customer data by liinking through the fact table – if fact table is empty, (아무런 event가 발생하지 않음) • cannot link the demographics to the customer • dummy sales event 삽입 26

Star Schema : Voyage • 여행관련 분석질의 – How many trips are there in

Star Schema : Voyage • 여행관련 분석질의 – How many trips are there in a typical overall tickets? – Between which trip city pairs do we have the most customer satisfaction problem? – cf) telephone cable 27

Star Schema 예 : Voyage 28

Star Schema 예 : Voyage 28

Star Schema 예 : Insurance • Transaction, Claim Processing 29

Star Schema 예 : Insurance • Transaction, Claim Processing 29

Star Schema : 인적자원 관리 30

Star Schema : 인적자원 관리 30

Star Schema : 인적자원 관리 • 분석질의 – report summary statuses of the entire

Star Schema : 인적자원 관리 • 분석질의 – report summary statuses of the entire employee base on a regular (monthly) basis • # of employees, total salary paid during the month • cumulative salary paid this year • cumulative vacation days taken, vacation days accrued, number of new hires, # of promotions – profile the employee population at any precise instant in time • how many employees we have • what their detailed profiles were on that date – monitoring every action taken on a given employee • transaction sequence/time 31