Prediction Cubes
Bee-Chung Chen, Lei Chen, Yi Lin and Raghu Ramakrishnan
University of Wisconsin - Madison
Subset Mining
- We want to find interesting subsets of the dataset
  - Interestingness: Defined by the “model” built on a subset
  - Cube space: A combination of dimension attribute values defines a candidate subset (just like regular OLAP)
- We want the measures to represent decision/prediction behavior
  - Summarize a subset using the “model” built on it
  - Big change from regular OLAP!
The Idea
- Build OLAP data cubes in which cell values represent decision/prediction behavior
  - In effect, build a tree for each cell/region in the cube; observe that this is not the same as a collection of trees used in an ensemble method!
  - The idea is simple, but it leads to promising data mining tools
  - Ultimate objective: Exploratory analysis of the entire space of “data mining choices”
    - Choice of algorithms, data conditioning parameters, …
Example (1/7): Regular OLAP
Goal: Look for patterns of unusually high numbers of applications.
Z: Dimensions (Location, Time); Y: Measure (# of applications)

  Location | Time    | # of App.
  ---------+---------+----------
  AL, USA  | Dec, 04 | 2
  ...      | ...     | ...
  WY, USA  | Dec, 04 | 3
Example (2/7): Regular OLAP
Goal: Look for patterns of unusually high numbers of applications.
Cell value: Number of loan applications.
[Figure: the fact table rolled up to coarser regions (e.g., country by year) and drilled down to finer regions (e.g., state by month); roll-up and drill-down move between these granularities]
Example (3/7): Decision Analysis
Goal: Analyze a bank’s loan decision process w.r.t. two dimensions: Location and Time.
Fact table D: Z: Dimensions (Location, Time); X: Predictors (Race, Sex, ...); Y: Class (Approval)

  Location | Time    | Race  | Sex | ... | Approval
  ---------+---------+-------+-----+-----+---------
  AL, USA  | Dec, 04 | White | M   | ... | Yes
  ...      | ...     | ...   | ... | ... | ...
  WY, USA  | Dec, 04 | Black | F   | ... | No

A cube subset Z(D) (a Location x Time cell) defines the training data for a model h(X; Z(D)), e.g., a decision tree.
Example (3/7): Decision Analysis
- Are there branches (and time windows) where approvals were closely tied to sensitive attributes (e.g., race)?
  - Suppose you partitioned the training data by location and time, chose the partition for a given branch and time window, and built a classifier. You could then ask, “Are the predictions of this classifier closely correlated with race?”
- Are there branches and times with decision making reminiscent of 1950s Alabama?
  - Requires comparison of classifiers trained using different subsets of data.
Example (4/7): Prediction Cubes
For the cell [USA, Dec 04]:
1. Build a model using data from USA in Dec 04: h(X; [USA, Dec 04](D)), e.g., a decision tree
2. Evaluate that model
Measure in a cell:
- Accuracy of the model
- Predictiveness of Race measured based on that model
- Similarity between that model and a given model
[Figure: a cube at the [Country, Month] level whose cell values summarize the models built on the corresponding data subsets]
Example (5/7): Model-Similarity
Given:
- Data table D
- Target model h0(X)
- Test set w/o labels
For each cell at the chosen level (e.g., [Country, Month]): build a model on the cell’s subset and measure its similarity to h0(X) by comparing their predictions on the test set.
Example finding: “The loan decision process in USA during Dec 04 was similar to a discriminatory decision model h0(X).”
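The similarity measure in this example can be sketched as plain prediction agreement on the unlabeled test set (a minimal illustration; the paper's cell measures may use other distance functions, and the toy models below are hypothetical):

```python
def model_similarity(h, h0, test_set):
    """Fraction of unlabeled test examples on which the cell's model h
    and the target model h0 make the same prediction."""
    agree = sum(1 for x in test_set if h(x) == h0(x))
    return agree / len(test_set)

# Hypothetical models: a discriminatory target model vs. a cell model
# that approves every application.
h0 = lambda x: "Yes" if x["race"] == "White" else "No"
h = lambda x: "Yes"
test = [{"race": "White"}, {"race": "Black"}, {"race": "White"}]
print(model_similarity(h, h0, test))  # the two models agree on the two "White" rows: 2/3
```

A high cell value then flags a (branch, time window) whose decision process resembles h0.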
Example (6/7): Predictiveness
Given:
- Data table D
- Attributes V
- Test set w/o labels
For each cell: build models h(X) and h(X \ V) on the cell’s subset; the predictiveness of V is measured by how much their predictions on the test set differ.
Example finding: “Race was an important predictor of the loan approval decision in USA during Dec 04.”
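Predictiveness can likewise be sketched as disagreement between a model trained with all predictors and one trained without V (one plausible reading of the slide; the toy models below are hypothetical):

```python
def predictiveness(h_full, h_without_v, test_set):
    """How often the model using all predictors disagrees with the model
    trained without the attributes V, on unlabeled test examples."""
    differ = sum(1 for x in test_set if h_full(x) != h_without_v(x))
    return differ / len(test_set)

# Hypothetical cell models: with Race, the decision depends on it;
# without Race, every application is approved.
h_all = lambda x: "Yes" if x["race"] == "White" else "No"
h_no_race = lambda x: "Yes"
test = [{"race": "White"}, {"race": "Black"}]
print(predictiveness(h_all, h_no_race, test))  # they disagree on one of two rows
```

A value near 0 would mean V could be dropped without changing the cell's decisions.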
Example (7/7): Prediction Cube
Cell value: Predictiveness of Race.
[Figure: a prediction cube of predictiveness values; as in a regular data cube, roll-up aggregates to coarser cells (e.g., country by year) and drill-down refines to finer ones (e.g., state by month)]
Efficient Computation
- Reduce prediction cube computation to data cube computation
  - Represent a data-mining model as a distributive or algebraic (bottom-up computable) aggregate function, so that data-cube techniques can be directly applied
Bottom-Up Data Cube Computation
Cell values: Numbers of loan applications.

           1985  1986  1987  1988 |  All
  Norway     10    30    20    24 |   84
  ...        23    45    14    32 |  114
  USA        14    32    42    11 |   99
  All        47   107    76    67 |  297

Each coarser cell (an “All” row or column) is computed from the finer cells below it.
Functions on Sets
- Bottom-up computable functions: Functions that can be computed using only summary information
- Distributive function: f(X) = F({f(X1), …, f(Xn)})
  - where X = X1 ∪ … ∪ Xn and Xi ∩ Xj = ∅ for i ≠ j
  - E.g., Count(X) = Sum({Count(X1), …, Count(Xn)})
- Algebraic function: f(X) = F({G(X1), …, G(Xn)})
  - G(Xi) returns a fixed-length vector of values
  - E.g., Avg(X) = F({G(X1), …, G(Xn)})
    - G(Xi) = [Sum(Xi), Count(Xi)]
    - F({[s1, c1], …, [sn, cn]}) = Sum({si}) / Sum({ci})
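As a minimal Python sketch of these two classes of functions, Count combines directly from per-partition counts (distributive), while Avg needs the fixed-length summary G(Xi) = [Sum(Xi), Count(Xi)] (algebraic):

```python
# Count is distributive: a cell's value is computed from sub-cell counts alone.
def count_bottom_up(partition_counts):
    return sum(partition_counts)

# Avg is algebraic: each partition Xi contributes the fixed-length summary
# G(Xi) = (Sum(Xi), Count(Xi)), and F combines the summaries.
def G(xs):
    return (sum(xs), len(xs))

def avg_bottom_up(summaries):
    total = sum(s for s, _ in summaries)
    n = sum(c for _, c in summaries)
    return total / n

# Three disjoint partitions of one set of values.
parts = [[1, 2, 3], [4, 5], [6]]
print(count_bottom_up([len(p) for p in parts]))  # 6
print(avg_bottom_up([G(p) for p in parts]))      # 3.5
# Same answers as computing directly over the union:
flat = [x for p in parts for x in p]
print(len(flat), sum(flat) / len(flat))          # 6 3.5
```

The point of the definitions is exactly this equality: the bottom-up result matches the result over the union, so coarse cells never need to touch the raw data.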
Scoring Function
- Represent a model as a function of sets
- Conceptually, a machine-learning model h(X; Z(D)) is a scoring function Score(y, x; Z(D)) that gives each class y a score on test example x
  - h(x; Z(D)) = argmax_y Score(y, x; Z(D))
  - Score(y, x; Z(D)) ≈ p(y | x, Z(D))
  - Z(D): The set of training examples (a cube subset of D)
Bottom-up Score Computation
- Key observations:
  - Observation 1: Score(y, x; Z(D)) is a function of the cube subset Z(D); if it is distributive or algebraic, the data-cube bottom-up technique can be directly applied
  - Observation 2: Having the scores for all the test examples and all the cells is sufficient to compute a prediction cube
    - Scores → predictions → cell values
    - Details depend on what each cell means (i.e., the type of prediction cube), but are straightforward
Machine-Learning Models
- Naïve Bayes:
  - Scoring function: algebraic
- Kernel-density-based classifier:
  - Scoring function: distributive
- Decision tree, random forest:
  - Neither distributive, nor algebraic
- PBE: Probability-based ensemble (new)
  - Makes any machine-learning model distributive
  - An approximation
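A sketch of why naïve Bayes is bottom-up computable: its sufficient statistics are class counts and per-class attribute-value counts, which simply add across base cells, so any coarser cell's score can be assembled from finer-cell summaries (the add-one smoothing and dictionary encoding here are illustrative choices, not the paper's exact formulation):

```python
from collections import Counter

# Sufficient statistics for naive Bayes over one base cell: class counts
# and per-class attribute-value counts. Both are plain sums, so they
# combine across cells like any distributive aggregate.
def nb_stats(examples):
    """examples: list of (x_dict, y) pairs."""
    class_counts = Counter()
    attr_counts = Counter()  # keyed by (y, attribute, value)
    for x, y in examples:
        class_counts[y] += 1
        for a, v in x.items():
            attr_counts[(y, a, v)] += 1
    return class_counts, attr_counts

def merge(stats_list):
    """Roll finer-cell statistics up into a coarser cell."""
    cc, ac = Counter(), Counter()
    for c, a in stats_list:
        cc += c
        ac += a
    return cc, ac

def nb_score(y, x, stats):
    """Unnormalized Score(y, x) with add-one smoothing (illustrative)."""
    cc, ac = stats
    n = sum(cc.values())
    score = (cc[y] + 1) / (n + len(cc))
    for a, v in x.items():
        score *= (ac[(y, a, v)] + 1) / (cc[y] + 2)
    return score

cell_a = [({"race": "White"}, "Yes")]
cell_b = [({"race": "Black"}, "No")]
merged = merge([nb_stats(cell_a), nb_stats(cell_b)])
print(merged == nb_stats(cell_a + cell_b))  # True: the statistics roll up exactly
```

Decision trees have no such fixed-length summary, which is why they need the PBE construction below.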
Probability-Based Ensemble
Decision tree on [WA, 85] vs. the PBE version of the decision tree on [WA, 85]
[Figure: the PBE version is assembled from the decision trees built on the lowest-level cells, e.g., [WA, Jan 85] through [WA, Dec 85]]
Probability-Based Ensemble
- Scoring function:
    Score(y, x; Z(D)) = Σi g(bi | x) · h(y | x; bi(D))
  - h(y | x; bi(D)): Model h’s estimate of p(y | x, bi(D))
  - g(bi | x): A model that predicts the probability that x belongs to base subset bi(D)
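A minimal sketch of this scoring function (the toy base models and the uniform gate are hypothetical stand-ins for h(y | x; bi(D)) and g(bi | x)):

```python
def pbe_score(y, x, base_models, gate):
    """PBE-style score: sum over base subsets b_i of
    g(b_i | x) * h(y | x; b_i(D))."""
    return sum(gate(i, x) * h(y, x) for i, h in enumerate(base_models))

# Two hypothetical base-cell models returning class-probability estimates,
# and a gate that assigns x to each base subset with equal probability.
base_models = [
    lambda y, x: 0.9 if y == "Yes" else 0.1,
    lambda y, x: 0.3 if y == "Yes" else 0.7,
]
uniform_gate = lambda i, x: 0.5
score_yes = pbe_score("Yes", {}, base_models, uniform_gate)  # 0.5*0.9 + 0.5*0.3
score_no = pbe_score("No", {}, base_models, uniform_gate)    # 0.5*0.1 + 0.5*0.7
```

Because the outer sum ranges over disjoint base subsets, the score for any coarser cell is a sum of per-base-cell terms, which is what makes the PBE version distributive.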
Outline
- Motivating example
- Definition of prediction cubes
- Efficient prediction cube materialization
- Experimental results
- Conclusion
Experiments
- Quality of PBE on 8 UCI datasets
  - The quality of the PBE version of a model is slightly worse (0 ~ 6%) than the quality of the model trained directly on the whole training data
- Efficiency of the bottom-up score computation technique
- Case study on demographic data
Efficiency of Bottom-up Score Computation
- Machine-learning models:
  - J48: J48 decision tree
  - RF: Random forest
  - NB: Naïve Bayes
  - KDC: Kernel-density-based classifier
- Bottom-up method (PBE-J48, PBE-RF, NB, KDC) vs. exhaustive method (J48ex, RFex, NBex, KDCex)
Synthetic Dataset
- Dimensions: Z1, Z2 and Z3
- Decision rule:

  Condition               | Rule
  ------------------------+------------------------------------------
  When Z1 > 1             | Y = I(4X1 + 3X2 + 2X3 + X4 + 0.4X6 > 7)
  else when Z3 mod 2 = 0  | Y = I(2X1 + 2X2 + 3X3 + 3X4 + 0.4X6 > 7)
  else                    | Y = I(0.1X5 + X1 > 1)
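The decision rule transcribes directly into a small labeling function (a straightforward reading of the table; the argument layout is an assumption):

```python
def synthetic_label(z1, z3, x):
    """Y for the synthetic dataset; x = (X1, X2, X3, X4, X5, X6)."""
    x1, x2, x3, x4, x5, x6 = x
    if z1 > 1:
        return int(4*x1 + 3*x2 + 2*x3 + x4 + 0.4*x6 > 7)
    elif z3 % 2 == 0:
        return int(2*x1 + 2*x2 + 3*x3 + 3*x4 + 0.4*x6 > 7)
    else:
        return int(0.1*x5 + x1 > 1)
```

Note that Z2 never affects the label, so a prediction cube over (Z1, Z2, Z3) should show the predictors' importance varying along Z1 and Z3 but not Z2, which presumably is what makes the dataset a useful correctness check.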
Efficiency Comparison
[Figure: execution time (sec) vs. number of records, comparing the exhaustive method with bottom-up score computation]
Related Work: Building Models on OLAP Results
- Multi-dimensional regression [Chen, VLDB 02]
  - Goal: Detect changes of trends
  - Build linear regression models for cube cells
- Step-by-step regression in stream cubes [Liu, PAKDD 03]
- Loglinear-based quasi cubes [Barbara, J. IIS 01]
  - Use a loglinear model to approximately compress dense regions of a data cube
- NetCube [Margaritis, VLDB 01]
  - Build a Bayes net on the entire dataset to approximately answer count queries
Related Work (Contd.)
- Cubegrades [Imielinski, J. DMKD 02]
  - Extend cubes with ideas from association rules
  - How does the measure change when we roll up or drill down?
- Constrained gradients [Dong, VLDB 01]
  - Find pairs of similar cell characteristics associated with big changes in measure
- User-cognizant multidimensional analysis [Sarawagi, VLDBJ 01]
  - Help users find the most informative unvisited regions in a data cube using the max-entropy principle
- Multi-Structural DBs [Fagin et al., PODS 05, VLDB 05]
Take-Home Messages
- Promising exploratory data analysis paradigm:
  - Can use models to identify interesting subsets
  - Concentrate only on subsets in cube space
    - Those are meaningful subsets, and tractable
  - Precompute results and provide the users with an interactive tool
- A simple way to plug “something” into cube-style analysis:
  - Try to describe/approximate “something” by a distributive or algebraic function
Big Picture
- Why stop with decision behavior? Can apply to other kinds of analyses too
- Why stop at browsing? Can mine prediction cubes in their own right
- Exploratory analysis of mining space:
  - Dimension attributes can be parameters related to algorithm, data conditioning, etc.
  - Tractable evaluation is a challenge:
    - Large number of “dimensions”, real-valued dimension attributes, difficulties in compositional evaluation
    - Active learning for experiment design, extending compositional methods
Community Information Management (CIM)
AnHai Doan, University of Illinois at Urbana-Champaign
Raghu Ramakrishnan, University of Wisconsin-Madison
Structured Web-Queries
- Example queries:
  - How many alumni are top-10 faculty members?
    - Wisconsin does very well, by the way
  - Find trends in publications
    - By topic, by conference, by alumni of schools
  - Change tracking
    - Alert me if my co-authors publish new papers or move to new jobs
- Information is extracted from text sources on the web, then queried
Key Ideas
- Communities are ideally scoped chunks of the web for which to build enhanced portals
  - Relative uniformity in content, interests
  - Can exploit “people power” via mass collaboration, to augment extraction
- CIM platform: Facilitate collaborative creation and maintenance of community portals
  - Extraction management
    - Uncertainty, provenance, maintenance, compositional inference for refining extracted information
  - Mass collaboration for extraction and integration
- Watch for the new DBWorld!
Challenges: User Interaction
- Declarative specification of background knowledge and user feedback
- Intelligent prompting for user input
- Explanation of results
Challenges: Extraction and Query Plans
- Starting from user input (ER schema, hints) and background knowledge (e.g., standard types, look-up tables), compile a query into an execution plan
  - Must cover extraction, storage and indexing, and relational processing
    - And maintenance!
- Algebra to represent such plans? Query optimizer?
  - Handling uncertainty, constraints, conflicts, multiple related sources, ranking, modular architecture
Challenges: Managing Extracted Data
- Mapping between extracted metadata and source data
  - Uncertainty of mapping
- Conflicts (in user input, background knowledge, or from multiple sources)
- Evolution over time