An Intelligent Visual Big Data Analytics Framework for Supporting Interactive Exploration and Visualization of Big OLAP Cubes
Carlos Ordonez, University of Houston, USA; Zhibo Chen, University of Houston, USA; Alfredo Cuzzocrea, University of Calabria and LORIA, Italy/France; Javier Garcia-Garcia, C 3 UNAM, Mexico

Outline: Big cube exploration problem; Example; Visualization solution; Means comparison parametric test; Conclusions and future work; System demonstration

Big Cube Exploration Problem: In a big cube, a large data set F is analyzed with multiple aggregations, grouped by different subsets of dimensions, to discover interesting results. These dimension combinations form a multidimensional cube whose mathematical structure is a lattice: the set of all subsets of the dimension set, ordered by inclusion. Cubes return descriptive statistics (aggregations) such as counts, sums, averages and standard deviations.
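
The lattice of aggregations described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's implementation: the function name `cube`, the dictionary-based row layout, the toy data, and the use of the population standard deviation are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def cube(rows, dimensions, measure):
    """Aggregate `measure` for every subset of `dimensions` (one cuboid per subset).

    Returns {cuboid: {cell: {"count", "sum", "avg", "std"}}}, where a cuboid is a
    tuple of dimension names and a cell is a tuple of dimension values.
    """
    result = {}
    for k in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, k):
            # Group the measure values by the cell they fall into.
            groups = defaultdict(list)
            for row in rows:
                cell = tuple(row[d] for d in cuboid)
                groups[cell].append(row[measure])
            result[cuboid] = {
                cell: {
                    "count": len(values),
                    "sum": sum(values),
                    "avg": mean(values),
                    "std": pstdev(values),  # assumption: population std
                }
                for cell, values in groups.items()
            }
    return result


# Hypothetical toy data set F with three binary dimensions and one measure.
F = [
    {"D1": 0, "D2": 0, "D3": 1, "A1": 10.0},
    {"D1": 0, "D2": 1, "D3": 1, "A1": 12.0},
    {"D1": 1, "D2": 0, "D3": 0, "A1": 20.0},
    {"D1": 1, "D2": 1, "D3": 0, "A1": 22.0},
]
lattice = cube(F, ["D1", "D2", "D3"], "A1")
```

With d = 3 dimensions the lattice has 2^3 = 8 cuboids, from the empty cuboid (a single grand-total cell) up to the full 3-dimensional one. Recomputing every cuboid from the raw rows, as done here, is the simplest possible strategy and would not scale to a big cube.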

Cube example (dimensions + measure): d = 3 dimensions $D_1$, $D_2$, $D_3$ and one measure $A_1$. Each face of the cube represents a 2-dimensional cuboid. Here there are two sets of cell pairs, each within one cuboid, whose cells differ in exactly one dimension. A difference in fill pattern indicates a significant difference on the measure attribute $A_1$.

Cube Visualization Proposal: visualizing the dimension lattice in 2D; manipulating binary dimensions with a 2-color checkerboard; highlighting interesting cube cell pairs discovered from the cube; visualizing associated image data for each group; enhancing cube group comparisons with parametric statistical tests to get high statistical reliability.

Visualizing a Cube with Binary Dimensions

Data mining objectives: Discover significant differences between two groups in a cuboid on at least one measure. A difference counts as significant only when it is supported by a small p-value. When a significant difference exists, we isolate the groups that differ in exactly one dimension, which points to a cause-effect explanation. The algorithm aims to discover significant differences in highly similar cube cells because that helps point out which specific dimension "triggers" a significant change on the cuboid measure.
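
The step of isolating groups that differ in one dimension can be illustrated as follows. This is a minimal sketch, assuming each cell of a cuboid is represented as a tuple of dimension values; the function name `pairs_differing_in_one` and the sample cells are hypothetical, and the quadratic pairwise scan is chosen for clarity rather than speed.

```python
from itertools import combinations


def pairs_differing_in_one(cells):
    """Return (cell_a, cell_b, index) for every pair of cells in one cuboid
    whose value tuples differ in exactly one position (`index`)."""
    pairs = []
    for a, b in combinations(sorted(cells), 2):
        differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(differing) == 1:
            pairs.append((a, b, differing[0]))
    return pairs


# Hypothetical cuboid over two binary dimensions.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
pairs = pairs_differing_in_one(cells)
```

Each returned pair is a candidate for the statistical comparison: because the two cells agree on every other dimension, a significant difference in the measure can be attributed to the single dimension at `index`.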

Means comparison parametric test: Two similar independent populations 1 and 2. Means: $\mu_1$ and $\mu_2$. Sizes: $N_1$ and $N_2$. Variances: $\sigma_1^2$ and $\sigma_2^2$. Null hypothesis $H_0$: $\mu_1 = \mu_2$. Goal: finding pairs of populations in which $H_0$ can be rejected with high confidence. We use a two-tailed test, which allows finding a significant difference on both tails of the Gaussian distribution. We reject $H_0$ with high confidence $1 - p$, where $p$ = 0.01, 0.05 or 0.1. In general, for large $N$ we compute $z$ and look it up in $N(0, 1)$. The exception is small $N$, where Student's t distribution is used instead.
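
For the large-N case, the test can be sketched in Python as below. The slide does not spell out the statistic, so this assumes the standard two-sample form z = (mean1 - mean2) / sqrt(var1/N1 + var2/N2); the function names `z_test` and `is_significant` and the default threshold are hypothetical. The small-N Student's t case is not covered.

```python
import math


def z_test(mean1, var1, n1, mean2, var2, n2):
    """Two-tailed z test for H0: mean1 == mean2 (large-sample assumption).

    Returns (z, p), where p is the two-tailed p-value under N(0, 1).
    """
    z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    # Two-tailed p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def is_significant(mean1, var1, n1, mean2, var2, n2, threshold=0.01):
    """Reject H0 when the p-value falls below `threshold` (0.01, 0.05 or 0.1)."""
    _, p = z_test(mean1, var1, n1, mean2, var2, n2)
    return p < threshold
```

Only the mean, variance and size of each group are needed, which are exactly the aggregations a cube already returns, so a pair of cells can be tested without revisiting the underlying rows.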

Advantages of statistical tests in Big Cubes: Two cube groups of any size can be compared, including groups with very different numbers of elements (e.g. a large and a small group). The means comparison test takes into account the variance (and hence the standard deviation) of each group, which reflects how much the two populations overlap. The user can narrow down on highly similar cube groups that differ in only a few dimensions.

Significant Pairs of Cube Cells

Conclusions: 2D cube visualization. Binary dimensions are shown in a 2-color checkerboard. Fast, interactive exploration of the lattice with a two-tier design that allows the user to quickly switch between cuboids. Pairs of similar groups are compared with a statistical test to discover specific cell pairs that show a significant difference in some cube measure value. Application in medicine to improve heart disease diagnosis.

Future work: A "flat" 2D cube representation is more intuitive and faster to manipulate than a 3D display, but we would like to compare its strengths and weaknesses with a 3D visualization. We need to study the mathematical relationships between the dimension lattice and cube visual representations in 2D. Cube visualization requires a time complexity study for cubes with a large number of dimensions.

System demonstration