Visualizing Multi-dimensional Clusters, Trends, and Outliers using Star Coordinates

Visualizing Multi-dimensional Clusters, Trends, and Outliers using Star Coordinates
Author: Eser Kandogan
Reporter: Tze Ho-Lin
2007/5/9
SIGKDD, 2001

Outline
  • Motivation
  • Objectives
  • Methodology: Star Coordinates
  • Interaction techniques
  • Evaluation
  • Conclusion
  • Personal Comments

Motivation
  • Real datasets typically contain more than three attributes, so representing and making sense of multi-dimensional data has been challenging.

Objectives
  • The objective of this paper is to relieve the curse of dimensionality on knowledge discovery through simple data representations derived from familiar, easy-to-understand lower-dimensional representations.

Methodology
  • j: data point
  • i: attribute
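The formula on this slide did not survive extraction, so the following is only a minimal sketch of the Star Coordinates mapping as it is commonly described: each attribute i gets an axis vector from a shared origin, each data point j has its attribute values min-max normalized, and the point is placed at the sum of the axis vectors weighted by those normalized values. The function names `star_axes` and `project`, the evenly spaced initial axes, and the unit axis length are assumptions made for illustration, not taken from the slides.

```python
import math

def star_axes(n, radius=1.0):
    """Return n evenly spaced 2D axis vectors, one per attribute (assumed layout)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def project(data, axes, origin=(0.0, 0.0)):
    """Map each row (data point j) to 2D by summing axis vectors weighted
    by the min-max normalized value of each attribute i."""
    n = len(axes)
    mins = [min(row[i] for row in data) for i in range(n)]
    maxs = [max(row[i] for row in data) for i in range(n)]
    points = []
    for row in data:
        x, y = origin
        for i, (ax, ay) in enumerate(axes):
            span = maxs[i] - mins[i]
            # A constant attribute carries no information; treat it as 0.
            t = (row[i] - mins[i]) / span if span else 0.0
            x += ax * t
            y += ay * t
        points.append((x, y))
    return points

# Hypothetical toy data: four records with four attributes each.
data = [
    [0, 0, 0, 0],
    [10, 0, 0, 0],
    [10, 10, 10, 10],
    [0, 10, 0, 0],
]
points = project(data, star_axes(4))
```

With four evenly spaced axes the axis vectors cancel, so the all-minimum record and the all-maximum record both land on the origin. The mapping is therefore ambiguous on its own, which is why the interaction techniques matter for telling such points apart.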

Interaction techniques
1. Scaling
2. Rotation
3. Marking
4. Range Selection
5. Histogram
6. Footprints
7. Sticks
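As a rough illustration of the first two techniques: scaling can be viewed as changing the length of an attribute's axis vector, which changes how much that attribute contributes to a point's position, and rotation as changing the axis direction. The sketch below assumes axes are stored as plain 2D tuples; the helper names `scale_axis` and `rotate_axis` are hypothetical and not from the paper.

```python
import math

def scale_axis(axis, factor):
    """Lengthen or shorten an axis vector; a longer axis gives its
    attribute more weight in the projected position."""
    ax, ay = axis
    return (ax * factor, ay * factor)

def rotate_axis(axis, degrees):
    """Rotate an axis vector counterclockwise about the origin."""
    ax, ay = axis
    t = math.radians(degrees)
    return (ax * math.cos(t) - ay * math.sin(t),
            ax * math.sin(t) + ay * math.cos(t))

# Hypothetical usage: double the first attribute's weight, then turn
# the axis by a quarter circle.
axis = (1.0, 0.0)
scaled = scale_axis(axis, 2.0)
rotated = rotate_axis(axis, 90)
```

After either transform, the data points would be re-projected with the updated axis vectors, so the user sees the layout change in response.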

Evaluation

Conclusion
  • Star Coordinates aims to provide a representation of higher-dimensional space that builds on well-known simple representations, together with dynamic interactions that allow users to discover trends, outliers, and clusters easily.

Personal Comments
  • Application: Data visualization
  • Advantage: Simple and easy to understand
  • Disadvantage: The figures in this paper are rough.

Scaling

Rotation

Range Selection

Histogram

Footprints

Sticks

Evaluation - Figure 12

Evaluation - Figure 13

Evaluation - Figure 14. Data point distribution after removing state, area code, phone number, and total minutes and calls for day, evening, night, and international calls.

Evaluation - Figure 15. Data points partitioned into four clusters based on international service plan and voice mail plan membership.

Evaluation - Figure 16. Total day charge and number of customer service calls play the most significant role in churn for customers without international and voice mail service plans.