Empirical Evaluation Chris North cs 5984 Information Visualization
Evaluating Visualizations • Expert Review • Examination by visualization expert • Heuristic Evaluation • Principles, Guidelines • Algorithmic • Usability Evaluation • Observation, problem identification • Empirical Experiment ** • Controlled scientific experiment, “user study” • Comparisons, statistical analysis
What is Science? • Measurement • Modeling
Scientific Method 1. Form hypothesis 2. Collect data 3. Analyze 4. Accept/reject hypothesis
Deep Questions • Is ‘computer science’ science? • How can you “prove” a hypothesis with science?
Empirical Experiment • Typical question: • Which visualization is better in which situations? • Example: Lifelines vs. Perspective Wall
More Rigorous Question • Does Vis Tool (Lifelines or Persp. Wall) have an effect on user performance time for task X? • Null hypothesis: • No effect • Lifelines = Persp. Wall • Want to disprove, provide counter-example, show an effect
Variables • Independent Variables (what you vary) and treatments (the variable values): • Visualization tool » Lifelines, Perspective Wall, Text UI • Task type » Find, count, pattern, compare • Data size (# of items) » 100, 1000000 • Dependent Variables (what you measure): • User performance time • Errors • Subjective satisfaction (survey) • HCI metrics!
Example: 2 x 3 design • n users per cell • Each cell holds measured user performance times (dep var)
                              Ind Var 2: Task Type
Ind Var 1: Vis. Tool      Task 1      Task 2      Task 3
Lifelines                 n users     n users     n users
Persp. Wall               n users     n users     n users
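The cells of a factorial design like this one are every combination of the treatments of each independent variable. A minimal sketch of enumerating them, assuming the two variables and treatment names from the slide; the function name `design_cells` is made up for illustration:

```python
from itertools import product

# Independent variables and their treatments, as named on the slide.
vis_tools = ["Lifelines", "Persp. Wall"]
task_types = ["Task 1", "Task 2", "Task 3"]

def design_cells(*factors):
    """Return one tuple per cell: every combination of treatments."""
    return list(product(*factors))

cells = design_cells(vis_tools, task_types)
for tool, task in cells:
    print(f"{tool:12s} x {task}")
print(len(cells), "cells")  # 2 treatments x 3 treatments = 6 cells
```

Adding a third variable (for example data size) is just one more argument to `design_cells`, which is also why the number of cells, and the number of users needed, grows quickly.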
Groups • “Between subjects” variable » 1 group of users for each variable treatment » Group 1: 20 users, Lifelines » Group 2: 20 users, Persp. Wall » Total: 40 users, 20 per cell • “Within subjects” (repeated) variable » All users perform all treatments » Counterbalance the order to control for order effects » Group 1: 20 users, Lifelines then Persp. Wall » Group 2: 20 users, Persp. Wall then Lifelines » Total: 40 users, 40 per cell
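Counterbalancing for a within-subjects variable can be sketched as a random, even split of users across the possible treatment orders. This is only an illustration: the helper `assign_counterbalanced`, the user IDs 1 to 40, and the fixed seed are assumptions, not part of the original study.

```python
import random

def assign_counterbalanced(user_ids, orders, seed=None):
    """Randomly split users as evenly as possible across treatment orders."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    # Deal shuffled users round-robin, so group sizes differ by at most one.
    return {uid: orders[i % len(orders)] for i, uid in enumerate(shuffled)}

orders = [("Lifelines", "Persp. Wall"), ("Persp. Wall", "Lifelines")]
assignment = assign_counterbalanced(range(1, 41), orders, seed=0)

for order in orders:
    group_size = sum(1 for o in assignment.values() if o == order)
    print(" then ".join(order), "->", group_size, "users")
```

Shuffling before dealing keeps the assignment random while still guaranteeing 20 users per order.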
Issues • Randomized • Fairness • Identical procedures • Bias • User privacy, data security
Procedure • For each user: • Sign legal forms • Pre-Survey: demographics • Instructions » Do not reveal true purpose of experiment • Training runs • Actual runs • Post-Survey: subjective measures • Repeat for all n users
Data • Measured dependent variables • Spreadsheet: • Lifelines task 1, 2, 3, Persp. Wall task 1, 2, 3
Averages • Measured user performance times (dep var)
                              Ind Var 2: Task Type
Ind Var 1: Vis. Tool      Task 1      Task 2      Task 3
Lifelines                 37.2        54.5        103.7
Persp. Wall               29.8        53.2        145.4
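Going from the raw spreadsheet to a table of cell averages is a group-and-average step. A minimal sketch, assuming one row per measured run; the rows below are made-up numbers for illustration only, not the study's data, and `cell_means` is a hypothetical helper name:

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration: (user, tool, task, seconds).
rows = [
    (1, "Lifelines", "Task 1", 40.0),
    (2, "Lifelines", "Task 1", 30.0),
    (3, "Persp. Wall", "Task 1", 25.0),
    (4, "Persp. Wall", "Task 1", 35.0),
]

def cell_means(rows):
    """Average the measured times within each (tool, task) cell."""
    cells = defaultdict(list)
    for _user, tool, task, secs in rows:
        cells[(tool, task)].append(secs)
    return {cell: mean(times) for cell, times in cells.items()}

averages = cell_means(rows)
for (tool, task), avg in averages.items():
    print(f"{tool:12s} {task}: {avg:.1f} secs")
```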
Persp. Wall better than Lifelines? [Chart: perf time (secs), Lifelines vs. Persp. Wall] • Problem with averages: lossy • Compares only 2 numbers • What about the 40 data values? (Show me the data!)
The real picture [Chart: perf time (secs), Lifelines vs. Persp. Wall] • Need stats that take all data into account
Statistics • t-test » Compares 1 dep var on 2 treatments of 1 ind var • ANOVA: Analysis of Variance » Compares 1 dep var on n treatments of m ind vars • Result: “significant difference” between treatments? • p = probability of seeing a difference at least this large if the null hypothesis were true • Typical cut-off (significance level): p < 0.05
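To make the t-test concrete, here is a minimal sketch of Student's two-sample t statistic for a between-subjects comparison (two independent groups, equal variances assumed). The times are made-up numbers for illustration, the helper `pooled_t` is hypothetical, and the critical value is the standard two-tailed t-table entry for 8 degrees of freedom at the 0.05 level:

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic (equal variances) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Made-up performance times (secs), 5 users per group, for illustration.
lifelines = [38.0, 41.0, 35.0, 39.0, 37.0]
persp_wall = [30.0, 28.0, 33.0, 29.0, 30.0]

t, df = pooled_t(lifelines, persp_wall)
CRITICAL_T = 2.306  # two-tailed, alpha = 0.05, df = 8 (standard t table)
print(f"t = {t:.3f}, df = {df}")
print("significant at p < 0.05" if abs(t) > CRITICAL_T else "no difference detected")
```

Unlike comparing two averages, the statistic uses every data value through the variances. In practice a library routine such as SciPy's `scipy.stats.ttest_ind` reports the p-value directly.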
p < 0.05 • Woohoo! • Found a “statistically significant difference” • Averages determine which is ‘better’ • Conclusion: » Vis Tool has an “effect” on user performance for task 1 » Persp. Wall better user performance than Lifelines for task 1 » “95% confident that Persp. Wall better than Lifelines” » Not “Persp. Wall beats Lifelines 95% of time” • Found a counter-example to the null-hypothesis » Null-hypothesis: Lifelines = Persp. Wall » Hence: Lifelines ≠ Persp. Wall
p > 0.05 • Hence, same? » Vis Tool has no effect on user performance for task 1? » Lifelines = Persp. Wall? • NOT! » We did not detect a difference, but could still be different » Did not find a counter-example to null hypothesis » Provides evidence for Lifelines = Persp. Wall, but not proof » Boring! Basically found nothing • How? » Not enough users » Need better tasks, data, …
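Why "not enough users" can hide a real difference: for two equal-size groups with the same spread, the t statistic grows with the square root of the group size. A minimal sketch with hypothetical numbers (a 5-second difference in means and a 10-second standard deviation, neither taken from the study); `t_for_equal_groups` is a made-up helper name:

```python
from math import sqrt

def t_for_equal_groups(mean_diff, sd, n_per_group):
    """t statistic for two equal-size groups with the same standard deviation."""
    standard_error = sd * sqrt(2 / n_per_group)
    return mean_diff / standard_error

# Hypothetical: same 5-second difference and 10-second spread, more users.
t_small = t_for_equal_groups(5.0, 10.0, 10)
t_large = t_for_equal_groups(5.0, 10.0, 40)
print(f"n = 10 per group: t = {t_small:.3f}")  # 1.118
print(f"n = 40 per group: t = {t_large:.3f}")  # 2.236
```

With the two-tailed 0.05 cut-off sitting at roughly t = 2 for groups of this size, the same underlying difference goes undetected with 10 users per group and is detected with 40.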
Data Mountain • Robertson, “Data Mountain” • Quoc, Reenal (Microsoft)
Assignment • Thurs: Visualization Development • Bederson, “Jazz” » Jun, Rohit • Literature Review due Thurs • Homework #2 due Thurs, Oct 4