Introduction to Graphical Presentation
Andy Wang
CIS 5930-03 Computer Systems Performance Analysis

The Art of Graphical Presentation
• Reference Works
• Types of Variables
• Guidelines for Good Graphics Charts
• Common Mistakes in Graphics
• Pictorial Games
• Special-Purpose Charts

Useful Reference Works
• Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut, 1983.
• Edward R. Tufte, Envisioning Information, Graphics Press, Cheshire, Connecticut, 1990.
• Edward R. Tufte, Visual Explanations, Graphics Press, Cheshire, Connecticut, 1997.
• Darrell Huff, How to Lie With Statistics, W. W. Norton & Co., New York, 1954.

Types of Variables
• Qualitative
  – Ordered (e.g., modem, Ethernet, satellite)
  – Unordered (e.g., CS, math, literature)
• Quantitative
  – Discrete (e.g., number of terminals)
  – Continuous (e.g., time)

Charting Based on Variable Types
• Qualitative variables usually work best with bar charts or Kiviat graphs
  – If ordered, use bar charts to show order
• Quantitative variables work well in X-Y graphs
  – Use points if discrete, lines if continuous
  – Bar charts sometimes work well for discrete
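The mapping from variable type to chart style can be written down as a small lookup. This is only a sketch of the slide's rules of thumb; the function name `suggest_chart` and the string labels are illustrative assumptions, not part of the original material.

```python
def suggest_chart(kind: str, subtype: str) -> str:
    """Return the chart style the slide's rules of thumb suggest.

    `kind` is "qualitative" or "quantitative"; `subtype` is
    "ordered"/"unordered" or "discrete"/"continuous".
    (Hypothetical helper; the labels are illustrative.)
    """
    suggestions = {
        ("qualitative", "ordered"): "bar chart, bars in order",
        ("qualitative", "unordered"): "bar chart or Kiviat graph",
        ("quantitative", "discrete"): "X-Y graph with points, or a bar chart",
        ("quantitative", "continuous"): "X-Y graph with lines",
    }
    try:
        return suggestions[(kind.lower(), subtype.lower())]
    except KeyError:
        raise ValueError(f"unknown variable type: {kind}/{subtype}") from None


print(suggest_chart("quantitative", "continuous"))
```

The table is a starting point, not a rule: the later slides give reasons to prefer a table or a sentence over any chart.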

Guidelines for Good Graphics Charts
• Principles of graphical excellence
• Principles of good graphics
• Specific hints for specific situations
• Aesthetics
• Friendliness

Principles of Graphical Excellence
• Graphical excellence is the well-designed presentation of interesting data:
  – Substance
  – Statistics
  – Design

Graphical Excellence (2)
• Complex ideas get communicated with:
  – Clarity
  – Precision
  – Efficiency

Graphical Excellence (3)
• Viewer gets:
  – Greatest number of ideas
  – In the shortest time
  – With the least ink
  – In the smallest space

Graphical Excellence (4)
• Is nearly always multivariate
• Requires telling truth about data

Principles of Good Graphics
• Above all else show the data
• Maximize the data-ink ratio
• Erase non-data ink
• Erase redundant data ink
• Revise and edit

Above All Else Show the Data [two figure slides]

Maximize the Data-Ink Ratio [two figure slides]
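Tufte's data-ink ratio is the share of a graphic's ink that presents data rather than decoration. A minimal sketch of the arithmetic follows; the ink amounts are made-up illustrative numbers in arbitrary units, and `data_ink_ratio` is a hypothetical helper name.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Fraction of the total ink that is used to present data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical ink budgets for the same chart before and after
# erasing borders, fills, and gridlines (the data ink is unchanged).
before = data_ink_ratio(data_ink=30, total_ink=120)
after = data_ink_ratio(data_ink=30, total_ink=40)
print(f"before: {before:.2f}, after: {after:.2f}")
```

Erasing non-data ink raises the ratio without touching the data, which is the point of the two principles that follow.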

Erase Non-Data Ink [two figure slides; series labels: North, West, East]

Erase Redundant Data Ink [two figure slides; series labels: North, West, East]

Revise and Edit [seven figure slides]

Specific Things to Do
• Give information the reader needs
• Limit complexity and confusion
• Have a point
• Show statistics graphically
• Don’t always use graphics
• Discuss it in the text

Give Information the Reader Needs
• Show informative axes
  – Use axes to indicate range
• Label things fully and intelligently
• Highlight important points on the graph

Giving Information the Reader Needs [two figure slides]

Limit Complexity and Confusion
• Not too many curves
• Single scale for all curves
• No “extra” curves
• No pointless decoration (“ducks”)

Limiting Complexity and Confusion [two figure slides]

Have a Point
• Graphs should add information not otherwise available to reader
• Don’t plot data just because you collected it
• Know what you’re trying to show, and make sure the graph shows it

Having a Point
• Sales were up 15% this quarter: [four figure slides]

Show Statistics Graphically
• Put bars in a reasonable order
  – Geographical
  – Best to worst
  – Even alphabetic
• Make bar widths reflect interval widths
  – Hard to do with most graphing software
• Show confidence intervals on the graph
  – Examples will be shown later

Don’t Always Use Graphics
• Tables are best for small sets of numbers
  – Tufte says 20 or fewer
• Also best for certain arrangements of data
  – E.g., 10 graphs of 3 points each
• Sometimes a simple sentence will do
• Always ask whether the chart is the best way to present the information
  – And whether it brings out your message

Text Would Have Been Better [figure]

Discuss It in the Text
• Figures should be self-explanatory
  – Many people scan papers, just look at graphs
  – Good graphs build interest, “hook” readers
• But text should highlight and aid figures
  – Tell readers when to look at figures
  – Point out what figure is telling them
  – Expand on what figure has to say

Aesthetics
• Not everyone is an artist
  – But figures should be visually pleasing
• Elegance is found in
  – Simplicity of design
  – Complexity of data

Principles of Aesthetics
• Use appropriate format and design
• Use words, numbers, drawings together
• Reflect balance, proportion, relevant scale
• Keep detail and complexity accessible
• Have story about the data (narrative quality)
• Do professional job of drawing
• Avoid decoration and chartjunk

Use Appropriate Format and Design
• Don’t automatically draw a graph
  – Mentioned before
• Choose graphical format carefully
• Sometimes “text graphic” works best
  – Use text placement to communicate numbers
  – Very close to being a table

Using Text as a Graphic
About a year ago, eight forecasters were asked for their predictions on some key economic indicators. Here’s how the forecasts stack up against the probable 1978 results (shown in the black panel).
[Text graphic; entries listed in extraction order, original placement lost]
CEA: +4.7, DR: +4.5, NABE: +4.5, WEF: 6.8, CB: 6.7, NABE: 6.7, WEF: +4.5, IBM: 6.6, CBO: +4.4, DR: 6.5, CB: +4.2, NABE: +6.2, IBM: +4.1, IBM: +5.9, GNP: +3.8, IPG: +5.8, CE: +2.9, CBO: 6.3, WEF: +21, CEA: 6.3, CPI: +7.7, Profits: +13.3, Unempl: 6.0, CB: +5.5, IBM: +6.6, DR: +10.5, DR: +5.2, NABE: +6.5, IBM: +10.4, WEF: +4.8, CB: +6.2, CE: +6.5
(New York Times, Jan. 2, 1979)

The Stem-and-Leaf Plot
• From Tukey, via Tufte, heights of volcanoes in feet:
  0|98766562
  1|97719630
  2|99987766544422211009850
  3|876655412099551426
  4|9998844331929433361107
  5|97666666554422210097731
  6|898665441077761065
  7|98855431100652108073
  8|653322122937
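A stem-and-leaf plot is easy to generate from raw numbers. The sketch below assumes each stem is thousands of feet and each leaf is the hundreds digit, which is how the volcano display appears to be laid out; the function name `stem_and_leaf`, its parameters, and the sample heights are illustrative, not the original data.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=100, leaves_per_stem=10):
    """Return stem-and-leaf rows as strings such as '2|99'.

    Each value is truncated to a multiple of `leaf_unit`; the last
    digit of the truncated value becomes the leaf and the rest the stem.
    Assumes non-negative values.
    """
    rows = defaultdict(list)
    for value in values:
        scaled = int(value) // leaf_unit          # drop digits below the leaf unit
        stem, leaf = divmod(scaled, leaves_per_stem)
        rows[stem].append(leaf)
    if not rows:
        return []
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in sorted(rows.get(stem, []), reverse=True))
        lines.append(f"{stem}|{leaves}")
    return lines


# Hypothetical heights in feet, not the volcano data from the slide.
for line in stem_and_leaf([900, 800, 1900, 1700, 2900, 2950, 4100]):
    print(line)
```

Every digit printed is a data point, so the display works as both a table and a histogram.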

Choosing a Graphical Format
• Many options, more being invented all the time
  – Examples will be given later
  – See Jain for some commonly useful ones
  – Tufte shows ways to get creative
• Choose a format that reflects your data
  – Or that helps you analyze it yourself

Use Words, Numbers, Drawings Together
• Put graphics near or in text that discusses them
  – Even if you have to murder your word processor
• Integrate text into graphics
• Tufte: “Data graphics are paragraphs about data and should be treated as such”

Reflect Balance, Proportion, Relevant Scale
• Much of this boils down to “artistic sense”
• Make sure things are big enough to read
  – Tiny type is OK only for young people!
• Keep lines thin
  – But use heavier lines to indicate important information
• Keep horizontal larger than vertical
  – About 50% larger works well

Poor Balance and Proportion
• Sales in the North and West districts were steady through all quarters
• East sales varied widely, significantly outperforming the other districts in the third quarter

Better Proportion
• Sales in North and West districts were steady through all quarters
• East sales varied widely, significantly outperforming other districts in third quarter

Keep Detail and Complexity Accessible
• Make your graphics friendly:
  – Avoid abbreviations and encodings
  – Run words left-to-right
  – Explain data with little messages
  – Label graphic, don’t use elaborate shadings and a complex legend
  – Avoid red/green distinctions
  – Use clean, serif fonts in mixed case

An Unfriendly Graph [figure]

A Friendly Version [figure]

Even Friendlier [figure]

Have a Story About the Data (Narrative Quality)
• May be difficult in technical papers
• But think about why you are drawing graph
• Example:
  – Performance is controlled by network speed
  – But it tops out at high end
  – And that’s because we hit a CPU bottleneck

Showing a Story About the Data [figure]

Do a Professional Job of Drawing
• This is easy with modern tools
  – But take the time to do it right
• Align things carefully
• Check final version in format you will use
  – I.e., print PostScript one last time before submission
  – Or look at your slides on projection screen
    • Preferably in presentation room
    • Color balance varies by projector

Avoid Decoration and Chartjunk
• PowerPoint, etc. make chartjunk easy
• Avoid clip art, automatic backgrounds, etc.
• Remember: data is the story
  – Statistics aren’t boring
  – Uninterested readers aren’t drawn by cartoons
  – Interested readers are distracted
• Does removing it change message?
  – If not, leave it out

Examples of Chartjunk [figure with callouts]
• In or out?
• Borders and Fills Galore
• Vibration
• Filled “Walls”
• Filled Labels
• Pointless Fake 3-D Effects
• Gridlines!
• Unintentional Heavy or Double Lines
• Serif Font with Thin & Thick Lines
• Filled “Floor”
• Clip Art

Common Mistakes in Graphics
• Excess information
• Multiple scales
• Using symbols in place of text
• Poor scales
• Using lines incorrectly

Excess Information
• Sneaky trick to meet length limits
• Rules of thumb (upper limits):
  – 6 curves on line chart
  – 10 bars on bar chart
  – 8 slices on pie chart
    • But note that Tufte hates pie charts
• Extract essence, don’t cram things in

Way Too Much Information [figure]

What’s Important About That Chart?
• Times for cp and rcp rise with number of replicas
• Most other benchmarks are near constant
• Exactly constant for rm

The Right Amount of Information [figure]

Multiple Scales
• Another way to meet length limits
• Basically, two graphs overlaid on each other
• Confuses reader (which line goes with which scale?)
• Misstates relationships
  – Implies equality of magnitude that doesn’t exist

Some Especially Bad Multiple Scales [figure]

Using Symbols in Place of Text
• Graphics should be self-explanatory
  – Remember that the graphs often draw the reader in
• So use explanatory text, not symbols
• This means no Greek letters!
  – Unless your conference is in Athens...

It’s All Greek To Me... [figure]

Explanation is Easy [figure]

Poor Scales
• Plotting programs love non-zero origins
  – But people are used to zero
• Fiddle with axis ranges (and logarithms) to get your message across
  – But don’t lie or cheat
• Sometimes trimming off high ends makes things clearer
  – Brings out low-end detail

A Poor Axis Range [figure]

A Logarithmic Range [figure]

A Truncated Range [figure]

Using Lines Incorrectly
• Don’t connect points unless interpolation is meaningful
• Don’t smooth lines that are based on samples
  – Exception: fitted non-linear curves

Incorrect Line Usage [figure]

Pictorial Games
• Non-zero origins and broken scales
• Double-whammy graphs
• Omitting confidence intervals
• Scaling by height, not area
• Poor histogram cell size

Non-Zero Origins and Broken Scales
• People expect (0,0) origins
  – Subconsciously
• So non-zero origins are great way to lie
• More common than not in popular press
• Also very common to cheat by omitting part of scale
  – “Really, Your Honor, I included (0,0)”

Non-Zero Origins [figure]

The Three-Quarters Rule
• Highest point should be 3/4 of scale or more
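One way to read this rule is as a check on the vertical axis range: the tallest data point should sit at least three-quarters of the way up the axis. The sketch below encodes that reading only; the function name and the sample values are assumptions for illustration.

```python
def satisfies_three_quarters(values, axis_min, axis_max):
    """True if the highest value reaches at least 3/4 of the axis range."""
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must be greater than axis_min")
    return (max(values) - axis_min) / span >= 0.75


# Hypothetical measurements plotted against two candidate axis ranges.
measurements = [10, 40, 80]
print(satisfies_three_quarters(measurements, 0, 100))  # axis fits the data
print(satisfies_three_quarters(measurements, 0, 200))  # axis wastes the top half
```

An axis that fails the check compresses the data into the bottom of the plot and hides differences that a tighter range would show.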

Double-Whammy Graphs
• Put two related measures on same graph
  – One is (almost) function of other
• Hits reader twice with same information
  – And thus overstates impact

Omitting Confidence Intervals
• Statistical data is inherently fuzzy
• But means appear precise
• Giving confidence intervals can make it clear there’s no real difference
  – So liars and fools leave them out
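To see how an interval can undercut an apparent difference between two means, here is a rough sketch. It assumes a normal approximation for the interval (for small samples a t quantile would be more appropriate), and the two sample sets are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / sqrt(n)
    center = mean(samples)
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical response times for two systems.
system_a = [10.2, 11.8, 9.5, 12.1, 10.9, 11.3]
system_b = [11.0, 12.4, 10.1, 12.9, 11.6, 10.7]
ci_a = confidence_interval(system_a)
ci_b = confidence_interval(system_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

The means differ, but the intervals overlap, so a bar chart of the means alone would suggest a difference the data may not support; a formal test is needed to settle it.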

Graph Without Confidence Intervals [figure]

Graph With Confidence Intervals [figure]

Scaling by Height Instead of Area
• Clip art is popular with illustrators: [figure: “Women in the Workforce,” 1960 vs. 1980]

The Trouble with Height Scaling
• Previous graph had heights of 2:1
• But people perceive areas, not heights
  – So areas should be what’s proportional to data
• Tufte defines lie factor: size of effect in graphic divided by size of effect in data
  – Not limited to area scaling
  – But especially insidious there (quadratic effect)
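The lie factor can be worked out numerically. This sketch measures each effect as a relative change, (second - first) / first, which is the convention Tufte uses; it also assumes the data in the previous graph doubled and that the picture kept its aspect ratio, so a 2:1 height means a 4:1 area. The function names are illustrative.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("data shows no effect; lie factor is undefined")
    return effect_size(graphic_first, graphic_second) / data_effect


# Data doubles (1 -> 2). Scaling the picture's height 2:1 makes its area 4:1.
by_height = lie_factor(graphic_first=1.0, graphic_second=4.0,
                       data_first=1.0, data_second=2.0)
# Scaling the area 2:1 instead keeps the graphic honest.
by_area = lie_factor(graphic_first=1.0, graphic_second=2.0,
                     data_first=1.0, data_second=2.0)
print(by_height, by_area)
```

Under these assumptions the height-scaled picture overstates the change by a factor of three, while the area-scaled one has a lie factor of 1.0.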

Scaling by Area
• Same graph with 2:1 area: [figure: “Women in the Workforce,” 1960 vs. 1980]
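To draw a picture whose area, rather than height, tracks the data, each linear dimension is scaled by the square root of the data ratio. A minimal sketch, with a hypothetical helper name:

```python
import math


def linear_scale_for_area(data_ratio):
    """Factor by which to scale width and height so area grows by data_ratio."""
    if data_ratio <= 0:
        raise ValueError("data_ratio must be positive")
    return math.sqrt(data_ratio)


scale = linear_scale_for_area(2.0)
print(f"scale each side by {scale:.3f} to double the area")
```

For a 2:1 ratio the second picture should be only about 41% taller and wider than the first, not twice as tall.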

Poor Histogram Cell Size
• Picking bucket size is always problem
• Prefer 5 or more observations per bucket
• Choice of bucket size can affect results: [figure]
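The five-observation guideline is easy to check mechanically. The sketch below bins the same data with two bucket widths and reports which buckets fall short; the data set, function names, and parameters are invented for illustration.

```python
def histogram_counts(values, low, width, buckets):
    """Count values in `buckets` equal-width cells starting at `low`.

    Values outside the covered range are ignored.
    """
    counts = [0] * buckets
    for value in values:
        index = int((value - low) // width)
        if 0 <= index < buckets:
            counts[index] += 1
    return counts


def sparse_buckets(counts, minimum=5):
    """Indices of buckets holding fewer than `minimum` observations."""
    return [i for i, count in enumerate(counts) if count < minimum]


# Hypothetical observations.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9, 9, 10, 11, 11]
narrow = histogram_counts(data, low=0, width=2, buckets=6)
wide = histogram_counts(data, low=0, width=4, buckets=3)
print(narrow, sparse_buckets(narrow))
print(wide, sparse_buckets(wide))
```

With the narrow cells most buckets are too thin to trust and the histogram looks bimodal; with the wide cells every bucket passes and the shape looks nearly flat, which is the sensitivity the slide warns about.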

Principles of Graphics Integrity (Tufte)
• Proportional representation of numbers
• Clear, detailed, thorough labeling
• Show data variation, not design variation
• Use deflated money units
• Don’t have more dimensions than data has
• Don’t quote data out of context

Proportional Representation of Numbers
• Maintain lie factor of 1.0
• Use areas, not heights, with clip art
• Avoiding “decorative” graphs will do wonders
  – Not too hard for most engineers!

Clear, Detailed, Thorough Labeling
• Goal is to defeat distortion and ambiguity
• Write explanations on graphic itself
• Label important events in the data

Show Data Variation, Not Design Variation
• Use one design for entire graphic
• In papers, try to use one design for all graphs
• Again, artistic license is big culprit

Use Deflated Money Units
• Often necessary to show money over time
  – Even in computer science
  – E.g., price/performance over time
  – Or expected future cost of a disk
• Nominal dollars are meaningless
• Derate by some standard inflation measure
  – That’s what the WWW is for!
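Derating by an inflation measure is a one-line calculation once a price index is in hand. The index values and prices below are invented placeholders, not real statistics; in practice they would come from a published index.

```python
def deflate(nominal, index_then, index_base):
    """Convert a nominal amount to base-year money using a price index."""
    if index_then <= 0:
        raise ValueError("index_then must be positive")
    return nominal * index_base / index_then


# Hypothetical price index and nominal prices (NOT real data).
price_index = {2000: 100.0, 2010: 125.0, 2020: 150.0}
nominal_price = {2000: 200.0, 2010: 200.0, 2020: 200.0}

base_year = 2020
real_price = {
    year: deflate(price, price_index[year], price_index[base_year])
    for year, price in nominal_price.items()
}
print(real_price)
```

A price that looks flat in nominal terms turns out to be falling once every year is expressed in the same base-year money.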

Don’t Have More Dimensions Than Data Has
• This gets back to the Lie Factor
• 1-D data (e.g., money) should occupy one dimension on the graph [good and bad example figures for $1.00 vs. $2.00]
• Clip art is prohibited by this rule
  – But if you have to, use an area measure

Don’t Quote Data Out of Context
• Tufte’s example: [figure]

The Same Data in Context [figure]

Special-Purpose Charts
• Tukey’s box plot
• Histograms
• Scatter plots
• Gantt charts
• Kiviat graphs

Tukey’s Box Plot
• Shows range, median, quartiles all in one [figure labels: minimum, quartile, median, quartile, maximum]
• Tufte can’t resist improvements [three alternative figures]
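The five numbers a box plot displays can be computed with the standard library. This sketch uses `statistics.quantiles` with the "inclusive" method; other quartile conventions give slightly different values, and the sample data is made up.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical measurements.
summary = five_number_summary([9, 1, 8, 2, 7, 3, 6, 4, 5])
print(summary)
```

The box spans the two quartiles, a mark inside it shows the median, and the whiskers run out to the minimum and maximum.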

Histograms
• Tufte improves everything about them: [figure]

Scatter Plots
• Useful in statistical analysis
• Also excellent for huge quantities of data
  – Can show patterns otherwise invisible

Better Scatter Plots
• Again, Tufte improves the standard
  – But it can be a pain with automated tools
• Can use modified Tukey box plot for axes

Gantt Charts
• Shows relative duration of Boolean conditions
• Arranged to make lines continuous
  – Each level after first follows FTTF pattern

Kiviat Graphs
• Also called “star charts” or “radar plots”
• Useful for looking at balance between higher-is-better (HB) and lower-is-better (LB) metrics

A Few Examples
• A bad graph
• Two good graphs

A Very Bad Graph [figure]

A Good Graph: Sunspots [figure]

A Superb Graph: DEC Traces [figure]
