Organizing and Displaying Data


Data Files
Data is almost always stored in a format where:
• ROWS are cases or individuals
• COLUMNS are variables
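To make the row/column convention concrete, here is a minimal Python sketch. It assumes a plain list of dictionaries as the data table (an illustrative choice, not the JMP format used in these slides) and borrows three cases and three variables from the heart attack data shown in the following slides. The helper name `column` is made up for the example.

```python
# Three cases from the heart attack data table (ID, EJEC and AGE only).
# Each dict is one ROW (a case); each key is one COLUMN (a variable).
rows = [
    {"ID": 390, "EJEC": 72, "AGE": 49},
    {"ID": 279, "EJEC": 52, "AGE": 54},
    {"ID": 391, "EJEC": 62, "AGE": 56},
]

def column(table, name):
    """Return the values of one variable (one column) across all cases."""
    return [row[name] for row in table]

first_case = rows[0]         # one row: every variable for subject 390
ages = column(rows, "AGE")   # one column: AGE for every subject
print(first_case)
print(ages)
```

Reading across a row answers "what do we know about this patient?", while reading down a column answers "how does this variable vary across patients?".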

Complete Data Table on Male Heart Attack Patients
(values listed by variable, as extracted)

ID: 390 279 391 202 69 310 392 311 393 70 203 394 204 280 55 79 205 206 312 80 281 207 282 396 208 209 283 210 397 211 398 284 399 285 71 286 212 400 287 81 813 68 288 407
EJEC: 72 52 62 50 50 27 60 72 60 63 29 48 59 50 53 17 37 45 43 60 38 57 44 51 63 49 48 58 42 68 43 67 52 63 54 37 51 42 66 54 39 63 30 59 67
SYSVOL: 36 74 52 165 47 124 86 37 65 52 117 69 54 67 65 184 88 106 85 59 103 53 68 53 58 81 58 71 92 50 146 43 70 73 62 93 65 95 49 66 144 52 219 39 39
DIAVOL: 131 155 137 329 95 170 215 132 163 140 164 133 135 138 221 140 193 150 149 168 124 121 109 157 112 167 159 156 259 130 146 195 133 148 133 163 144 145 237 141 314 94 117
OCCLU: 0 37 33 33 0 77 7 40 0 0 50 0 30 37 0 57 37 33 0 7 47 0 27 0 0 13 0 27 0 0 47 0 0 27 33 47 43 40 10 7 13 0 33 0 0
STEN: 0 63 47 30 100 23 50 10 40 10 0 27 13 63 33 13 47 43 50 37 43 57 60 77 73 13 0 0 0 100 33 70 23 0 7 10 50 40 87 47 45 0 73
TIME: 143 16 143 143 40 9 142 142 142 141 140 5 118 140 23 139 100 140 139 139 72 138 139 138 3 138 137 136 109 65 136 43 76 135 53
OUTCOME: 0 0 2 0 0 5 0 0 0 0 1 5 0 1 0 0 0 0 3 1 0 1
AGE: 49 54 56 42 46 57 51 56 45 46 48 54 39 49 58 50 58 47 51 43 55 58 55 41 51 49 56 45 57 51 56 49 47 36 38 59 54 57 52 47 39 48 53 47 57
SMOKE: 2 2 2 2 2 1 2 2 2 2 1 1 2 2 2 2 1 1 2
BETA: 2 2 2 1 2 2 1 2 1 2 2 2 1 2 2 2
CHOL: 59 68 52 39 74 NA 58 75 72 90 72 NA NA 86 49 70 NA 38 61 56 62 93 63 45 60 60 57 46 58 NA 70 NA NA 61 NA NA 55 62 56 NA NA 63 62
SURG: 0 1 0 0 1 2 0 0 0 0 2 0 1 0 1 4 0 0 0 0 3 0 0 0 4 0 0 3 0 0 0 2

Portion of the Data Table on Male Heart Attack Patients (the first 10 subjects)

 ID  EJEC  SYSVOL  DIAVOL  OCCLU  STEN  TIME  OUTCOME  AGE  SMOKE  BETA  CHOL  SURG
390    72      36     131      0     0   143        0   49      2     2    59     0
279    52      74     155     37    63   143        0   54      2     2    68     1
391    62      52     137     33    47    16        2   56      2     2    52     0
201    50     165     329     33    30   143        0   42      2     2    39     0
202    50      47      95      0   100   143        0   46      2     2    74     1
 69    27     124     170     77    23   143        0   57      2     2    NA     2
310    60      86     215      7    50    40        0   51      2     2    58     0
392    72      37     132     40    10     9        5   56      2     2    75     0
311    60      65     163      0    40   142        0   45      2     2    72     0
393    63      52     140      0    10   142        0   46      2     2    90     0

Row 1: variable values for subject #390.
Column 3: systolic volume (SYSVOL) for the first 10 subjects.

Variables
ID: A patient identifier instead of a name.
EJEC: Ejection fraction, % of blood ejected from the left ventricle in one beat.
SYSVOL: End-systolic volume, a measure of the size of the heart.
DIAVOL: End-diastolic volume.
OCCLU: Occlusion score (% of myocardium of the left ventricle supplied by arteries that are totally blocked).
STEN: Stenosis score (% supplied by arteries that are significantly narrowed but not completely blocked).
TIME: Time in months from when patient was admitted until [definition truncated in source].
OUTCOME: Coded variable
  0 = alive at last follow up
  1 = sudden cardiac death
  2 = death within 30 days of heart attack
  3 = death from heart failure
  4 = death during / after coronary surgery
  5 = non-cardiac death
SMOKE: Coded variable
  1 = patient continued to smoke
  2 = patient did not continue smoking
BETA: Coded variable
  1 = patient took beta blockers
  2 = patient did not take beta blockers
AGE: Patient’s age at admission (years)
CHOL: Blood cholesterol (mmoles/litre)
SURG: Coded variable
  0 = no surgery
  1 = surgery as part of a trial
  2 = surgery for symptoms within 1 year
  3 = surgery for symptoms within 1 to 5 years
  4 = surgery for symptoms after 5 years
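Coded variables store a number that stands for a category, so the number needs a lookup table before it can be read. The sketch below shows one way to do this in Python. The dictionaries copy the codes listed above, while the `decode` helper and the use of `None` for a missing value are assumptions made for illustration.

```python
# Code-to-label lookups copied from the variable definitions.
OUTCOME_LABELS = {
    0: "alive at last follow up",
    1: "sudden cardiac death",
    2: "death within 30 days of heart attack",
    3: "death from heart failure",
    4: "death during / after coronary surgery",
    5: "non-cardiac death",
}
SMOKE_LABELS = {1: "continued to smoke", 2: "did not continue smoking"}

def decode(value, labels):
    """Translate a stored code into its label; None stands for a missing value."""
    if value is None:
        return "NA"
    return labels[value]

# Codes recorded for subject 391 in the portion of the data table shown earlier.
patient = {"ID": 391, "OUTCOME": 2, "SMOKE": 2}
decoded_outcome = decode(patient["OUTCOME"], OUTCOME_LABELS)
decoded_smoke = decode(patient["SMOKE"], SMOKE_LABELS)
print(patient["ID"], "|", decoded_outcome, "|", decoded_smoke)
```

Although the codes are digits, arithmetic on them is meaningless: an OUTCOME of 4 is not "twice" an OUTCOME of 2.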

What is this section trying to do?
Teach us about the use of data tools.
Tools to help us:
• explore: search for important features / messages
• communicate: report the important features / messages
Two types of tools:
• visual summaries: plots, graphs, charts, etc.
• numerical summaries: center, spread, percentages, frequencies, etc.

Types of Variables
A quantitative or numeric variable measures or counts something, e.g. height of a student, number of sisters.
A qualitative or categorical or nominal variable defines group membership, e.g. gender, ethnicity.

Quantitative/Numeric Variables
Continuous variables (measurements) have no gaps between possible values, e.g. weight, temperature.
Discrete variables (counts) have gaps between possible values, e.g. number of brothers.
In practice, variables with few repeated values are treated as continuous, and variables with many repeated values are treated as discrete.

Qualitative (Categorical) Variables
A categorical or nominal variable is one whose categories have no order, e.g. ethnicity, gender.
An ordinal variable is one where the categories can be ordered, e.g. income group (low, middle, high); age group (young, old); Likert scale (1 = strongly disagree, …, 5 = strongly agree).

Types of Variables
Quantitative (measurements and counts)
  • Continuous (few repeated values)
  • Discrete (many repeated values)
Qualitative (define groups)
  • Categorical/Nominal (no idea of order)
  • Ordinal (fall in natural order)


Heart Attack Data in JMP

Variables
Classify each variable defined earlier according to its type: ID, EJEC, SYSVOL, DIAVOL, OCCLU, STEN, TIME, OUTCOME, SMOKE, BETA, AGE, CHOL, SURG.

Data Types in JMP

Reporting Findings in Tables
1. Don’t try to do too much in the table. Model tables on published research.
2. Use white space effectively.
3. Make sure tables and text refer to each other; however, you do not need to repeat everything in the table as text. If you interpret one or two key findings in a table, the reader should be able to handle the rest.
4. Use some aspect of the data to order and group the rows/columns in the table, e.g. size or chronology, or to show similarity or invite comparisons.

Reporting Findings in Tables
Example: Exercise 3, Grove
Comparisons between the Intervention group and the Control group are the focus here. The P column contains p-values from an appropriate test comparing the two groups on the given variables.

Reporting Findings in Tables
5. If appropriate, frame the table with summary statistics in rows and columns to provide a standard of comparison.
6. It is useful to round numbers in a table to one or two decimal places.

Example 1
The three tables below show six-monthly circulation figures for six weekly magazines in New Zealand.

Table 1: Circulation of Weekly Magazines

                        New Idea  Listener  Woman’s Day  Woman’s Weekly    Time  TV Guide
Jan 1 to Jun 30, 1999     67,070    90,521      165,914         126,640  38,136   241,356
Jul 1 to Dec 31, 1998     63,444    90,018      162,182         126,486  38,236   248,786
Jan 1 to Jun 30, 1998     59,039    92,786      175,002         129,920  38,635   258,806

We want to compare circulation figures between magazines. It is easier to make circulation comparisons when the circulation data are in columns.

Example 1
Table 2: Circulation of Weekly Magazines

                 Jan 1 to Jun 30, 1999  Jul 1 to Dec 31, 1998  Jan 1 to Jun 30, 1998
New Idea                        67,070                 63,444                 59,039
Listener                        90,521                 90,018                 92,786
Woman’s Day                    165,914                162,182                175,002
Woman’s Weekly                 126,640                126,486                129,920
Time                            38,136                 38,236                 38,635
TV Guide                       241,356                248,786                258,806

Numbers need to be rounded. Magazines need to be ordered by circulation.

Example 1
Table 3: Circulation of Weekly Magazines (in thousands)

                 Jan 1 to Jun 30, 1998  Jul 1 to Dec 31, 1998  Jan 1 to Jun 30, 1999  Average
TV Guide                           259                    249                    241      250
Woman’s Day                        175                    162                    166      168
Woman’s Weekly                     130                    126                    127      128
Listener                            93                     90                     91       91
New Idea                            59                     63                     67       63
Time                                39                     38                     38       38
Average                            126                    122                    122      123

Row averages allow comparisons between the most recent circulation data and the average for the magazine.
Column averages allow comparisons between the circulation data and the average for the 6 magazines for the time period.
Verbal Summary: During 1998 and the first half of 1999, the TV Guide had the highest circulation among weekly magazines in New Zealand.
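The move from Table 2 to Table 3 combines three steps: round to thousands, order the magazines by size, and frame the table with averages. The Python sketch below walks through those steps using the raw figures from Table 2. The helper name `to_thousands` and the choice to average the unrounded figures before rounding are assumptions, since the slides do not say how the averages were computed.

```python
# Raw six-monthly circulation figures from Table 2, oldest period first.
periods = ["Jan-Jun 1998", "Jul-Dec 1998", "Jan-Jun 1999"]
circulation = {
    "New Idea": [59039, 63444, 67070],
    "Listener": [92786, 90018, 90521],
    "Woman's Day": [175002, 162182, 165914],
    "Woman's Weekly": [129920, 126486, 126640],
    "Time": [38635, 38236, 38136],
    "TV Guide": [258806, 248786, 241356],
}

def to_thousands(x):
    """Express a raw count in thousands, rounded to a whole number."""
    return round(x / 1000)

# Order magazines by total circulation over the three periods, largest first.
ordered = sorted(circulation, key=lambda m: sum(circulation[m]), reverse=True)

# Row averages: one per magazine, averaged before rounding (an assumption).
row_avg = {m: to_thousands(sum(v) / len(v)) for m, v in circulation.items()}

header = f"{'Magazine':<16}" + "".join(f"{p:>14}" for p in periods)
print(header + f"{'Average':>10}")
for m in ordered:
    cells = "".join(f"{to_thousands(x):>14}" for x in circulation[m])
    print(f"{m:<16}{cells}{row_avg[m]:>10}")
```

Sorting on a summary of the data, rather than alphabetically, is what lets the reader see the ranking at a glance.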

Univariate Analyses
Variable type dictates how we display and summarize the distribution.
For nominal or ordinal data, the notion of distribution is typically the percentage of observations falling into each of the categories or ordered levels.
For numeric data, distribution refers to the shape of the distribution, central tendency or “average”, and variability or spread.


Displays for Numeric Variables
• Stem-and-Leaf Plots (simple, but outdated)
• Histograms & Smooth Density Estimates
• Quantile and Outlier Boxplots

Traffic Death-Rates (per 100,000 population) for 30 Countries

Australia        17.4    Austria          20.1    Belgium          19.9
Bulgaria         12.5    Canada           15.8    Czechoslovakia   10.1
Denmark          13.0    Finland          11.6    France           20.0
E. Germany       12.0    W. Germany       13.1    Greece           21.1
Hong Kong         5.4    Hungary          17.1    Ireland          15.3
Israel           10.3    Japan            10.4    Kuwait           26.8
Netherlands      11.3    New Zealand      20.1    Norway           10.5
Poland           14.6    Portugal         25.6    Singapore        12.6
Sweden            9.8    Switzerland      15.7    N. Ireland       12.1
Scotland         12.0    England & Wales  10.1    United States    18.6

Data for 1983, 1984 or 1985 depending on the country (prior to reunification of Germany).

Traffic Death-Rates (per 100,000 population) for 30 Countries
Units: 17 | 4 = 17.4 deaths per 100,000

 5 | 4
 6 |
 7 |
 8 |
 9 | 8
10 | 1 1 3 4 5
11 | 3 6
12 | 0 0 1 5 6
13 | 0 1
14 | 6
15 | 3 7 8
16 |
17 | 1 4
18 | 6
19 | 9
20 | 0 1 1
21 | 1
22 |
23 |
24 |
25 | 6
26 | 8

Collapse to 12 stems:
[Second stem-and-leaf plot: the same data on 12 stems. Units: 1 | 7 = 17 deaths per 100,000]
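A stem-and-leaf plot like the one above can be built by splitting each value into a stem (the whole-number part) and a leaf (the first decimal digit). The following Python sketch does this for the 30 traffic death rates. The function name `stem_and_leaf` is chosen for the example, and the layout is only an approximation of what a statistics package would print.

```python
from collections import defaultdict

# Traffic death rates (per 100,000) for the 30 countries.
rates = [17.4, 10.1, 13.1, 10.3, 10.5, 15.7, 20.1, 13.0, 21.1, 10.4,
         14.6, 18.6, 19.9, 11.6, 5.4, 26.8, 25.6, 12.1, 12.5, 20.0,
         17.1, 11.3, 12.6, 12.0, 15.8, 12.0, 15.3, 20.1, 9.8, 10.1]

def stem_and_leaf(values):
    """Map each stem (whole part) to its sorted leaves (first decimal digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        # Work in tenths to avoid floating-point trouble: 17.4 -> 174 -> (17, 4)
        stem, leaf = divmod(round(v * 10), 10)
        plot[stem].append(leaf)
    return plot

plot = stem_and_leaf(rates)

# Print every stem in the range, including empty ones, so gaps stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Keeping the empty stems is a deliberate choice: dropping them would hide the gap between the bulk of the countries and the two highest rates.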

Traffic Death-Rates (per 100, 000) for 30 Countries Stem-and-Leaf plot from JMP

Histograms
Divide the range of the data into equal-width class intervals and use the number or percentage of observations in each class interval to determine the height of a bar centered over each interval.

Traffic Death-Rates (per 100,000)

Class interval      %
 5 – 10           6.7
10 – 15          50.0
15 – 20          23.3
20 – 25          13.3
25 – 30           6.7
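The class-interval percentages above can be reproduced with a few lines of Python. This sketch assumes that a value falling exactly on a boundary belongs to the upper interval (so 20.0 is counted in 20 – 25), which is consistent with the percentages shown. The function name and the text bars are illustrative only.

```python
# Traffic death rates (per 100,000) for the 30 countries.
rates = [17.4, 10.1, 13.1, 10.3, 10.5, 15.7, 20.1, 13.0, 21.1, 10.4,
         14.6, 18.6, 19.9, 11.6, 5.4, 26.8, 25.6, 12.1, 12.5, 20.0,
         17.1, 11.3, 12.6, 12.0, 15.8, 12.0, 15.3, 20.1, 9.8, 10.1]

def class_percentages(values, start, width, n_classes):
    """Percentage of observations in each interval [lo, lo + width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)   # index of the interval containing v
        counts[k] += 1
    return [round(100 * c / len(values), 1) for c in counts]

percentages = class_percentages(rates, start=5, width=5, n_classes=5)

# A rough text histogram: one '#' per 2 percentage points.
for k, pct in enumerate(percentages):
    lo = 5 + 5 * k
    print(f"{lo:>2} - {lo + 5:<2} {pct:5.1f}%  " + "#" * round(pct / 2))
```

Changing `start` or `width` changes the picture, which is why the choice of class intervals deserves some thought.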

Histograms
An appropriate histogram should have 5–15 intervals.
Histograms are used when the sample size is moderate to large. Use n ≥ 50 as a guide.
Strengths of histograms:
• Show the shape of the distribution.
• Show gaps, outliers, clusters, groupings.

Histograms – Example 2
Birth weights (g) of infants born to smoking and nonsmoking mothers

Weights for Infants Born to Smokers (sorted):
709 1135 1790 1818 1885 1928 1936 2084 2125 2126 2187 2211 2225 2296 2353 2367 2381 2410 2414 2424
2466 2495 2557 2594 2600 2663 2665 2769 2782 2821 2906 2920 2948 2977 2992 3005 3033 3042 3076 3090
3132 3147 3203 3260 3303 3317 3321 3331 3374 3430 3444 3629 3637 3643 3651 3756 3884 3940 4238

Weights for Infants Born to Nonsmokers (sorted):
1021 1330 1474 1588 1701 1729 1893 1899 1928 1970 2055 2082 2100 2187 2240 2282 2301 2325 2353 2381
2395 2438 2442 2450 2495 2523 2551 2622 2637 2722 2733 2750 2778 2807 2835 2836 2863 2877 2920 2977
3062 3080 3090 3104 3175 3203 3225 3232 3234 3274 3317 3374 3402 3416 3459 3460 3473 3475 3487 3544
3600 3614 3629 3651 3699 3728 3770 3799 3827 3860 3884 3912 3941 3969 3983 3997 4054 4111 4153 4167
4174 4593 4990

Histograms – Example 2
We would like to compare the birth weights of infants born to mothers who smoked during pregnancy with those of infants born to mothers who did not. What distributional differences, if any, do you see?

Histograms – Example 3
Auckland sunshine hours, January to April, 2000
[Histogram: Frequency (0 to 15) against Daily sunshine (hours) (0 to 15)]
No outliers or gaps. Two broad groupings: one group of days with little or no sun and another group of days with between 4 and 13 hours of sun.

Distributional Properties - Shape
[Figure panels:]
(a) Unimodal
(b) Bimodal
(c) Trimodal
(d) Symmetric
(e) Positively or right skewed (long upper tail)
(f) Negatively or left skewed (long lower tail)
(g) Symmetric
(h) Bimodal with gap
(i) Exponential shape

Distributional Properties - Outliers
• Outliers: mistakes or something interesting/unusual.
[Figure panel: (k) Outliers]

Distributional Properties - Modality
• Existence of more than one peak: modality (unimodal, bimodal, etc.).
[Figure panels: (b) Bimodal, (c) Trimodal]

Distributional Properties - Skewness
• Shape of the distribution: symmetry, skewness.
[Figure panels: (d) Symmetric (Normal distribution), (e) Positively or right skewed (long upper tail), (f) Negatively or left skewed (long lower tail)]

Distributional Properties – Central Tendency and Variability/Spread
• Central values and spread: What is the central value? How spread out are the values about the center?
The typical birth weight of infants born to nonsmokers is approx. 3000 g. A majority of infants have birth weights within 500 g of what is typical.

Interpreting Stem-and-Leaf Plots and Histograms
• Be suspicious of abrupt changes.
[Figure panel: (j) Spike in pattern]

Histograms – Example 4
Number of Cigarettes Smoked Per Day by WSU smokers
How would you characterize this distribution?

Interpreting Stem-and-Leaf Plots and Histograms
• Be suspicious of abrupt changes.
[Figure panel: (l) Truncation plus outlier]

Features to look for in histograms and stem-and-leaf plots
• Outliers
• Existence of more than one peak
• Shape of the distribution
• Central values and spread
• Be suspicious of abrupt changes
[Figure panels: (a) Unimodal, (b) Bimodal, (c) Trimodal, (d) Symmetric (Normal distribution), (e) Positively or right skewed (long upper tail), (f) Negatively or left skewed (long lower tail), (g) Symmetric, (h) Bimodal with gap, (i) Exponential shape, (j) Spike in pattern, (k) Outliers, (l) Truncation plus outlier]

Quantile and Outlier Boxplot

Quantile and Outlier Boxplots
Birth weights of babies born to smoking mothers
[Boxplot annotations: sample mean marker, Outlier, Q1, Med (median), Q3]
The width of the box represents the IQR, the interquartile range, which is the range of the middle 50% of the data.
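The quantities marked on the boxplot can be computed directly from the smokers' birth weights. The Python sketch below uses `statistics.quantiles` from the standard library, whose default "exclusive" method may interpolate quartiles slightly differently from JMP. The 1.5 x IQR rule for flagging outliers is a common convention assumed here; the slides do not state which rule the plot uses.

```python
import statistics

# Birth weights (g) of infants born to smokers, from Histograms - Example 2.
smokers = [
    709, 1135, 1790, 1818, 1885, 1928, 1936, 2084, 2125, 2126, 2187, 2211,
    2225, 2296, 2353, 2367, 2381, 2410, 2414, 2424, 2466, 2495, 2557, 2594,
    2600, 2663, 2665, 2769, 2782, 2821, 2906, 2920, 2948, 2977, 2992, 3005,
    3033, 3042, 3076, 3090, 3132, 3147, 3203, 3260, 3303, 3317, 3321, 3331,
    3374, 3430, 3444, 3629, 3637, 3643, 3651, 3756, 3884, 3940, 4238,
]

# Quartiles: the cut points that split the sorted data into four equal parts.
q1, median, q3 = statistics.quantiles(smokers, n=4)
iqr = q3 - q1   # range of the middle 50% of the data

# Assumed convention: points more than 1.5 * IQR beyond a quartile are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in smokers if w < lower_fence or w > upper_fence]

mean = statistics.mean(smokers)
print(f"Q1={q1:.0f}  Med={median:.0f}  Q3={q3:.0f}  IQR={iqr:.0f}  mean={mean:.0f}")
print("outliers:", outliers)
```

Comparing the mean with the median gives a quick check on skewness: a mean pulled below the median suggests a longer lower tail.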

Quantile and Outlier Boxplots are useful for comparing a numeric response variable across populations.

Quantile and Outlier Boxplots
• Individual box plots can show outliers and skewness.
[Figure: right skewed data and the boxplot it gives]

Quantile and Outlier Boxplots
• A wide box plot with short whiskers could be coming from a bimodal distribution or a very short-tailed distribution.
[Figure: distributions that give a wide box with short whiskers]

Quantile and Outlier Boxplots
All three populations in this study have right skewed mean NFCS scores, with extreme outliers. The Baseline group seems to have the lowest scores.


Repeated and Grouped Data
Repeated Data (Discrete Variables)
Display Tools: Frequency table, bar graph

Frequency Table
Grove, Exercise 6: Results from Katsma and Souza’s (2000) study are presented in tables on pg. 36. They contain both the nurses’ opinion regarding a patient’s self-reported pain assessment on a 10-pt. ordinal scale and what they actually reported in the patient’s chart. There were two classifications of patients: smiling and grimacing.

Frequency Table for Nurses’ Opinion of Patient’s Self-Reported Pain Score (smiling patients)

Pain Assessment Scale (xi)   Frequency (fi)   Percentage (fi/n) x 100   Cumulative %
            0                      7                   8.1                   8.1
            1                      7                   8.1                  16.2
            2                      5                   5.8                  22.0
            3                      8                   9.4                  31.4
            4                     10                  11.6                  43.0
            5                     11                  12.8                  55.8
            6                      5                   5.8                  61.6
            7                      2                   2.3                  63.9
            8                     31                  36.1                 100.0
            9                      0                   0.0                 100.0
           10                      0                   0.0                 100.0
                              n = 86                 100.0

Frequency Table
Has the columns:
• value xi: each distinct value in the sample
• frequency fi: how often each value occurs
• percentage (fi/n) x 100: percentage of the sample with that value
• cumulative percentage: percentage of the sample with value xi or less
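The four columns can be computed from the frequencies alone. The Python sketch below does so for the smiling-patient pain scores. The function name `frequency_table` is chosen for the example, and the cumulative percentage is computed from running counts, so an entry may differ in the last digit from a table built by adding up already-rounded percentages.

```python
# Frequencies for the smiling-patient pain scores (scale value -> count).
freq = {0: 7, 1: 7, 2: 5, 3: 8, 4: 10, 5: 11, 6: 5, 7: 2, 8: 31, 9: 0, 10: 0}

def frequency_table(freq):
    """Return rows of (value, frequency, percentage, cumulative percentage)."""
    n = sum(freq.values())
    rows, running = [], 0
    for value in sorted(freq):
        running += freq[value]          # observations with this value or less
        rows.append((value,
                     freq[value],
                     round(100 * freq[value] / n, 1),
                     round(100 * running / n, 1)))
    return rows

table = frequency_table(freq)
print(f"{'x':>2} {'f':>3} {'%':>6} {'cum %':>6}")
for value, f, pct, cum in table:
    print(f"{value:>2} {f:>3} {pct:>6.1f} {cum:>6.1f}")
```

The last cumulative percentage should always be 100, which makes a handy check that no observations were lost.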

Frequency Table for Nurses’ Opinion of Patient’s Self-Reported Pain Score (smiling patients)
Reading the table:
• 31 of the 86 nurses had the same opinion as the patient: 31 / 86 x 100% = 36.1% of the nurses agreed with the patient’s score.
• 100.0% of the nurses felt the pain score was at or below the patient’s score.

Bar Graph
Similar to a histogram (for continuous data), except that the bars / rectangles are not necessarily joined up.

Data Entered into JMP (with frequencies)

Frequency Tables & Bar Graphs: Computing
Frequency tables are produced from raw data in JMP under Analyze > Distribution. Be sure to tell JMP that the frequencies have been entered and should be interpreted as such.

Bar Graph and Frequency Table in JMP


Qualitative/Categorical/Nominal Variables
Display Tools: Frequency table, bar graph
Frequency Table: Used in exactly the same way as for discrete variables.

Qualitative Variables: Bar Graph
Order categories by size (i.e. by frequency, unless there is some very compelling reason for some other ordering).
[Bar graph for the variable SURG: Percentage (0% to 60%) against SURG code (0 to 4)]
SURG:
0: No surgery
1: Surgery as part of trial
2: Surgery for symptoms within 1 year
3: Surgery for symptoms within 1 to 5 years
4: Surgery for symptoms after 5 years

Categorical/Nominal Variables: Frequency Table for the variable SURG

                                  SURG   Frequency   Percentage   Cumulative percentage
No surgery performed                0         409         66.4                    66.4
Surg. as part of trial              1          89         14.4                    80.8
Surg. for sympt. within 1 year      2          72         11.7                    92.5
Surg. for sympt. 1 to 5 years       3          29          4.7                    97.2
Surg. for sympt. > 5 years          4          17          2.8                   100.0
Total                                         616        100.0
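For a categorical variable, the bar graph is simply the frequency table drawn with one bar per category, ordered by size. The Python sketch below renders a rough text version from the SURG frequencies. The scale of one `#` per 2 percentage points is arbitrary, and the output is only a stand-in for the graph JMP would draw.

```python
# SURG frequencies from the table (code -> (label, count)).
surg = {
    0: ("No surgery", 409),
    1: ("Surgery as part of trial", 89),
    2: ("Surgery for symptoms within 1 year", 72),
    3: ("Surgery for symptoms within 1 to 5 years", 29),
    4: ("Surgery for symptoms after 5 years", 17),
}
n = sum(count for _, count in surg.values())

# Order categories by frequency, largest first.
ordered = sorted(surg.values(), key=lambda item: item[1], reverse=True)
percentages = [round(100 * count / n, 1) for _, count in ordered]

for (label, _), pct in zip(ordered, percentages):
    bar = "#" * round(pct / 2)          # one '#' per 2 percentage points
    print(f"{label:<42}{pct:5.1f}% {bar}")
```

Here the ordering by frequency happens to match the code order 0 to 4; with other variables the sort step would rearrange the bars.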

Heart Attack Data in JMP

Bar Graph for Surgery Variable in JMP: Computing
Frequency tables are produced from raw data in JMP under Analyze > Distribution. Notice that there is no frequency column in this data table; that is because the data were entered so that each row represents one subject in the study.
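The two ways of entering data (one row per subject versus one row per value with a frequency) carry the same information. As a rough illustration outside JMP, the Python sketch below converts between the two layouts with `collections.Counter`, using the SURG codes of the first 10 subjects from the data table.

```python
from collections import Counter

# SURG codes for the first 10 subjects: one entry per row (subject).
surg_raw = [0, 1, 0, 0, 1, 2, 0, 0, 0, 0]

# Raw layout -> frequency layout: count how often each code occurs.
counts = Counter(surg_raw)

# Frequency layout -> raw layout: repeat each code by its frequency.
expanded = sorted(counts.elements())

print(sorted(counts.items()))
print(expanded)
```

Whichever layout is used, the software has to be told which one it is looking at; otherwise a frequency column would be mistaken for another variable.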

The Big Mac Index
In 1986 The Economist started to compare prices of Big Macs between countries (converted to US dollars). This provides a measure of whether a currency is undervalued or overvalued compared to the United States dollar.

The Big Mac Index
[Bar graph: Price of Big Macs ($US). Vertical axis: Price ($US), 0 to 4. Horizontal axis: Country (USA, Hong Kong, New Zealand, Singapore, Taiwan, France, Japan, Israel)]

More General Use of Bar Graphs
• Excellent for relating labels to relative importance or relative size.
• Can be used to display a quantitative variable other than frequency (e.g. time, amount of money).
• Where possible, order items by size.
[Bar graph: Price of Big Macs ($US). Vertical axis: Price ($US), 0 to 4. Horizontal axis: Country (Hong Kong, New Zealand, Singapore, Taiwan, France, Japan, Israel)]

Other Forms of Graphs
• Pie chart (for displaying the “measurement” of each object as a proportion of the total)
• Segmented bar graph (same purpose as the pie chart)

Percentages of the World's Gold Production

Country      1983   1985   1987   1989   1991
S. Africa    48.6   43.8   36.2   30.8   28.7
U.S.          4.4    5.0    9.3   13.4   13.9
USSR         19.1   17.7   16.7   14.4   11.5
Australia     2.2    3.8    6.7   10.3   11.2
Canada        5.3    5.7    7.0    8.3    8.3
China         4.1    4.0    4.3    4.0    5.7
Rest         16.3   20.2   19.7   19.0   20.8

Other Forms of Graphs
• Pie chart (for displaying the “measurement” of each object as a proportion of the total)
• Segmented bar graph (same purpose as the pie chart)
[Figure: the gold production percentages drawn as (a) Bar graph, (b) Pie chart, (c) Segmented bar. Pie chart labels: S. Africa 29%, Rest 21%, U.S. 14%, USSR 11%, Austr. 11%, Can. 8%, China 6%]

Choosing between Types of Graphs
• Bar graphs are better at presenting relative sizes.
• Pie charts do not communicate information as well.
• Perspective pie charts are disastrous!
• Avoid using perspective bar graphs.
[Figure: the same percentages for groups A to F drawn as a bar graph, a pie chart, and a perspective pie chart]

Some Principles of Graphical Excellence
• A well-designed presentation of interesting data. A matter of substance, of statistics, and of design.
• Communicates complex ideas with clarity, precision and efficiency.
• Gives the viewer the greatest number of ideas in the shortest possible time.
• Tells the truth about the data.
Source: E. R. Tufte, The Visual Display of Quantitative Information

Graphical Displays for Data on a Single Variable
• Quantitative/numeric, continuous: histogram, box plot, stem-and-leaf plot
• Discrete or ordinal: frequency table, bar graph
• Qualitative, categorical or nominal: frequency table, bar graph, pie chart, or mosaic plot