Describing data with graphics and numbers

Types of Data
• Categorical variables: also known as class variables or nominal variables
• Quantitative variables: also known as numerical variables; either continuous or discrete

Graphing categorical variables

Ten most common causes of death in Americans between 15 and 19 years old in 1999.

Bar graphs

Graphing numerical variables

Heights of BIOL 300 students (cm) 165 170 142 173 168 155 160 165 163 152 154 165 180 173 190 165 175 165 170 156 155 163 168 177 166

Stem-and-leaf plot
19 | 0
18 | 0
17 | 0003357
16 | 0335555556888
15 | 24556
14 | 2
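A stem-and-leaf plot can be built by splitting each value into a stem (the tens) and a leaf (the units). The sketch below is a minimal illustration, not the course's own method; the function name `stem_and_leaf` is hypothetical. It uses the 25 heights listed on the data slide, whereas the plot above shows 28 leaves, so the 16 and 17 rows it prints are shorter than those in the plot.

```python
from collections import defaultdict

# Heights (cm) exactly as listed on the data slide (25 values).
heights = [165, 170, 142, 173, 168, 155, 160, 165, 163, 152, 154, 165, 180,
           173, 190, 165, 175, 165, 170, 156, 155, 163, 168, 177, 166]


def stem_and_leaf(values):
    """Map each stem (tens) to a string of its sorted leaves (units)."""
    stems = defaultdict(list)
    for v in values:
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in sorted(leaves))
            for stem, leaves in stems.items()}


plot = stem_and_leaf(heights)
for stem in sorted(plot, reverse=True):  # largest stem on top, as on the slide
    print(f"{stem} | {plot[stem]}")
```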

Frequency table
Height Group    Frequency
141-150         1
151-160         6
161-170         15
171-180         5
181-190         1
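The frequencies can be recovered by decoding the stem-and-leaf plot back into heights and counting how many fall in each height group. This is a minimal sketch; it assumes each group is a closed interval (so 150 belongs to 141-150), and the variable names are illustrative.

```python
# Stem-and-leaf rows as shown in the plot (stem -> leaves).
stem_leaf = {19: "0", 18: "0", 17: "0003357",
             16: "0335555556888", 15: "24556", 14: "2"}

# Decode every leaf back into a height in cm.
heights = [stem * 10 + int(leaf)
           for stem, leaves in stem_leaf.items()
           for leaf in leaves]

# Assumed closed intervals: both endpoints belong to the group.
groups = [(141, 150), (151, 160), (161, 170), (171, 180), (181, 190)]
frequency = {f"{lo}-{hi}": sum(lo <= h <= hi for h in heights)
             for lo, hi in groups}

for group, count in frequency.items():
    print(group, count)
```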

Histogram

Histogram Frequency distribution

Histogram with more data



50th percentile (median); 90th percentile

Associations between two categorical variables

Association between reproductive effort and avian malaria

Mosaic plot

Grouped Bar Graph

Associations between categorical and numerical variables

Multiple histograms

Associations between two numerical variables

Scatterplots

Evaluating Graphics
• Lie factor
• Chartjunk
• Efficiency

Don’t mislead with graphics

Better representation of truth

Lie Factor
• Lie factor = (size of effect shown in graphic) / (size of effect in data)

Lie Factor Example
Effect in graphic: 2.33 / 0.08 = 29.1
Effect in data: 6748 / 5844 = 1.15
Lie factor = 29.1 / 1.15 = 25.3
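The arithmetic in this example can be checked with a few lines of Python. This is only a sketch; the function name `lie_factor` is hypothetical. Without rounding the two intermediate ratios, the result is about 25.2; the slide's 25.3 comes from dividing the rounded values 29.1 and 1.15.

```python
def lie_factor(effect_in_graphic, effect_in_data):
    """Ratio of the effect size shown in the graphic to the effect size in the data."""
    return effect_in_graphic / effect_in_data


effect_graphic = 2.33 / 0.08   # ratio of the sizes drawn in the graphic
effect_data = 6748 / 5844      # ratio of the underlying data values

lf = lie_factor(effect_graphic, effect_data)
print(round(effect_graphic, 1))  # 29.1
print(round(effect_data, 2))     # 1.15
print(round(lf, 1))              # 25.2
```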

Chartjunk


Needless 3D Graphics


Summary: Graphical methods for frequency distributions

Summary: Associations between variables

Great book on graphics

Describing data

Two common descriptions of data
• Location (or central tendency)
• Width (or spread)

Measures of location
• Mean
• Median
• Mode

Mean
$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$, where n is the size of the sample

Example: $Y_1 = 56$, $Y_2 = 72$, $Y_3 = 18$, $Y_4 = 42$
$\bar{Y}$ = (56 + 72 + 18 + 42) / 4 = 47
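The same calculation as a short Python sketch (the variable names are illustrative): sum the observations and divide by the sample size.

```python
# The four observations from the slide.
sample = [56, 72, 18, 42]

n = len(sample)          # sample size
mean = sum(sample) / n   # sum of the observations divided by n
print(mean)  # 47.0
```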

Median
• The median is the middle measurement in a set of ordered data.

The data: 18 28 24 25 36 14 34
can be put in order: 14 18 24 25 28 34 36
Median is 25.
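A minimal Python sketch of the same steps: sort the data, then take the middle value. Picking a single middle element works here because the number of observations is odd; the standard library's `statistics.median` is used as a cross-check.

```python
import statistics

data = [18, 28, 24, 25, 36, 14, 34]

ordered = sorted(data)
middle = ordered[len(ordered) // 2]  # single middle element, valid for odd n

print(ordered)  # [14, 18, 24, 25, 28, 34, 36]
print(middle)   # 25
print(statistics.median(data) == middle)  # True
```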


Mean vs. median in politics
• 2004 U.S. economy
• Republicans: times are good
  – Mean income increasing ~4% per year
• Democrats: times are bad
  – Median family income fell
• Why?



Measures of width
• Range
• Standard deviation
• Variance
• Coefficient of variation

Range
14 17 18 20 22 22 24 25 26 28 28 28 30 34 36
The range is 36 - 14 = 22
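As a quick check in Python, the range is the largest value minus the smallest (a sketch; the variable names are illustrative):

```python
values = [14, 17, 18, 20, 22, 22, 24, 25, 26, 28, 28, 28, 30, 34, 36]

data_range = max(values) - min(values)
print(data_range)  # 22
```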


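Population variance
$\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N}$, where N is the population size and $\mu$ is the population mean

Sample variance
$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$, where n is the sample size

Shortcut for calculating sample variance
$s^2 = \frac{n}{n - 1} \left( \frac{\sum_{i=1}^{n} Y_i^2}{n} - \bar{Y}^2 \right)$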

Standard deviation (SD)
• Positive square root of the variance
• $\sigma$ is the true (population) standard deviation
• s is the sample standard deviation

In-class exercise
Calculate the variance and standard deviation of a sample with the following data: 6, 1, 2

Answer
Variance = 7
Standard deviation = $\sqrt{7} \approx 2.65$
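The exercise can be worked through in Python as a sketch. The sample variance divides the sum of squared deviations by n - 1. The second calculation uses one common algebraically equivalent shortcut form, which is an assumption about the form intended on the shortcut slide.

```python
import math

sample = [6, 1, 2]
n = len(sample)
mean = sum(sample) / n  # 3.0

# Definition: sum of squared deviations from the mean, divided by n - 1.
variance = sum((y - mean) ** 2 for y in sample) / (n - 1)

# Assumed shortcut form: n/(n-1) * (mean of squares - square of mean).
shortcut = n / (n - 1) * (sum(y ** 2 for y in sample) / n - mean ** 2)

sd = math.sqrt(variance)

print(variance)                       # 7.0
print(math.isclose(shortcut, variance))  # True
print(round(sd, 2))                   # 2.65
```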

Coefficient of variation (CV)
CV = 100 s / $\bar{Y}$
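As an illustration, the CV of the in-class exercise sample (6, 1, 2) can be computed with the standard library, where `statistics.stdev` returns the sample standard deviation s. This is a sketch under that choice of data, not an example taken from the slides.

```python
import statistics

sample = [6, 1, 2]

s = statistics.stdev(sample)       # sample standard deviation (n - 1 denominator)
mean = statistics.mean(sample)     # sample mean
cv = 100 * s / mean                # standard deviation as a percentage of the mean

print(round(cv, 2))  # 88.19
```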

Equal means, different variances
Manipulating means
• The mean of the sum of two variables: E[X + Y] = E[X] + E[Y]
• The mean of the sum of a variable and a constant: E[X + c] = E[X] + c
• The mean of a product of a variable and a constant: E[cX] = c E[X]
• The mean of a product of two variables: E[XY] = E[X] E[Y] if X and Y are independent.
Manipulating variance
• The variance of the sum of two variables: Var[X + Y] = Var[X] + Var[Y] if X and Y are independent.
• The variance of the sum of a variable and a constant: Var[X + c] = Var[X]
• The variance of a product of a variable and a constant: Var[cX] = c² Var[X]
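The rules involving a constant can be checked numerically on a small sample. The sketch below reuses the values 6, 1, 2 with arbitrarily chosen constants (`c_add` and `c_mul` are illustrative names). It uses the sample mean and sample variance, for which these rules also hold.

```python
import math
import statistics

x = [6, 1, 2]
c_add = 10   # constant added to every value (arbitrary choice)
c_mul = 3    # constant multiplying every value (arbitrary choice)

shifted = [xi + c_add for xi in x]
scaled = [c_mul * xi for xi in x]

mean_x = statistics.mean(x)
var_x = statistics.variance(x)

# Adding a constant shifts the mean but leaves the variance unchanged.
print(math.isclose(statistics.mean(shifted), mean_x + c_add))    # True
print(math.isclose(statistics.variance(shifted), var_x))         # True

# Multiplying by a constant scales the mean by c and the variance by c squared.
print(math.isclose(statistics.mean(scaled), c_mul * mean_x))             # True
print(math.isclose(statistics.variance(scaled), c_mul ** 2 * var_x))     # True
```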

Parents’ heights
                                Mean     Variance
Father height                   174.3    71.7
Mother height                   160.4    58.3
Father height + mother height   334.7    184.9
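In this table the means add (174.3 + 160.4 = 334.7), but the variances do not: 71.7 + 58.3 = 130.0, not 184.9. The general identity Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y] lets the covariance implied by the table be backed out. The sketch below does that arithmetic; the covariance itself is not reported on the slide, so the value is derived from the table rather than quoted.

```python
mean_father, var_father = 174.3, 71.7
mean_mother, var_mother = 160.4, 58.3
mean_sum, var_sum = 334.7, 184.9

# Means of sums always add.
print(round(mean_father + mean_mother, 1) == mean_sum)  # True

# Variances add only when the covariance is zero.
print(round(var_father + var_mother, 1))  # 130.0

# Covariance implied by Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
implied_cov = (var_sum - var_father - var_mother) / 2
print(round(implied_cov, 2))  # 27.45
```

A positive implied covariance means the two heights are not independent in these data, which is why the simple sum rule for variances does not apply.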