EPI-546 Block I Lecture 2: Descriptive Statistics
Michael Brown MD, MSc, Professor, Epidemiology and Emergency Medicine
Credit to Michael P. Collins, MD, MS
Dr. Michael Brown © Epidemiology Dept., Michigan State Univ.
Objectives - Concepts
- Classification of data
- Distributions of variables
- Measures of central tendency and dispersion
- Criteria for abnormality
- Sampling
- Regression to the mean
Objectives - Skills
- Distinguish and apply the forms of data types.
- Define mean, median, and mode and locate on a skewed distribution chart.
- Apply the concept of the standard deviation to specific circumstances.
- Explain why a strategy for sampling is needed.
- Recognize the phenomenon of regression to the mean when it occurs or is described.
Clinical Measurement: 2 kinds of data
- Categorical
- Interval
Distinction
Interval = “the interval between successive values is equal throughout the scale”
Clinical Measurement: subtypes of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous
Nominal data: no order
- Alive vs. dead
- Male vs. female
- Rabies vs. no rabies
- Blood group O, A, B, AB
- Resident of Michigan, Ohio, Indiana…
Ordinal scale: natural order, but not interval
- 1st vs. 2nd vs. 3rd degree burns
- Pain scale for migraine headache:
  - None, mild, moderate, severe
- Glasgow Coma Score (3-15)
- Stage of cancer spread: 0 through 4
Clinical Measurement: 2 kinds of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous
Discrete interval variables: on a “number line”
- Number of live births
- Number of sexual partners
- Diarrheal stools per day
- Vision: 20/?
[Figure: number line marked 1, 2, 3]
Continuous variables:
- Blood pressure
- Weight, or Body Mass Index
- Random blood sugar
- IQ
Interval: Continuous vs. Discrete
- No variable is perfectly continuous, e.g. you never see a BP of 152.47 mmHg.
- It’s a matter of degree: lots of possible values within the clinically possible range = continuous.
Recording data
- Sometimes the variable is intrinsically one type or another, but frequently it is the observer who decides how a variable will be measured and reported.
- Consider cigarette smoking:
Continuous variable
- Underlying (nearly) continuous variable: cigarettes/day
  - 32, 63, 2, …
- However, this level of detail may not be necessary or desirable.
Discrete interval variable
- Packs per day (probably rounded off to the nearest whole number)
  - 2, 1, 0
- Cruder, but maybe good enough and more reliably reported
Ordinal categorical variable
- Non-smoker vs. light smoker vs. heavy smoker.
- May further collapse the pack/day variable.
Nominal categorical variable
- Non-smoker vs. former smoker vs. current smoker.
  - No obvious order here, just named categories
- Ever-smoker vs. never-smoker.
  - Dichotomous outcome
So, the form of the variable is often decided by the investigator, not by nature.
In fact, the normal vs. abnormal distinction is generally a matter of taking a much richer measure and making it dichotomous.
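The recoding choices on the smoking slides can be sketched in a few lines of Python. This is only an illustration: the 20-cigarette pack, the light/heavy cut-off, and all function names are assumptions made for the example, not definitions from the lecture.

```python
CIGARETTES_PER_PACK = 20  # assumption: a standard pack holds 20 cigarettes


def packs_per_day(cigarettes: int) -> int:
    """Discrete interval variable: packs/day rounded to the nearest whole pack."""
    return (cigarettes + CIGARETTES_PER_PACK // 2) // CIGARETTES_PER_PACK


def smoking_level(cigarettes: int) -> str:
    """Ordinal variable; the 20/day cut-off is a hypothetical threshold."""
    if cigarettes == 0:
        return "non-smoker"
    return "light smoker" if cigarettes < 20 else "heavy smoker"


def is_current_smoker(cigarettes: int) -> bool:
    """Dichotomous variable: any current smoking vs. none."""
    return cigarettes > 0


# The same underlying measurements (cigarettes/day from the slide), three codings.
for cigarettes in (32, 63, 2):
    print(cigarettes, packs_per_day(cigarettes),
          smoking_level(cigarettes), is_current_smoker(cigarettes))
```

Each step down the list discards detail: the same three people become 2, 3, and 0 packs/day, then heavy, heavy, and light smokers, then simply three current smokers.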
Quick Quiz Slide
- What kind of a variable is religion? (Protestant, Catholic, Muslim, Jewish...)
- What kind is Body Mass Index (weight divided by height squared)?
- What is alcohol intake if classed as none, < 2 drinks/day, and > 2 drinks/day?
First question when meeting with a statistician:
1. Define the type of data (continuous, ordinal, categorical, etc.)
A Few Examples of Statistical Tests

Test                 Comparison              Principal Assumptions
Student's t test     Means of two groups     Continuous variable, normally distributed, equal variance
Wilcoxon rank sum    Medians of two groups   Continuous variable
Chi-square           Proportions             Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact       Proportions             Categorical variable
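As a rough sketch of how the cell-count assumption in the table might steer the choice between the two tests for proportions, the helper below applies the slide's "more than 5 per cell" rule to observed counts. The function name and the example tables are hypothetical; many texts apply a similar threshold to expected rather than observed counts, so treat this as a simplification.

```python
def choose_proportion_test(table):
    """Pick a test for a table of observed counts (a list of rows).

    Uses the slide's rule of thumb: chi-square needs more than 5
    patients in every cell; otherwise fall back to Fisher's exact test.
    """
    every_cell_large = all(count > 5 for row in table for count in row)
    return "chi-square" if every_cell_large else "Fisher's exact"


# Hypothetical 2x2 tables (rows = treatment groups, columns = outcome yes/no).
large_study = [[12, 30], [25, 18]]
small_study = [[3, 17], [9, 11]]

print(choose_proportion_test(large_study))
print(choose_proportion_test(small_study))
```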
Objectives - Concepts
- Classification of data
- Distributions of variables
- Measures of central tendency and dispersion
- Criteria for abnormality
- Sampling
- Regression to the mean
Distributions of continuous variables
- A way to display the individual-to-individual variation in some clinical measure.
- Consider the example in Fletcher using PSA levels:
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
[Figure: frequency distribution, frequency (y-axis) vs. variable x (x-axis). Source: www.msu.edu/user/sw/statrev/images/normal01.gif]
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
The “nicest” distribution is the normal, or Gaussian, distribution: the “bell-shaped curve”.
If we want to summarize a frequency distribution, there are two major aspects to include:
- Central tendency
- Dispersion
Principles of Epidemiology, 2nd edition. CDC.
Principles of Epidemiology, 2nd edition. CDC.
Measures of Central Tendency:
- Mean
- Median
- Mode
Consider this data:
Parity (how many babies have you had?) among 19 women:
0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2
Mean (Arithmetic)
- Add up all the values and divide by N
- 43 / 19 = 2.26
Median
- The middle value
- Must first sort the data and put in order:
  - 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 7, 8
- With 19 values, the middle one is the 10th, so the median is 2.
Mode
- The most common value
  - 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 7, 8
- The value 1 occurs five times, more than any other, so the mode is 1.
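The three measures can be checked against the parity data with Python's standard `statistics` module. This is a minimal sketch; the variable name `parity` is just a label chosen for the example.

```python
from statistics import mean, median, mode

# Parity among the 19 women, in the order listed on the slide.
parity = [0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2]

print(sum(parity), len(parity))   # 43 19
print(round(mean(parity), 2))     # 2.26
print(sorted(parity))             # sorted order, needed to read off the median
print(median(parity))             # 2
print(mode(parity))               # 1
```

The mean (2.26) sits above the median (2) and the mode (1), pulled upward by the few women with 7 or 8 births.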
In a normal distribution, all three are equal.
Parametric statistical methods assume a distribution with known shape (e.g. a normal or Gaussian distribution).
[Figure: frequency distribution, frequency (y-axis) vs. variable x (x-axis)]
Quick Quiz Slide
- If the mode is “100” and the mean is “80”, what can you tell me about the median?
[Figure: skewed frequency distribution, frequency (y-axis) vs. variable x (x-axis), with the mean marked at 80 and the mode at 100]
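One way to explore the quiz is with a small made-up data set whose mode is 100 and whose mean is 80. The values below are hypothetical and chosen only to match those two numbers; the ordering they show is a common pattern for a single-peaked distribution with a long tail toward low values, not a guarantee for every data set.

```python
from statistics import mean, median, mode

# Hypothetical values with a long tail toward the low end.
values = [20, 50, 70, 80, 90, 90, 100, 100, 100, 100]

print(mean(values), median(values), mode(values))

# In this sample the median falls between the mean and the mode.
assert mean(values) < median(values) < mode(values)
```

Here the low outliers drag the mean down the most, the median less so, and the mode stays at the peak.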
Dispersion
- Standard deviation: the most common measure used for normal or near-normal distributions.
- Defined by a statistical formula, but remember that:
  - The mean +/- 1 SD contains about 2/3 of the observations.
  - The mean +/- 2 SDs includes about 95% of the observations.
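The two rules of thumb can be checked for an exactly normal distribution using `statistics.NormalDist` from the Python standard library. This sketch assumes a perfectly Gaussian variable; real clinical measurements will only approximate these fractions. The helper name `fraction_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(f"{fraction_within(1):.3f}")  # 0.683, i.e. about 2/3
print(f"{fraction_within(2):.3f}")  # 0.954, i.e. about 95%
```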
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
M J Campbell, Statistics at Square One, 9th Ed, 1997.
So, how about this definition of “abnormal” for total serum cholesterol: a value higher than the mean + 1 SD?
- How many people would fall beyond that cutoff?
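If cholesterol were exactly normally distributed (an assumption made only for this sketch), the share of people above a cutoff of mean + k SD follows directly from the normal curve. The helper name `fraction_above` is hypothetical.

```python
from statistics import NormalDist


def fraction_above(k: float) -> float:
    """Fraction of a normal distribution lying more than k SDs above the mean."""
    return 1 - NormalDist().cdf(k)


print(f"{fraction_above(1):.1%}")  # 15.9%
print(f"{fraction_above(2):.1%}")  # 2.3%
```

Under that assumption, a cutoff at mean + 1 SD would label roughly one person in six as abnormal, while a cutoff at mean + 2 SD would label about one in forty.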
Rose, G: The Strategy of Preventive Medicine; Oxford Press, 1998.
So what’s the “best” definition of abnormality?
- Fletcher lists three:
  - Being unusual
    - Greater than 2 SD from mean
  - Sick
    - Observation regularly associated with disease
  - Treatable
    - Consider abnormal only if treatment of the condition represented by the measurement leads to improved outcome
Miura et al, Archives Int Med 2001; 161: 1504.
If you were to design a study to define an abnormal DBP for adult females in the US, how would you do it?
- Measure DBP in every adult female in the US?
- Then define abnormal as above 2 SD from mean?
Sampling
- Impossible to measure the BP of everyone, so must take measurements of a representative sample of subjects.
- Random sample
  - May miss important subgroup (ethnicity for example)
  - May need to obtain a larger sample from these important subgroups and select subjects at random within subgroup
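The difference between a simple random sample and oversampling within subgroups can be sketched as follows. The population, subgroup labels, sample sizes, and function names are all hypothetical, and the fixed seed is there only to make the sketch reproducible.

```python
import random

rng = random.Random(546)  # fixed seed so repeated runs give the same draw

# Hypothetical population: 950 subjects in subgroup "A", 50 in subgroup "B".
population = [("A", i) for i in range(950)] + [("B", i) for i in range(50)]

# Simple random sample: the small subgroup may contribute very few subjects.
simple_sample = rng.sample(population, 100)


def stratified_sample(population, per_group, rng):
    """Draw a fixed number of subjects at random within each subgroup."""
    groups = {}
    for subject in population:
        groups.setdefault(subject[0], []).append(subject)
    sample = []
    for name in sorted(groups):
        sample.extend(rng.sample(groups[name], per_group[name]))
    return sample


# Deliberately oversample subgroup "B" (30 of 100 instead of about 5).
stratified = stratified_sample(population, {"A": 70, "B": 30}, rng)


def count_b(sample):
    """Number of subgroup "B" subjects in a sample."""
    return sum(1 for group, _ in sample if group == "B")


print(count_b(simple_sample), count_b(stratified))
```

Oversampling gives a more stable estimate within the small subgroup, but an overall average computed from such a sample would need to weight each subgroup back to its true share of the population.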
Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.
Hanna C, Greenes D. How Much Tachycardia in Infants Can Be Attributed to Fever? Ann Emerg Med, June 2004.