EPI-546 Block I Lecture 2 – Descriptive Statistics
Michael Brown MD, MSc
Professor, Epidemiology and Emergency Medicine
Credit to Michael P. Collins, MD, MS
Dr. Michael Brown © Epidemiology Dept., Michigan State Univ.

Objectives - Concepts
- Classification of data
- Distributions of variables
- Measures of central tendency and dispersion
- Criteria for abnormality
- Sampling
- Regression to the mean

Objectives - Skills
- Distinguish and apply the forms of data types.
- Define mean, median, and mode and locate on a skewed distribution chart.
- Apply the concept of the standard deviation to specific circumstances.
- Explain why a strategy for sampling is needed.
- Recognize the phenomenon of regression to the mean when it occurs or is described.

Clinical Measurement: 2 kinds of data
- Categorical
- Interval

Distinction
Interval = “the interval between successive values is equal, throughout the scale”

Clinical Measurement: subtypes of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous

Nominal data: no order
- Alive vs. dead
- Male vs. female
- Rabies vs. no rabies
- Blood group O, A, B, AB
- Resident of Michigan, Ohio, Indiana…

Ordinal scale: natural order, but not interval
- 1st vs. 2nd vs. 3rd degree burns
- Pain scale for migraine headache:
  - None, mild, moderate, severe
- Glasgow Coma Score (3-15)
- Stage of cancer spread: 0 through 4

Clinical Measurement: 2 kinds of data
- Categorical
  - Nominal
  - Ordinal
- Interval
  - Discrete
  - Continuous

Discrete interval variables: on a “number line”
- Number of live births
- Number of sexual partners
- Diarrheal stools per day
- Vision: 20/?
[Figure: number line marked 1, 2, 3]

Continuous variables:
- Blood pressure
- Weight, or Body Mass Index
- Random blood sugar
- IQ

Interval: Continuous vs. Discrete
- No variable is perfectly continuous, e.g. you never see a BP of 152.47 mmHg
- It’s a matter of degree: lots of possible values within the range clinically possible = continuous

Recording data
- Sometimes the variable is intrinsically one type or another, but frequently it is the observer who decides how a variable will be measured and reported
- Consider cigarette smoking:

Continuous variable
- Underlying (nearly) continuous variable: cigarettes/day
  - 32, 63, 2, …
- However, this level of detail may not be necessary or desirable.

Discrete interval variable
- Packs per day (probably rounded off to the nearest whole number)
  - 2, 1, 0
- Cruder, but maybe good enough and more reliably reported

Ordinal categorical variable
- Non-smoker vs. light smoker vs. heavy smoker.
- May further collapse the pack/day variable.

Nominal categorical variable
- Non-smoker vs. former smoker vs. current smoker.
  - No obvious order here, just named categories
- Ever-smoker vs. never-smoker.
  - Dichotomous outcome

So, the form of the variable is often decided by the investigator, not by nature.
In fact, the normal vs. abnormal distinction is generally a matter of taking a much richer measure and making it dichotomous.
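The smoking example can be sketched in a few lines of Python. This is only an illustration of how one underlying measurement can be recorded at several levels of detail: the pack size of 20, the cut-off between "light" and "heavy", and the function names are all assumptions made for the sketch, not definitions from the lecture.

```python
# Cigarettes/day for three hypothetical patients (example values from the slide).
cigarettes_per_day = [32, 63, 2]

CIGS_PER_PACK = 20  # assumption: one pack holds 20 cigarettes


def packs_per_day(cigs):
    """Discrete interval variable: round to the nearest whole pack."""
    return int(cigs / CIGS_PER_PACK + 0.5)


def smoking_level(cigs):
    """Ordinal variable; the 20/day cut-off is an illustrative assumption."""
    if cigs == 0:
        return "non-smoker"
    return "light smoker" if cigs < 20 else "heavy smoker"


def is_current_smoker(cigs):
    """Dichotomous variable: any smoking at all vs. none."""
    return cigs > 0


for cigs in cigarettes_per_day:
    print(cigs, packs_per_day(cigs), smoking_level(cigs), is_current_smoker(cigs))
```

Each step down discards information: the dichotomous value can be derived from cigarettes/day, but not the other way around.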

Quick Quiz Slide
- What kind of a variable is religion? (Protestant, Catholic, Islamic, Judaism...)
- What kind is Body Mass Index (weight divided by height squared)?
- What is alcohol intake if classed as none, < 2 drinks/day, and > 2 drinks/day?

First question when meeting with a statistician:
1. Define the type of data (continuous, ordinal, categorical, etc.)

A Few Examples of Statistical Tests

Test              | Comparison           | Principal Assumptions
------------------|----------------------|------------------------------------------------------------
Student's t test  | Means of two groups  | Continuous variable, normally distributed, equal variance
Wilcoxon rank sum | Medians of two groups| Continuous variable
Chi-square        | Proportions          | Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact    | Proportions          | Categorical variable
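To make the chi-square row concrete, here is a minimal sketch for a 2x2 table using only the Python standard library. The counts are invented for illustration. The "more than 5 per cell" rule is checked here on expected counts, which is a common convention but an assumption about how the rule on the slide is meant to be applied. No continuity correction is used.

```python
import math

# Hypothetical 2x2 table: rows = two groups, columns = outcome yes / no.
observed = [[10, 20],
            [20, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count in each cell if group and outcome were unrelated.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Assumption check: flag the table if any expected count is below 5.
small_cells = any(e < 5 for row in expected for e in row)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# With 1 degree of freedom (a 2x2 table) the upper-tail p-value has a closed form.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(small_cells, round(chi_square, 2), round(p_value, 3))
```

If `small_cells` were true, the table suggests falling back to Fisher's exact test, which does not rely on that assumption.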

Distributions of continuous variables
- A way to display the individual-to-individual variation in some clinical measure.
- Consider the example in Fletcher using PSA levels:

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

[Figure: frequency distribution; axes: Frequency vs. Variable x. Source: www.msu.edu/user/sw/statrev/images/normal01.gif]

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

The “nicest” distribution is the normal, or Gaussian, distribution: the “bell-shaped curve”.

If we want to summarize a frequency distribution, there are two major aspects to include:
- Central tendency
- Dispersion

Principles of Epidemiology, 2nd edition. CDC.

Principles of Epidemiology, 2nd edition. CDC.

Measures of Central Tendency:
- Mean
- Median
- Mode

Consider this data:
Parity (how many babies have you had?) among 19 women:
0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2

Mean (Arithmetic)
- Add up all the values and divide by N
- 43 / 19 = 2.26

Median
- The middle value
  - Must first sort the data and put in order:
  - 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 7, 8
  - With 19 values, the median is the 10th value in the sorted list: 2

Mode
- The most common value
  - 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 7, 8
  - The value 1 occurs five times, more often than any other, so the mode is 1
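The three measures can be checked against the parity data with Python's standard `statistics` module. This is a quick sketch for verification; the variable names are chosen here for illustration.

```python
import statistics

# Parity among the 19 women, as listed in the lecture.
parity = [0, 2, 0, 0, 1, 3, 1, 4, 1, 8, 2, 2, 0, 1, 3, 5, 1, 7, 2]

mean = statistics.mean(parity)      # sum of values divided by N (43 / 19)
median = statistics.median(parity)  # middle value of the sorted data
mode = statistics.mode(parity)      # most frequent value

print(sorted(parity))
print(round(mean, 2), median, mode)  # prints: 2.26 2 1
```

Note that the mean (2.26) sits above the median (2) and the mode (1): the few women with high parity pull the mean to the right.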

In a normal distribution, all three are equal.
Parametric statistical methods assume a distribution with known shape (i.e. normal or Gaussian distribution).

[Figure: frequency distribution; axes: Frequency vs. Variable x]

Quick Quiz Slide
- If the mode is “100” and the mean is “80”, what can you tell me about the median?
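One way to explore the quiz question is with a small made-up data set whose mode is 100 and whose mean is 80. The values below are invented for illustration. In unimodal skewed data like this the median usually lies between the mean and the mode, although that is a rule of thumb rather than a guarantee.

```python
import statistics

# Invented, left-skewed scores: most values are high, a few low ones form the tail.
scores = [20, 65, 80, 85, 90, 100, 100, 100]

mean = statistics.mean(scores)      # pulled toward the long left tail
median = statistics.median(scores)  # average of the two middle values
mode = statistics.mode(scores)      # the peak of the distribution

print(mean, median, mode)  # prints: 80 87.5 100
```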

[Figure: skewed frequency distribution of variable x, with labels “mean”, “mode” and “80”]

Dispersion
- Standard Deviation: the most common measure used for normal or near-normal distributions.
- Defined by a statistical formula, but remember that:
  - The mean +/- 1 SD contains about 2/3 of the observations.
  - The mean +/- 2 SDs includes about 95% of the observations.
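The two rules of thumb can be checked against an exactly normal distribution with `statistics.NormalDist` from the Python standard library. This sketch assumes a perfectly normal variable; real clinical data will only approximate these figures.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

# Proportion of observations within 1 and 2 SDs of the mean.
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(f"{within_1_sd:.3f}")  # prints: 0.683 (about 2/3)
print(f"{within_2_sd:.3f}")  # prints: 0.954 (about 95%)
```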

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

M J Campbell, Statistics at Square One, 9th Ed, 1997.

So, how about this definition of “abnormal” for total serum cholesterol: a value higher than the mean + 1 S.D.?
- How many people would fall beyond that cutoff?
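A rough answer follows from the "about 2/3 within 1 SD" rule, assuming cholesterol is approximately normally distributed and therefore symmetric about the mean. The symbols X (a person's cholesterol), μ (mean) and σ (SD) are introduced here for the sketch.

```latex
\begin{align*}
P(\mu - \sigma < X < \mu + \sigma) &\approx 0.68 \\
P(X > \mu + \sigma) &= \frac{1 - P(\mu - \sigma < X < \mu + \sigma)}{2}
  \approx \frac{1 - 0.68}{2} = 0.16
\end{align*}
```

Under that assumption, roughly one person in six would be labeled abnormal by this cutoff.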

Rose, G: The Strategy of Preventive Medicine; Oxford Press, 1998.

So what’s the “best” definition of abnormality?
- Fletcher lists three:
  - Being unusual
    - Greater than 2 SD from mean
  - Sick
    - Observation regularly associated with disease
  - Treatable
    - Consider abnormal only if treatment of the condition represented by the measurement leads to improved outcome

Miura et al, Archives Int Med 2001; 161: 1504.

If you were to design a study to define an abnormal DBP for adult females in the US, how would you do it?
- Measure DBP in every adult female in the US?
- Then define abnormal as above 2 SD from mean?

Sampling
- Impossible to measure the BP of everyone, so must take measurements of a representative sample of subjects.
- Random sample
  - May miss important subgroup (ethnicity, for example)
  - May need to obtain a larger sample from these important subgroups and select subjects at random within subgroup
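The difference between a simple random sample and sampling at random within subgroups can be sketched as follows. The population, the subgroup labels "A" and "B", and the sample sizes are all hypothetical, chosen only to show how a small subgroup can be under-represented and then deliberately oversampled.

```python
import random

rng = random.Random(546)  # fixed seed so the sketch is repeatable

# Hypothetical population: 9,000 women in subgroup "A", 1,000 in subgroup "B".
population = [("A", i) for i in range(9000)] + [("B", i) for i in range(1000)]

# Simple random sample: every person has the same chance of selection,
# so the small subgroup contributes only a handful of subjects.
simple = rng.sample(population, 100)

# Stratified random sample: fix the number drawn from each subgroup
# (oversampling the smaller one), then select at random within it.
strata = {"A": 50, "B": 50}
stratified = []
for group, n in strata.items():
    members = [person for person in population if person[0] == group]
    stratified.extend(rng.sample(members, n))

count_b_simple = sum(1 for group, _ in simple if group == "B")
count_b_stratified = sum(1 for group, _ in stratified if group == "B")
print(count_b_simple, count_b_stratified)
```

With a design like this, estimates for the whole population would need to weight each subgroup back to its true share, since subgroup "B" is deliberately over-represented.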

Clinical Epidemiology: The Essentials, 3rd Ed, by Fletcher RH, Fletcher SW, 2005.

Hanna C, Greenes D. How Much Tachycardia in Infants Can Be Attributed to Fever? Ann Emerg Med June 2004.