“Victor Babes” UNIVERSITY OF MEDICINE AND PHARMACY TIMISOARA
DEPARTMENT OF MEDICAL INFORMATICS AND BIOPHYSICS
Medical Informatics Division
www.medinfo.umft.ro/dim
2007/2008

COURSE 4
STATISTICAL ESTIMATION. STATISTICAL TESTS (I)

STATISTICAL ESTIMATION

1.1. Numerical variables: example
• A STUDY ON THE SOMATIC DEVELOPMENT OF CHILDREN
– N = 25 children, age 10, Timisoara, 1997
– mean $\bar{X}$ = 137 cm
– standard deviation s = 5 cm
• Can we extend these conclusions to the entire population?
• Different samples drawn from the same population give different means.

1.2. GRAPHICAL REPRESENTATION
[Figure: distribution of individual values (continuous line) and of sample means (dotted line)]

1.3. Population characteristics
• Population mean $\mu$ (unknown; estimated by the sample mean $\bar{X}$)
• Standard error of the mean: $s_{\bar{X}} = \dfrac{s}{\sqrt{N}}$

EXAMPLE
• A STUDY ON THE SOMATIC DEVELOPMENT OF CHILDREN
• N = 25 children, age 10, Timisoara, 1997
• mean $\bar{X}$ = 137 cm
• standard deviation s = 5 cm
• standard error of the mean $s_{\bar{X}} = 5/\sqrt{25} = 1$ cm
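The numbers in this example can be checked with a short Python sketch. The helper names `standard_error` and `mean_interval` are hypothetical, chosen for illustration, and the intervals assume that the sample mean is approximately normally distributed. It reproduces $s_{\bar{X}}$ = 1 cm and the intervals of ±1 and ±2 standard errors around the sample mean.

```python
import math


def standard_error(s: float, n: int) -> float:
    """Standard error of the mean: s / sqrt(N)."""
    return s / math.sqrt(n)


def mean_interval(mean: float, se: float, z: float) -> tuple:
    """Interval mean ± z * SE that should contain the population mean."""
    return (mean - z * se, mean + z * se)


# Values from the children study: N = 25, mean 137 cm, s = 5 cm
se = standard_error(5.0, 25)
print(se)                              # 1.0
print(mean_interval(137.0, se, 1.0))   # (136.0, 138.0): about 68% probability
print(mean_interval(137.0, se, 2.0))   # (135.0, 139.0): about 95.4% probability
```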

1.4. LOCATION OF THE POPULATION MEAN

1.5. DEFINITIONS
– a) STANDARD DEVIATION: a dispersion indicator showing how individual values spread around the sample mean
– b) STANDARD ERROR OF THE MEAN: a dispersion indicator showing how sample means spread around the population mean
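To make the difference between the two definitions concrete, here is a small simulation sketch in Python. It assumes a hypothetical normal population with $\mu$ = 137 cm and $\sigma$ = 5 cm (values borrowed from the children example; the real population distribution is unknown), draws many samples of N = 25, and compares the spread of the individual values with the spread of the sample means.

```python
import random
import statistics

# Hypothetical population parameters (assumed for illustration only)
random.seed(1)
MU, SIGMA, N, REPLICATES = 137.0, 5.0, 25, 1000

all_values = []     # every individual value drawn
sample_means = []   # one mean per simulated sample
for _ in range(REPLICATES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    all_values.extend(sample)
    sample_means.append(statistics.mean(sample))

# Spread of individual values around the mean: the standard deviation (about 5 cm)
sd_individual = statistics.stdev(all_values)
# Spread of sample means around MU: the standard error (about 5 / sqrt(25) = 1 cm)
sd_of_means = statistics.stdev(sample_means)

print(f"SD of individual values: {sd_individual:.2f} cm")
print(f"SD of sample means:      {sd_of_means:.2f} cm")
```

The sample means cluster about five times more tightly than the individual values, which is the factor $\sqrt{N} = \sqrt{25}$.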

EXERCISE
• For a group of N = 36 cardiac patients, the mean systolic blood pressure was 150 mm Hg, with a standard deviation of 12 mm Hg.
– a) In which interval are 68% of the patients' systolic pressure values located?
– b) In which interval does the population mean systolic pressure lie with 95% probability?
– c) What percentage of patients have values above 162 mm Hg?
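A possible worked solution, sketched in Python. It assumes that systolic pressure is approximately normally distributed (the exercise does not state this) and uses z = 1.96 for 95%. Part (a) uses the standard deviation, part (b) the standard error, and part (c) the normal distribution, since 162 mm Hg is exactly one standard deviation above the mean.

```python
import math
from statistics import NormalDist

N, MEAN, SD = 36, 150.0, 12.0   # data from the exercise (mm Hg)

# a) About 68% of individual values lie within mean ± 1 SD
interval_68 = (MEAN - SD, MEAN + SD)

# b) 95% interval for the population mean: mean ± 1.96 * SE (large sample, N > 30)
se = SD / math.sqrt(N)
interval_95 = (MEAN - 1.96 * se, MEAN + 1.96 * se)

# c) Share of patients above 162 mm Hg, assuming normally distributed values
z = (162 - MEAN) / SD
share_above = 1 - NormalDist().cdf(z)

print(interval_68)            # (138.0, 162.0)
print(se, interval_95)
print(f"{share_above:.1%}")   # 15.9%
```

So (a) 138 to 162 mm Hg, (b) about 146.1 to 153.9 mm Hg, since $s_{\bar{X}} = 12/\sqrt{36} = 2$ mm Hg, and (c) about 16% of patients.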

1.6. Generalization
• LOCATION OF POPULATION CHARACTERISTICS
• TYPES:
– MEANS
– PROPORTIONS
– DIFFERENCES (OF MEANS, OF PROPORTIONS)

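• 1.6.a. MEAN ESTIMATION
– LARGE SAMPLES (N > 30)
– $\bar{X}$ follows a NORMAL DISTRIBUTION (REGARDLESS OF THE DISTRIBUTION OF INDIVIDUAL VALUES)
– interval for the population mean: $\bar{X} - z\,s_{\bar{X}} < \mu < \bar{X} + z\,s_{\bar{X}}$
– confidence levels and z values: 68% (z = 1), 90% (z = 1.65), 95% (z = 1.96), 95.4% (z = 2), 99% (z = 2.58), 99.7% (z = 3)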
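The z values in this list can be recomputed rather than memorized. The following Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8+); the helper names `z_value` and `large_sample_interval` are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided critical value z for a confidence level (e.g. 0.95 -> about 1.96)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def large_sample_interval(mean: float, s: float, n: int, confidence: float = 0.95) -> tuple:
    """Interval mean ± z * s / sqrt(n) for the population mean, intended for N > 30."""
    half_width = z_value(confidence) * s / math.sqrt(n)
    return (mean - half_width, mean + half_width)


for level in (0.68, 0.90, 0.95, 0.954, 0.99, 0.997):
    print(f"{level:.1%}: z = {z_value(level):.2f}")
```

The rounded values 1, 2 and 3 correspond to 68.3%, 95.4% and 99.7%, so the computed z for exactly 68% or 99.7% differs slightly from the list. For 90%, the exact value is 1.645, which the list rounds to 1.65.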

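• 1.6.b. SMALL SAMPLES (N < 30)
– $\bar{X}$ follows a Student t DISTRIBUTION (for normally distributed individual values)
– DEGREES OF FREEDOM: f = N - 1
– interval for the population mean: $\bar{X} - t\,s_{\bar{X}} < \mu < \bar{X} + t\,s_{\bar{X}}$, with t read from the t table for f degrees of freedom
• 1.6.c. PROPORTIONS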
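The course names the t distribution and proportions but gives no worked formulas here. This Python sketch assumes the common textbook forms: $\bar{X} \pm t\,s/\sqrt{N}$ with f = N - 1 for a small-sample mean, and, for a proportion p observed in N subjects, the large-sample normal approximation $p \pm z\sqrt{p(1-p)/N}$. The Python standard library has no t distribution, so the sketch hardcodes a few two-sided 95% critical values from standard t tables. The proportion data (40 of 100) are invented for illustration, and the function names are hypothetical.

```python
import math

# Two-sided 95% critical values of Student's t from standard tables
# (degrees of freedom -> t); only a few rows, for illustration.
T_95 = {9: 2.262, 19: 2.093, 24: 2.064, 29: 2.045}


def small_sample_interval(mean: float, s: float, n: int) -> tuple:
    """95% interval mean ± t * s / sqrt(n), with n - 1 degrees of freedom."""
    t = T_95[n - 1]   # raises KeyError if that row is not in the table above
    half_width = t * s / math.sqrt(n)
    return (mean - half_width, mean + half_width)


def proportion_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Large-sample interval p ± z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)


# Children example: N = 25, mean 137 cm, s = 5 cm (t = 2.064 instead of z = 1.96)
print(small_sample_interval(137.0, 5.0, 25))
# Hypothetical proportion: 40 cases out of 100
print(proportion_interval(40, 100))
```

For N = 25, the t interval (about 134.9 to 139.1 cm) is slightly wider than the normal one, reflecting the extra uncertainty of a small sample.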

STATISTICAL TESTS

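2. STATISTICAL TESTS
• 2.1. SIGNIFICANT AND NONSIGNIFICANT DIFFERENCES
• a) Example:
– BOYS: n = 25, $\bar{X}$ = 137 cm, s = 5 cm, $s_{\bar{X}}$ = 1 cm; 95% interval for the population mean: (135, 139) cm
– GIRLS: n = 25, s = 5 cm, $s_{\bar{X}}$ = 1 cm
– if the girls' mean is $\bar{X}$ = 138.5 cm (inside the boys' interval), the difference is nonsignificant
– if the girls' mean is $\bar{X}$ = 139.5 cm (outside the boys' interval), the difference is significant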
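The reasoning of this example can be sketched in Python. The sketch assumes, as the example appears to, that the boys' mean of 137 cm is a fixed reference value and that $s_{\bar{X}}$ = 1 cm. The function name `compare_with_reference` is hypothetical.

```python
# Reference values from the boys' sample
BOYS_MEAN, SE = 137.0, 1.0   # cm
Z_95 = 1.96                  # two-sided critical value for 95%


def compare_with_reference(girls_mean: float) -> str:
    """Treat the boys' mean as a fixed reference and compare |z| with 1.96."""
    z = (girls_mean - BOYS_MEAN) / SE
    return "significant" if abs(z) > Z_95 else "nonsignificant"


for girls_mean in (138.5, 139.5):
    print(girls_mean, compare_with_reference(girls_mean))
# 138.5 nonsignificant
# 139.5 significant
```

A test that also accounts for the sampling variability of the girls' mean would use the standard error of the difference, $\sqrt{1^2 + 1^2} \approx 1.41$ cm. For 139.5 cm this gives z = 2.5/1.41 ≈ 1.77 < 1.96, so the conclusion depends on which comparison is intended.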

b) DEFINITIONS
• NON-SIGNIFICANT DIFFERENCES
– have a high probability of occurring by chance (sampling variability)
– the two samples are treated as belonging to the same population
• SIGNIFICANT DIFFERENCES
– have a low probability of occurring by chance
– must have another cause

2.2. STATISTICAL HYPOTHESES
• a) NULL HYPOTHESIS
– $H_0: \bar{X}_1 = \bar{X}_2$ (equal not mathematically, but statistically!)
– there are no significant differences between the two values (samples)
• b) ALTERNATIVE HYPOTHESES
– $H_1: \bar{X}_1 \neq \bar{X}_2$ (bilateral, two-sided)
– $H_1: \bar{X}_1 > \bar{X}_2$ or $H_1: \bar{X}_1 < \bar{X}_2$ (unilateral, one-sided)
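The choice between a bilateral and a unilateral alternative changes the P value obtained from the same test statistic. The following Python sketch assumes a large-sample z statistic. `p_value` is a hypothetical helper built on `statistics.NormalDist`, returning the probability under $H_0$ of a result at least as extreme as the one observed in the direction(s) allowed by $H_1$.

```python
from statistics import NormalDist


def p_value(z: float, alternative: str = "two-sided") -> float:
    """P value for a z statistic under H0, for a bilateral or unilateral H1."""
    std_normal = NormalDist()
    if alternative == "two-sided":   # H1: X1 != X2
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":     # H1: X1 > X2
        return 1 - std_normal.cdf(z)
    if alternative == "less":        # H1: X1 < X2
        return std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


z = 2.5   # e.g. (139.5 - 137) / 1 from the boys/girls example
print(round(p_value(z), 4))              # 0.0124
print(round(p_value(z, "greater"), 4))   # 0.0062
```

For the same z, the one-sided P value is half the two-sided one, provided the difference lies in the direction stated by $H_1$.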

• 2.3. SIGNIFICANCE THRESHOLD
– a) DEFINITION: the probability value below which differences are considered significant
– b) VALUE: $\alpha$ = 0.05 = 5%
– c) CONFIDENCE LEVEL: $1 - \alpha$ = 0.95 = 95%
• 2.4. P COEFFICIENT (P VALUE)
– P = probability that differences at least as large as those observed occur by chance (sampling variability) when $H_0$ is true

2.5. DECISION
• If p > 0.05 => non-significant differences (N), $H_0$ accepted
• If p < 0.05 => significant differences (S), $H_0$ rejected
– If p < 0.01 => very significant differences (V), $H_0$ rejected
– If p < 0.001 => extremely significant differences (E), $H_0$ rejected
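This decision rule maps directly to a small function. In the following Python sketch, `decide` is a hypothetical name. The rule above does not say what happens at exactly p = 0.05; treating that case as non-significant is an assumption made here.

```python
def decide(p: float) -> str:
    """Map a P value to the labels N, S, V, E (thresholds 0.05, 0.01, 0.001)."""
    if p >= 0.05:   # assumption: p exactly 0.05 counts as non-significant
        return "N: non-significant, H0 accepted"
    if p < 0.001:
        return "E: extremely significant, H0 rejected"
    if p < 0.01:
        return "V: very significant, H0 rejected"
    return "S: significant, H0 rejected"


for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, decide(p))
```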

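3. TEST CHARACTERISTICS
• 3.1. ERRORS
– TYPE I: $H_0$ is TRUE, BUT REJECTED (probability $\alpha$)
– TYPE II: $H_0$ is FALSE, BUT ACCEPTED (probability $\beta$)
• 3.2. TEST CONFIDENCE = $1 - \alpha$
• TEST POWER = $1 - \beta$
• confidence and power are inversely related: for a fixed sample size, raising the confidence (lowering $\alpha$) lowers the power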
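The link between the two error types can be explored by simulation. This Python sketch assumes a hypothetical normal population ($\sigma$ = 5 cm, samples of N = 25, a true difference of 3 cm when $H_0$ is false) and a two-sample z test with known $\sigma$. None of these values comes from the course, and `rejection_rate` is a hypothetical helper.

```python
import math
import random

random.seed(7)
SIGMA, N, TRIALS = 5.0, 25, 3000                   # assumed, illustrative values
SE_DIFF = math.sqrt(2) * SIGMA / math.sqrt(N)      # SE of the difference of two means


def rejection_rate(true_difference: float, z_crit: float) -> float:
    """Share of simulated two-sample z tests that reject H0: mu1 = mu2."""
    rejections = 0
    for _ in range(TRIALS):
        m1 = sum(random.gauss(137.0, SIGMA) for _ in range(N)) / N
        m2 = sum(random.gauss(137.0 + true_difference, SIGMA) for _ in range(N)) / N
        if abs(m1 - m2) / SE_DIFF > z_crit:
            rejections += 1
    return rejections / TRIALS


for alpha, z_crit in ((0.05, 1.96), (0.01, 2.58)):
    type_i = rejection_rate(0.0, z_crit)   # H0 true: every rejection is a type I error
    power = rejection_rate(3.0, z_crit)    # H0 false: rejection rate estimates 1 - beta
    print(f"alpha = {alpha}: type I rate ~ {type_i:.3f}, power ~ {power:.3f}")
```

With these settings, the rejection rate under $H_0$ stays close to the chosen $\alpha$. The power drops noticeably when the threshold moves from 1.96 to 2.58, which illustrates the inverse relationship between confidence and power.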

• 3.3. PARAMETRIC AND NONPARAMETRIC TESTS
– Parametric: for normally distributed variables
– Nonparametric: for other distributions
• 4. CLASSES OF TESTS
– SIGNIFICANCE TESTS
– HOMOGENEITY TESTS
– CONCORDANCE TESTS
– INDEPENDENCE TESTS
– CORRELATION COEFFICIENT TESTS
