Scientific Methods 1: ‘Scientific evaluation, experimental design & statistical methods’
COMP80131 Lecture 6: Statistical Methods - Significance
Barry & Goran
www.cs.man.ac.uk/~barry/mydocs/myCOMP80131
3 Dec 2012 COMP80131-SEEDSM12_6
Continuous random processes
• Characterised by probability density functions (pdf).
• Uniform pdf: the probability of the random variable x lying between a and b is the area under pdf(x) between a and b.
• Gaussian (Normal) pdf with mean m & std dev σ: 68% of the area lies within m ± σ, 95.5% within m ± 2σ, 99.7% within m ± 3σ.
[Figures: uniform pdf with limits a and b marked; Gaussian pdf with m - σ, m and m + σ marked]
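The percentages quoted for the Gaussian pdf can be checked numerically. The following is a minimal sketch that assumes only the base MATLAB function erf; the variable names k and P are illustrative and do not come from the lecture.

```matlab
% Probability that a Gaussian variable lies within k std devs of its mean.
% For any mean m and std dev sigma this equals erf(k/sqrt(2)).
k = [1 2 3];
P = erf(k / sqrt(2));
% fprintf cycles through the columns of [k; 100*P]: k(1), P(1), k(2), ...
fprintf('within m +/- %d sigma: %.1f%%\n', [k; 100*P]);
```

The result does not depend on m or σ because the interval is expressed in units of σ about the mean.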
pdf & Histograms
Ru = rand(10000, 1); % 10000 uniform samples
hist(Ru, 20);
Rg = randn(10000, 1); % Gaussian with m=0, std=1
hist(Rg, 20);
Convert histogram to estimate of pdf
• Divide each column by number of samples.
• Then divide by width of bins.
• For better approximation, increase number of bins.
MATLAB illustration
Rg = randn(100000, 1); % 100000 Gaussian samples with m=0, std=1
widthBin = 0.2;
X = -4 : widthBin : 4;
H = hist(Rg, X); % Histogram with bins centred on elements of X
figure(2); bar(X, (H/100000)/widthBin); ylabel('pdf estimate');
[Figure: histogram as pdf estimate]
Gaussian (normal) pdf
• Measurements {xi} of many naturally occurring phenomena tend to be normally distributed with some mean µ & stdev σ.
• Let zi = (xi - µ)/σ.
• Then {zi} has standard normal pdf with mean = 0 & std = 1.
• Conversely, if you generate a set of pseudo-random numbers {zi} with mean = 0 & std = 1, let xi = σ zi + µ to scale the mean & std as required.
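The scaling in both directions can be sketched as follows. The values of mu and sigma are arbitrary illustrative choices, and zBack is a hypothetical name for the re-standardised samples.

```matlab
% Scale standard normal samples z to mean mu and std dev sigma,
% then standardise them again. mu and sigma are illustrative values.
mu = 10; sigma = 2;
z = randn(100000, 1);        % standard normal: mean 0, std 1
x = sigma * z + mu;          % scaled: mean mu, std sigma
zBack = (x - mu) / sigma;    % standardised again
fprintf('mean(x) = %.3f, std(x) = %.3f\n', mean(x), std(x));
fprintf('mean(zBack) = %.3f, std(zBack) = %.3f\n', mean(zBack), std(zBack));
```

The sample mean and std of x will only be close to mu and sigma, not equal to them, because the sample is finite.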
Plot true standard normal pdf
Mean = 0; Std = 1;
widthBin = 0.2; % bin width carried over from the histogram slide
K = 1/(Std*sqrt(2*pi));
X = -4*Std : widthBin : 4*Std;
G = zeros(size(X));
for I = 1:length(X)
    G(I) = K * exp(-(X(I)-Mean)^2 / (2*Std^2));
end
figure(4); plot(X, G); ylabel('pdf');
[Figure: Gaussian pdf against x]
Plot Gaussian cdf
X = -4 : 0.1 : 4;
C = normcdf(X, 0, 1);
figure(1); plot(X, C); grid on;
xlabel('x'); ylabel('prob that var < x');
• Cumulative distribution function (cdf): probability of Gaussian variable (m=0, std=1) being < x.
• No closed-form formula for this. Use MATLAB function: normcdf(X, m, std)
[Figure: prob that rand variable < x, plotted against x]
Complementary Gaussian cdf
• This is just 1 – normcdf(x, m, σ).
• It is prob of Gaussian random variable (mean = m, std = σ) being > x.
[Figure: prob that var > x, plotted against x]
Complementary error function
• Some call the complementary Gaussian cdf (m=0, σ=1) the ‘complementary error function’ Q(z).
• But ‘erfc’ is also called this.
• Q(z) = comp-Gaussian cdf = 0.5 erfc(z/√2).
• Used to rely on tables & graphs of Q(z).
• When m ≠ 0 or σ ≠ 1, use Q((z-m)/σ).
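Q(z) can be evaluated without tables by using the standard identity Q(z) = 0.5 erfc(z/√2). The sketch below assumes only the base MATLAB function erfc; the values of m, sigma and x are illustrative.

```matlab
% Q(z): probability that a standard Gaussian variable exceeds z,
% written with the base MATLAB function erfc (no toolbox needed).
Q = @(z) 0.5 * erfc(z / sqrt(2));
% For mean m and std dev sigma, standardise first: Q((x - m)/sigma).
m = 1; sigma = 2; x = 3;           % illustrative values only
pTail = Q((x - m) / sigma);        % prob that the variable exceeds x
fprintf('Q(0) = %.3f, Q(1) = %.4f, tail = %.4f\n', Q(0), Q(1), pTail);
```

Since the Gaussian pdf is symmetric, Q(z) + Q(-z) = 1, which is a quick sanity check on any implementation.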
Use of ‘normcdf’ function
[Figure: Gaussian pdf against x, with the area between D and E shaded]
• Prob of random var being between D & E is: normcdf(E, m, σ) - normcdf(D, m, σ)
Tail of distribution
[Figure: Gaussian pdf against x, with the tail above D shaded]
• Prob of random variable being greater than D is: 1 - normcdf(D, m, σ)
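Both the interval and the tail probability can be sketched as below. To avoid depending on a toolbox, the Gaussian cdf is written with erfc as a hypothetical helper gcdf, which should agree with normcdf; the limits D and E are illustrative values, not those in the figures.

```matlab
% Gaussian cdf built from erfc, so the sketch does not need normcdf.
gcdf = @(x, m, sigma) 0.5 * erfc(-(x - m) / (sigma * sqrt(2)));
m = 0; sigma = 1;                 % standard normal
D = 1; E = 2;                     % illustrative limits
pBetween = gcdf(E, m, sigma) - gcdf(D, m, sigma);  % area between D and E
pTail = 1 - gcdf(D, m, sigma);                     % area above D
fprintf('P(%g < x < %g) = %.4f\n', D, E, pBetween);
fprintf('P(x > %g) = %.4f\n', D, pTail);
```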
An Engineering Question
• Rectangular 1 V & 0 V pulses used to transmit a binary signal.
• Affected by additive white Gaussian noise (AWGN).
• Mean of noise = 0 & power (variance) σ² = 0.01.
• Estimate the bit-error probability.
• Bit-error may occur if noise adds voltage > 0.5 V to 0 V or < -0.5 V to 1 V.
• Assume same no. of 1’s & 0’s.
[Figure: pulse waveform, voltage against time t, with levels +1 and +1/2 marked]
Solution
prob(error) = prob(bit = 0) × prob(noise > 0.5) + prob(bit = 1) × prob(noise < -0.5)
= 0.5 prob(noise > 0.5) + 0.5 prob(noise < -0.5)
= prob(noise > 0.5) because of symmetry
= 1 - normcdf(0.5, 0, 0.1) = 2.9 × 10^-7
Or, using the graph of Q(z/σ) on the next slide, prob(error) = Q(0.5/0.1) = Q(5) ≈ 3 × 10^-7
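The bit-error calculation can be reproduced with a few lines. This is a sketch under the slide's assumptions (equal numbers of 1s and 0s, zero-mean Gaussian noise with variance 0.01, decision threshold at 0.5 V); the names threshold and pError are illustrative.

```matlab
% Bit-error probability for 0 V / 1 V pulses in Gaussian noise.
sigma = sqrt(0.01);               % noise std dev from variance 0.01
threshold = 0.5;                  % decision threshold in volts
Q = @(z) 0.5 * erfc(z / sqrt(2)); % complementary Gaussian cdf
% Equal numbers of 1s and 0s, and the noise pdf is symmetric, so
% P(error) = 0.5*Q(0.5/sigma) + 0.5*Q(0.5/sigma) = Q(0.5/sigma).
pError = Q(threshold / sigma);
fprintf('P(error) = %.2e\n', pError);
```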
[Figure: graph of Q(z/σ) against z/σ]
Back to sampling
• Assume a population has true mean µ & stdev σ.
• Take a sample of N measurements from it; say N=50.
• Calculate sample-mean m1 & stdev s1.
• Cannot expect m1 = µ & s1 = σ exactly.
• Take another sample, & calculate m2 & s2.
• Repeat to obtain m1, m2, …, mM & s1, s2, …, sM.
• Now have distributions for sample-mean & sample-stdev.
• If population is Gaussian, pdf of sample-means will be Gaussian with mean = µ & stdev = σ/√N.
• Can confirm by increasing M & estimating mean & stdev of sample-mean from m1, m2, …, mM.
• What about mean & stdev of sample-variances? (later)
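The repeated-sampling experiment described above can be simulated directly. In this sketch the population parameters mu and sigma and the number of repetitions M are illustrative assumptions; only N = 50 comes from the slide.

```matlab
mu = 5; sigma = 2;       % illustrative population parameters
N = 50;                  % measurements per sample
M = 20000;               % number of repeated samples
X = mu + sigma * randn(N, M);   % each column is one sample of size N
m = mean(X);             % 1-by-M row of sample-means m1..mM
s = std(X);              % 1-by-M row of sample-stdevs s1..sM
fprintf('mean of sample-means = %.3f (mu = %g)\n', mean(m), mu);
fprintf('std of sample-means  = %.3f (sigma/sqrt(N) = %.3f)\n', ...
        std(m), sigma/sqrt(N));
fprintf('mean of sample-stdevs = %.3f (sigma = %g)\n', mean(s), sigma);
```

Increasing M should bring the estimated std of the sample-means closer to σ/√N.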
Significance testing
• Assume pop-mean µ (‘mu’) may change.
• Assume we know pop-stdev σ & that it will not change.
• Assume we can only take one sample of 50 values.
• Calculate m1 to decide whether µ has changed.
• Null Hypothesis – it has not changed, i.e. new pop-mean µNew = µ.
• If Null Hyp is true, pdf of sample-mean is on next slide.
Gaussian pdf of sample-mean
[Figure: Gaussian pdf of sample-mean, centred on µ with std s1 = σ/√50; observed value m1 marked]
• Assume value we got was m1 = µ + 2.5 s1.
  – E.g. if µ=0 & σ=1, then m1 = 2.5/√50 ≈ 0.35
• How unlikely if Null Hypothesis is true?
Concept of a ‘null-hypothesis’
• A null-hypothesis is an assumption that is made and then tested by a set of experiments designed to reveal that it is likely to be false, if it is false.
• Testing is done by considering how probable the results are, assuming the null-hypothesis is true.
• If the results appear very improbable the researcher may conclude that the null-hypothesis is likely to be false.
• This is usually the outcome the researcher hopes for when he or she is trying to prove that a new technique is likely to have some value.
p-value
• “Probability of obtaining a test result at least as extreme as the one observed, assuming that null-hypothesis is true”.
• Reject null-hypothesis if the p-value is less than some value α (significance level) which is often 0.05 or 0.01.
• When null-hypothesis is rejected, result is statistically significant.
• Here p-value is 1 - normcdf(m1, µ, s1), with s1 = σ/√N
  = 1 - normcdf(µ + 2.5 s1, µ, s1)
  = 1 - normcdf(2.5 s1, 0, s1)
  = 1 - normcdf(2.5, 0, 1) = 0.0062
• Less than 0.01, so reject NH at the 1% significance level.
• Conclude that mean has changed.
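The p-value for this example can be computed as in the sketch below, which uses erfc in place of normcdf so that it runs without a toolbox. The values µ = 0, σ = 1 and N = 50 are those of the example; the names z, pValue and alpha are illustrative.

```matlab
mu = 0; sigma = 1; N = 50;        % values from the example
s1 = sigma / sqrt(N);             % std dev of the sample-mean
m1 = mu + 2.5 * s1;               % observed sample-mean
z = (m1 - mu) / s1;               % standardised test statistic
pValue = 0.5 * erfc(z / sqrt(2)); % same as 1 - normcdf(m1, mu, s1)
alpha = 0.01;                     % significance level
fprintf('p-value = %.4f, reject NH: %d\n', pValue, pValue < alpha);
```

This is a one-sided test: it only measures how unlikely a sample-mean this far above µ would be.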
Our two assumptions
• That was easy because we made 2 assumptions: population is Gaussian & pop-stdev is known to us.
• Now need to eliminate these 2 assumptions.
• We have some help from the Central Limit Theorem:
Central Limit Theorem
• If samples of size N are ‘randomly’ chosen from a pop with mean µ & std σ, the pdf of their sample-means, m1, approaches a Normal (Gaussian) pdf with mean µ & std σ/√N as N is made larger & larger.
• Regardless of whether population is Gaussian or not!
• Previous example can be made to work for non-Gaussian pop provided N is ‘large enough’.
• More on this next week.
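The theorem can be illustrated with a clearly non-Gaussian population. This sketch assumes a uniform population on (0, 1), which has mean 0.5 and variance 1/12; the choice of population, M, and the names predictedStd and fracWithin1 are illustrative.

```matlab
N = 50; M = 20000;                % sample size and number of samples
X = rand(N, M);                   % uniform on (0,1): mean 0.5, var 1/12
m = mean(X);                      % M sample-means
predictedStd = sqrt(1/12) / sqrt(N);   % sigma/sqrt(N) from the theorem
% Fraction of sample-means within one predicted std dev of 0.5;
% this should be about 0.68 if their pdf is close to Gaussian.
fracWithin1 = mean(abs(m - 0.5) < predictedStd);
fprintf('std of sample-means = %.4f (predicted %.4f)\n', std(m), predictedStd);
fprintf('fraction within 1 std = %.3f\n', fracWithin1);
```

Plotting hist(m, 20) should show the familiar bell shape even though each individual measurement is uniformly distributed.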
Another example
• Assume we wish to find out if a technique designed to benefit users of a system is likely to have any value.
• Divide users into two groups & offer proposed technique to one group, and something different to the other group.
• The null-hypothesis would be that the proposed technique offers no measurable advantage over the other techniques.
The testing
• Look for differences between the sets of results obtained for each of the two groups.
• Careful experimental design will try to eliminate differences not caused by techniques being compared.
• Take a large number of users in each group & randomize the way the users are assigned to groups.
• Once other differences have been eliminated as far as possible, remaining difference will hopefully be indicative of the effectiveness of the techniques being investigated.
• Vital question is whether the differences are likely to be due to the advantages of the new technique, or the inevitable random variations that arise from the other factors.
• Are the differences statistically significant?
• Can employ a statistical significance test to find out.
Failure of the experiment
• If results are not found to look improbable under the null-hypothesis, i.e. if the differences between the two groups are not statistically significant, then no conclusion can be drawn.
• Null-hypothesis could be true, or it could still be false.
• Mistake to conclude that the ‘null-hypothesis’ has been proved likely to be true in this circumstance.
• It is quite possible that the results of the experiment give insufficient evidence to make any conclusions at all.
Question: fair coin test
Checking whether a coin is fair
• Suppose we obtain heads 14 times out of 20 flips.
• The p-value for this test result would be the probability of a fair coin landing on heads at least 14 times out of 20 flips.
• From binomial distribution formula (Lecture 4), this is:
  (20C14 + 20C15 + 20C16 + 20C17 + 20C18 + 20C19 + 20C20) / 2^20 = 0.058
• This is probability that a fair coin would give a result as extreme or more extreme than 14 heads out of 20.
Significance test for fair coin question
• Reject null-hypothesis if p-value ≤ α.
• If α = 0.05, rejection of null-hypothesis is: “at the 5% (significance) level”.
• Probability of wrongly rejecting null-hypothesis (Type 1 error) will be equal to α.
• This is often considered ‘sufficiently low’.
• In our example, p-value = 0.058 > 0.05.
• Observation is consistent with null-hypothesis & we cannot reject it.
• Cannot conclude that coin is likely to be unfair.
• But we have NOT proved that coin is likely to be fair.
• 14 heads out of 20 flips can be ascribed to chance alone.
• It falls within the range of what could happen 95% of the time with a fair coin.
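The fair-coin p-value and the decision at α = 0.05 can be checked with the base MATLAB function nchoosek. This is a small sketch; the variable names n, kObs, counts and rejectNull are illustrative.

```matlab
n = 20; kObs = 14;                % flips and observed number of heads
% Number of ways to get k heads, for k = 14, 15, ..., 20.
counts = arrayfun(@(k) nchoosek(n, k), kObs:n);
pValue = sum(counts) / 2^n;       % each of the 2^n sequences is equally likely
alpha = 0.05;                     % significance level
rejectNull = pValue <= alpha;     % decision rule: reject if p-value <= alpha
fprintf('p-value = %.4f, reject null-hypothesis: %d\n', pValue, rejectNull);
```

This counts only outcomes with at least 14 heads, as on the slide, so it is a one-sided p-value.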
Questions from Lecture 2
• Analyse the fictitious exam results & comment on features.
• Compute means, stdevs & vars for each subject & histograms for the distributions.
• Make observations about performance in each subject & overall.
• Do marks support the hypothesis that people good at Music are also good at Maths?
• Do they support the hypothesis that people good at English are also good at French?
• Do they support the hypothesis that people good at Art are also good at Maths?
• If you have access to only 50 rows of this data, investigate the same hypotheses.
  – What conclusions could you draw, and with what degree of certainty?
Questions from Lecture 4
1. A patient goes to a doctor with a bad cough & a fever. The doctor needs to decide whether he has ‘swine flu’. Let statement S = ‘has bad cough and fever’ & statement F = ‘has swine flu’. The doctor consults his medical books and finds that about 40% of patients with swine-flu have these same symptoms. Assuming that, currently, about 1% of the population is suffering from swine-flu and that currently about 5% have bad cough and fever (due to many possible causes including swine-flu), we can apply Bayes theorem to estimate the probability of this particular patient having swine-flu.
2. A doctor in another country knows from his text-books that for 40% of patients with swine-flu, the statement S, ‘has bad cough and fever’ is true. He sees many patients and comes to believe that the probability that a patient with ‘bad cough and fever’ actually has swine-flu is about 0.1 or 10%. If there were reason to believe that, currently, about 1% of the population have a bad cough and fever, what percentage of the population is likely to be suffering from swine-flu?
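Both questions are direct applications of Bayes theorem, P(F|S) = P(S|F) P(F) / P(S). The sketch below works through the arithmetic with the figures given in the questions; the variable names are illustrative.

```matlab
% Question 1: P(F|S) = P(S|F) * P(F) / P(S)
pS_given_F = 0.40;                % symptoms given swine-flu
pF = 0.01;                        % prior probability of swine-flu
pS = 0.05;                        % probability of the symptoms
pF_given_S = pS_given_F * pF / pS;
% Question 2: rearrange for P(F) = P(F|S) * P(S) / P(S|F)
pF_given_S2 = 0.10;               % the doctor's estimate of P(F|S)
pS2 = 0.01;                       % probability of the symptoms
pF2 = pF_given_S2 * pS2 / pS_given_F;
fprintf('Q1: P(F|S) = %.2f\n', pF_given_S);
fprintf('Q2: P(F) = %.4f (%.2f%% of the population)\n', pF2, 100*pF2);
```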