Data collection and Statistics
Evert Jan Bakker, Biometris - Wageningen University
Biometris: Quantitative Methods brought to Life
Introduction: What is Statistics?
1. Probability calculus: theoretical and exact. (Easy program: PQRS)
2. Descriptive statistics: just describes the data. All conclusions refer only to the sample, so they are ‘always correct’. Descriptive statistics can already be convincing, for example through graphical representations of the data.
3. Inference (test of hypothesis, estimate with confidence interval): conclusions are drawn about a population (e.g. Wageningen students) or a general phenomenon (maize yield), using only data from a limited sample.
4. Experimental design / sampling design: randomisation, blocking, special designs, ... / sample size.
Some study types
Types of research aim
1a. Description: describe a phenomenon, or a sequence of phenomena, usually to draw conclusions about underlying mechanisms or to fit it into existing theory.
1b. Exploration: generate new ideas. Measure any variable; report any fact of interest, relationship or difference, using “any” descriptive analysis.
2. Inference (test): ‘hard evidence’ for conclusions about a population or a general phenomenon, based on sample data. Everything (data collection and analysis) must be done according to the rules, so as not to ‘Lie with Statistics’.
Type of response (determines the type of analysis)
Example responses: “green”; Fully agree, Agree, ...
Primary data collection I: sampling
- Observational research: sampling, how? How many?
- Sampling: random, stratification, subsampling, ...
- A conclusion can (only) be drawn about the population from which a random sample was taken.
- Correlations do not have to indicate cause-effect relationships.
- Random sampling (simple, systematic, stratified, ...)
- Non-random sampling (convenience sample, purposive sampling, focus groups)
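The three random sampling schemes named on this slide can be sketched in a few lines. This is a minimal illustration, not a survey-grade implementation: the sampling frame, the stratum names and the 10% sampling fraction are all hypothetical.

```python
import random

def simple_random_sample(units, n, seed=0):
    """Draw n units without replacement, each subset equally likely."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def systematic_sample(units, n, seed=0):
    """Take every k-th unit after a random start, with k = len(units) // n."""
    rng = random.Random(seed)
    k = len(units) // n
    start = rng.randrange(k)
    return [units[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the same fraction within each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }

# Hypothetical sampling frame: 100 numbered students.
population = list(range(1, 101))
srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, 10)

# Hypothetical strata: 60 first-year students and 40 later-year students.
strata = {"first_year": population[:60], "later_years": population[60:]}
strat = stratified_sample(strata, 0.1)
```

Stratification guarantees that each stratum is represented in proportion to its size; a simple random sample only achieves that on average.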
Sampling breakdown: target population -> study population -> sample
Primary data collection II: experiments
- For experimental research: design of the experiment (choice of experimental units, randomisation, response(s), covariates, number of replications).
- Types of design frequently used:
  - Completely randomised design
  - Block design / covariates (measured at the start of the experiment)
  - Split-plot design
  - ...
- The design should be based on the most relevant sources of variation.
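Randomisation for the first two designs can be sketched as follows. The plot names, block names and treatment labels are hypothetical, and the fixed seed is only there to make the layout reproducible.

```python
import random

def completely_randomised(units, treatments, seed=0):
    """Assign treatments to units at random, with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)
    return dict(zip(units, labels))

def randomised_block(blocks, treatments, seed=0):
    """Within each block, every treatment occurs once, in random order."""
    rng = random.Random(seed)
    layout = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = dict(zip(units, order))
    return layout

treatments = ["A", "B", "C"]

# Completely randomised design: 12 hypothetical plots, 4 replications each.
plots = [f"plot{i}" for i in range(1, 13)]
crd = completely_randomised(plots, treatments)

# Randomised block design: 4 hypothetical blocks of 3 plots each.
blocks = {f"block{b}": [f"b{b}p{p}" for p in range(1, 4)] for b in range(1, 5)}
rbd = randomised_block(blocks, treatments)
```

In the block design the randomisation is restricted: treatments are shuffled separately inside each block, so block differences cannot be confused with treatment differences.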
Design principles: brief overview
Design principles: brief overview (2)
3a. Measure other variables that may influence the response; in the analysis they are used as covariates.
3b. In case of known other possible sources of variation: blocking. Create homogeneous groups (blocks). In the analysis, block effects can be corrected for.
Without blocking: Total variation = Treatment effect + Error
With blocking or covariates: Total variation = Treatment effect + Block/covariate effect + Error
The test on the treatment effect compares the measured effect with the Error.
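The effect of blocking on the Error term can be made concrete with sums of squares for a randomised complete block design. The yields below are invented for illustration; the decomposition itself is the standard one for this design.

```python
from statistics import mean

# Hypothetical yields, indexed as yields[block][treatment].
yields = {
    "block1": {"A": 10.0, "B": 12.0, "C": 15.0},
    "block2": {"A": 13.0, "B": 16.0, "C": 17.0},
    "block3": {"A": 16.0, "B": 17.0, "C": 20.0},
}
blocks = list(yields)
treatments = list(yields[blocks[0]])

grand = mean(yields[b][t] for b in blocks for t in treatments)
treat_mean = {t: mean(yields[b][t] for b in blocks) for t in treatments}
block_mean = {b: mean(yields[b][t] for t in treatments) for b in blocks}

# Total variation around the grand mean.
ss_total = sum((yields[b][t] - grand) ** 2 for b in blocks for t in treatments)
# Variation explained by treatments and by blocks.
ss_treat = len(blocks) * sum((m - grand) ** 2 for m in treat_mean.values())
ss_block = len(treatments) * sum((m - grand) ** 2 for m in block_mean.values())
# Error: what is left after removing treatment and block effects.
ss_error = sum(
    (yields[b][t] - treat_mean[t] - block_mean[b] + grand) ** 2
    for b in blocks
    for t in treatments
)
# If blocks are ignored, the block variation ends up in the Error.
ss_error_ignoring_blocks = ss_total - ss_treat

print(f"total={ss_total:.2f} treatment={ss_treat:.2f} "
      f"block={ss_block:.2f} error={ss_error:.2f}")
```

Because the treatment effect is tested against the Error, moving block variation out of the Error makes the same treatment effect easier to detect.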
Inference
- For standard designs, the data analysis follows a fixed calculation pattern, known before the experiment.
- The analysis is based on a model for the data. Model = assumptions about the observations:
  - Systematic part: how the mean response depends on the treatments.
  - Random part: independence, Normality and equal variance (independence follows from correct randomisation).
- If the response is quantitative (e.g. yield, blood pressure):
  - Qualitative factor(s), e.g. variety -> two-sample t-test or ANOVA
  - Quantitative factor(s), e.g. amount of fertilizer, amount of rainfall -> linear regression
  - Both -> analysis of covariance, general linear model
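Two of the analyses listed here follow a calculation pattern short enough to write out: the pooled two-sample t statistic and the least-squares line. The sketch below uses only the standard library and invented numbers; in practice a statistics package would also supply p-values and confidence intervals.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Two-sample t statistic under the equal-variance assumption.

    Returns (t, degrees of freedom).
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, df

def least_squares_line(x, y):
    """Slope and intercept of the simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Qualitative factor: hypothetical yields of two varieties.
t_value, df = pooled_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])

# Quantitative factor: hypothetical fertilizer amounts and yields.
slope, intercept = least_squares_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The t statistic is exactly the comparison described in the design principles: the measured difference divided by a standard error built from the Error variation.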
Data collection for non-inference
- History is not repeatable. No randomisation. Case studies.
- Summaries with tables and graphs: a picture says more than 1000 words. Descriptive statistics can be convincing as well.
- BEFORE collecting the data, you should know how you will use the data.
- Two examples of graphical display of data.
Graphs
[Figure: "Age structure of college entrances". Series: Under 25; 25 or older. Horizontal axis: 1980-84.]
After figuring out what the graph says, you find out that it displays one number for each of 5 years: the % of entering students of at least 25 years old.
Edward R. Tufte. The Visual Display of Quantitative Information.
Personal experience
A. Own Ph.D. experience: the advantage of knowing the analysis in advance. No ‘belief’ in the result of the analysis -> keep on analysing for one extra year.
B. Cows in Mali, see below.
C. A refereed paper. Each county in England has a value for the occurrence of a disease. Cluster analysis is done, using different methods. Each method is run with 5, 6, 7, ..., 20 clusters, each time yielding a quality measure. This gives 16 quality values per method -> analysis: one-way ANOVA on quality between the methods.
Ad C. Compare 4 clustering methods
What went wrong: the ‘data’ cannot be said to be independent.
Ad B. Cows in pasture land in Mali
Cow activity (walking, ruminating, eating, lying down) is observed for 10 days, 8 h/d, 12 times/h, each time for 60 s: 960 observations per cow.
Interest in: amount of time spent walking (%) = y.
Variation = within-cow variation (between moments) + variation between (means of) cows.
The observations were pseudo-replications. Experimental/sampling units: cows. Measured units: moments; these observations are not independent.
- Make sure to think about the sources of variation.
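One way to respect the experimental unit is to reduce the 960 moments to a single value per cow and base the precision on the variation between cows. The walking counts below are invented; only the observation scheme (10 days, 8 h/d, 12 times/h) comes from the slide.

```python
import math
from statistics import mean, stdev

DAYS, HOURS_PER_DAY, OBS_PER_HOUR = 10, 8, 12
obs_per_cow = DAYS * HOURS_PER_DAY * OBS_PER_HOUR  # 960 moments per cow

# Hypothetical number of moments (out of 960) at which each cow was walking.
walking_counts = {"cow1": 192, "cow2": 240, "cow3": 288, "cow4": 240}

# Step 1: one summary value per experimental unit (the cow).
pct_walking = {cow: 100 * count / obs_per_cow for cow, count in walking_counts.items()}
values = list(pct_walking.values())
n_cows = len(values)
mean_pct = mean(values)

# Step 2: the standard error uses between-cow variation, with n = number of cows.
se_between_cows = stdev(values) / math.sqrt(n_cows)

# For contrast: the (incorrect) standard error obtained by treating all
# 4 x 960 moments as independent observations.
p = mean_pct / 100
se_pseudo = 100 * math.sqrt(p * (1 - p) / (n_cows * obs_per_cow))

print(f"mean={mean_pct:.1f}%  SE(cows)={se_between_cows:.2f}  SE(pseudo)={se_pseudo:.2f}")
```

With these invented numbers the pseudo-replicated standard error is much smaller than the between-cow one, which is how pseudo-replication produces overconfident conclusions.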
Precision: sample size calculations
- Two hypothetical populations, one for each treatment. We call the population means μ1 and μ2.
- Parameter of interest: Δ = μ1 - μ2
- Samples: y1,1, ..., y1,n1; y2,1, ..., y2,n2
- Model = assumptions: the data are outcomes of n1 and n2 independent drawings from N(μ1, σ1) and N(μ2, σ2).
- Extra assumption: σ1 = σ2 = σ.
3 (of many) possible realities
- Δ = 0 (no difference)
- Δ = Δ1 (large difference)
- Δ = Δ2 (small difference)
Assumed: Normality and σ1 = σ2
Testing: reality vs. conclusion
Given a relevant reality under Ha (a value for Δ) and given α (e.g. 0.05), the power of a planned experiment can be calculated.
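A rough power calculation for the two-sample situation can be done with the Normal distribution alone. The sketch below is an approximation that treats σ as known; an exact calculation would use the noncentral t distribution. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test with n units per sample.

    Normal approximation: sigma is treated as known, so the result is
    slightly optimistic for small n.
    """
    z = NormalDist()
    se = sigma * math.sqrt(2 / n)        # standard error of the estimated difference
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = abs(delta) / se              # true difference in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Example: a true difference of one standard deviation, 16 units per sample.
power_16 = approx_power(delta=1.0, sigma=1.0, n=16)
print(f"approximate power with n=16: {power_16:.2f}")
```

When Δ = 0 the function returns α, the probability of a false positive; as Δ or n grows, the power rises towards 1.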
Formula for sample size: confidence interval
- Formula (n per sample) for a (1 - α) confidence interval with Error Margin ≤ M, using tα/2 ≈ 2.0 - 2.2.
- Precision criteria that have to be specified: 1 - α = confidence level, and M = maximum Error Margin.
Notes
1) σ has to be estimated.
2) If α = 0.05, t = 2.0 - 2.2.
3) If the outcome for n is small (< 10), change the t-value to the one with df = 2(n - 1) and calculate again.
4) In testing, instead of M, we specify Δ, the minimum relevant difference, and β (= 1 - power).
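Under the model above (two samples of size n, common σ), the error margin of the confidence interval for Δ is t·σ·√(2/n). Requiring this to be at most M gives n ≥ 2(t·σ/M)². The sketch below assumes that this standard result is the formula meant on the slide; the values for σ and M are invented, and the t-value is passed in by the user, as in notes 2 and 3.

```python
import math

def error_margin(sigma, n, t=2.0):
    """Half-width of the confidence interval for the difference of two means."""
    return t * sigma * math.sqrt(2 / n)

def n_for_margin(sigma, max_margin, t=2.0):
    """Smallest n per sample with error margin <= max_margin.

    t is the t-value for the chosen confidence level (about 2.0 - 2.2 for
    alpha = 0.05); if the resulting n is small, look up t for
    df = 2(n - 1) and call the function again.
    """
    return math.ceil(2 * (t * sigma / max_margin) ** 2)

# Hypothetical example: estimated sigma = 4, maximum error margin M = 3.
n_needed = n_for_margin(sigma=4.0, max_margin=3.0)          # with t = 2.0
n_conservative = n_for_margin(sigma=4.0, max_margin=3.0, t=2.2)
print(n_needed, n_conservative)
```

Halving M roughly quadruples n, since M enters the formula squared.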
Conclusions for quantitative research
- In the design phase:
  - Think about the relevant “sources of variation” (influential factors): which of them will you include in the design, which of them will you keep constant? Block design? Split plot?
  - Measure conditions that vary (weather, ...) as covariates.
  - Measure general conditions (even if they do not vary across treatments in your experiment).
- Do correct randomisation.
- Avoid / be aware of pseudo-replication: experimental unit vs. measured unit; sampling unit vs. measured unit.
Conclusions
- For sample size calculations (e.g. Russ Lenth):
  1) know your analysis;
  2) specify precision: minimum relevant difference, power (0.8/0.9), α (5%);
  3) know the error variation (guess: range/4).
- Measure and store quantitative data, when possible, not binary data.
- Conclusions from a statistical analysis are drawn in the context of a statistical model. The correctness and the relevance of the conclusion depend on the correctness and the relevance of the model.
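The three ingredients for a sample size calculation in testing can be combined in a short sketch. It assumes a two-sided two-sample comparison and uses the Normal approximation n ≈ 2(z(α/2) + z(β))²σ²/Δ², which gives a slightly smaller n than a t-based calculation; the function names and example values are hypothetical.

```python
import math
from statistics import NormalDist

def sigma_from_range(low, high):
    """Rough guess of the error standard deviation: range / 4."""
    return (high - low) / 4

def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate n per group to detect a difference delta (two-sided test).

    Normal approximation; treat the result as a lower bound and round up
    further when n turns out to be small.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical example: responses are expected to range from 10 to 30,
# and a difference of 5 units is the minimum relevant difference.
sigma_guess = sigma_from_range(10.0, 30.0)
n_80 = n_per_group(delta=5.0, sigma=sigma_guess, power=0.8)
n_90 = n_per_group(delta=5.0, sigma=sigma_guess, power=0.9)
print(sigma_guess, n_80, n_90)
```

Raising the power from 0.8 to 0.9 costs noticeably more units per group, which is why the precision criteria have to be chosen before the experiment.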
Conclusions
- In case of need, contact a statistician! ... beforehand.
- Always at your service: Biometris