Where we’ve been and where we’re going…
• Chapters 1 & 2: Exploratory Data Analysis
– Exploration. Formulating questions and hypotheses.
• Chapter 3: Producing Data
– Acquiring knowledge and information to address the questions.
• Chapter 4 & Beyond: Statistical Inference
– Answering the questions (with numerical methods).

Collecting Data
• Arrangements for collecting data from many individuals are called designs.
• Important Design Questions
– How many individuals will be observed?
– How are individuals selected?
– How will groups be formed among the selected individuals, if pertinent to the study?

Anecdotal Evidence
• We tend to rely on data that most easily comes to mind.
– Unusual events or individuals
– Generalizations
• Anecdotal evidence is composed of haphazardly selected cases.
– May not be representative of the whole.
– Do not trust it!

Available Data
• Sometimes it’s convenient to find good data sets that have already been collected and use them to answer our question(s).
• Advantages
– Less work for us!
– Many sources of available data exist.
• Limitations
– Data wasn’t collected specifically for our purpose.
– How was it collected? Were there biases? Were all lurking variables accounted for?

Sample vs. Census
• Census: an attempt to observe every individual in a population.
– A census is expensive and time-consuming.
• Sample: observation of a selected number of individuals from a population.
– Easier to implement than a census.
– Must take care in how the sample is chosen!

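One careful way to choose a sample is to let chance do the selecting. The sketch below assumes a hypothetical population of 1,000 ID numbers and a sample size of 25; the function name `simple_random_sample` and the seed are illustrative choices, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: individuals labeled 1 through 1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, n=25, seed=42)
print(sorted(sample))
```

Because the individuals are chosen by a random number generator and not by the investigator, no one's judgment (or convenience) decides who ends up in the sample.
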
Observational Studies
• A sampling of the population is a type of observational study.
– Individuals are observed, but not controlled.
• No matter how carefully the sample is chosen, confounding variables might exist.
• Cause-and-effect relationships cannot be established on the basis of observational studies.

Controlled Experiments
• We manipulate the levels of one or more variables for an individual.
• Explanatory variables are called factors.
• Specific values of explanatory variables are called levels.
• A treatment is a specific manipulation of the explanatory variables: one combination of factor levels applied to an individual.

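The vocabulary of factors, levels, and treatments can be made concrete with a small sketch. The two factors here (`dose` and `schedule`) and their levels are hypothetical, invented only to show how treatments arise as combinations of levels.

```python
from itertools import product

# Hypothetical factors (explanatory variables) and their levels.
factors = {
    "dose": ["low", "high"],
    "schedule": ["daily", "weekly"],
}

# Each treatment is one combination of a level from every factor.
treatments = [
    dict(zip(factors.keys(), combo))
    for combo in product(*factors.values())
]

for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)

print("number of treatments:", len(treatments))
```

With two factors at two levels each, the experiment has 2 × 2 = 4 treatments.
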
Controlled Experiments
• Treatment group(s): Groups of individuals that will experience treatment (changes in the factors of interest).
• Control group: Group of individuals who will not receive any treatment.
• Why is a control group important?
• How should we assign individuals to groups?

Grouping Subjects
• It’s hard to match treatment and control groups with respect to confounding factors, though it is sometimes attempted.
• It’s often best to randomize subjects to treatment or control groups.
– Removes potential biases. (A study is biased if it systematically favors certain outcomes.)

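Randomizing subjects to groups can be sketched in a few lines. This is a minimal illustration, assuming 20 hypothetical subjects and an even split; the function name `randomize` and the subject labels are not from the course material.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects, then split them into treatment and control."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels S1 through S20.
subjects = [f"S{i}" for i in range(1, 21)]
treatment_group, control_group = randomize(subjects, seed=7)
print("treatment:", treatment_group)
print("control:  ", control_group)
```

Since chance alone decides the grouping, known and unknown confounding factors tend to balance out between the two groups, and the investigator cannot steer particular subjects toward treatment.
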
Example
• In one early trial of coronary bypass surgery, a physician performed the surgery on a test group, 98% of whom survived at least 3 years.
• Previous studies showed that 68% survived at least 3 years with conventional treatment.
• A newspaper described the physician’s results as “spectacular.” What do you conclude?

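One thing to weigh in this example is that the test group was not formed by randomization. The arithmetic below is a purely hypothetical illustration, not data from the trial: it assumes surgery has no effect at all, that survival depends only on whether a patient is low-risk or high-risk, and that the surgeon operated only on low-risk patients. The risk categories and every rate in it are invented.

```python
# Hypothetical numbers for illustration only; not data from the trial.
# Assumption: surgery has NO effect, so survival depends only on patient risk.
survival_rate = {"low_risk": 0.98, "high_risk": 0.38}

def overall_survival(mix):
    """Weighted average 3-year survival for a group with the given risk mix."""
    return sum(share * survival_rate[risk] for risk, share in mix.items())

# Assumed makeup of each group (shares sum to 1).
historical_mix = {"low_risk": 0.5, "high_risk": 0.5}   # conventional treatment
surgery_mix = {"low_risk": 1.0, "high_risk": 0.0}      # only healthier patients

historical = overall_survival(historical_mix)
surgery = overall_survival(surgery_mix)
print(f"historical: {historical:.2f}, surgery group: {surgery:.2f}")
```

Under these invented assumptions the two groups show 68% and 98% survival even though the surgery does nothing, which is why a comparison against historical results, without a randomized control group, cannot establish that the surgery caused the difference.
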