Data Collection
Study vs. Experiment
• Observational Study
  ◦ Based on data in which no manipulation of factors has been employed
• Experiment
  ◦ Manipulates factors to create treatments
  ◦ Randomly assigns subjects to the treatments
  ◦ Compares the responses of the subjects across treatment levels
Study or Experiment?
Researchers have linked an increase in the incidence of breast cancer in Italy to dioxin released by an industrial accident in 1976. The study identified 981 women who lived near the site of the accident and were under age 40 at the time. Fifteen of the women had developed breast cancer at an unusually young average age of 45. Medical records showed that they had heightened concentrations of dioxin in their blood and that each tenfold increase in dioxin level was associated with a doubling of the risk of breast cancer.
Answer: Observational study
Study or Experiment?
Is diet or exercise effective in combating insomnia? Some believe that cutting out desserts can help alleviate the problem, while others recommend exercise. Forty volunteers suffering from insomnia agreed to participate in a month-long test. Half were randomly assigned to a special no-desserts diet; the others continued eating desserts as usual. Half of the people in each of these groups were randomly assigned to an exercise program, while the others did not exercise. Those who ate no desserts and engaged in exercise showed the most improvement.
Answer: Experiment
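The random assignment in the insomnia experiment can be sketched in Python. This is only an illustration: the volunteer labels, the function name `assign_treatments`, and the seed are assumptions, and shuffling once and cutting the list into four equal groups is a shortcut for the two-stage assignment described on the slide.

```python
import random

def assign_treatments(subjects, seed=None):
    """Randomly split subjects into the four diet x exercise groups."""
    rng = random.Random(seed)          # seed is only for repeatable demos
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # chance, not the researcher, decides the order
    treatments = [
        ("no desserts", "exercise"),
        ("no desserts", "no exercise"),
        ("desserts as usual", "exercise"),
        ("desserts as usual", "no exercise"),
    ]
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * group_size:(i + 1) * group_size]
        for i, treatment in enumerate(treatments)
    }

# Hypothetical labels for the 40 volunteers
volunteers = [f"volunteer {i:02d}" for i in range(1, 41)]
groups = assign_treatments(volunteers, seed=1)
for treatment, members in groups.items():
    print(treatment, len(members))
```

Each of the four treatments ends up with 10 subjects, so every treatment is replicated 10 times.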
The Cycle of Statistics
[Diagram: Population (described by a Parameter) and Sample (described by a Statistic)]
Principles of Experimental Design
• Control aspects of the experiment that we know may have an effect on the response, but that are not the factors being studied.
• Randomize to even out effects that we cannot control.
• Replicate over as many subjects as possible.
Types of Sampling
• Random Sample
  ◦ Simple Random Sample
  ◦ Stratified Random Sample
  ◦ Probability Random Sample
Random Sampling: Simple Random Sample (SRS)
• Every member of the population has an equal chance of being chosen for the sample
• Method
  ◦ Assign a random number to each individual in the sampling frame
  ◦ Select only those whose random numbers satisfy some rule
Simple Random Sample Example
There are 80 students enrolled in an introductory Statistics course; you are to select a sample of 5.
• Sampling frame: the roster of all students enrolled in the course
• Label each student 01-80
• Use a random number generator and choose the first 5 students from the list that match the random numbers
• Ignore numbers not on the list and repeats
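The procedure above can be sketched in Python. The function name `simple_random_sample` and the seed are assumptions made for illustration; the loop mimics reading two-digit random numbers (00-99) and skipping labels that are off the list or already used.

```python
import random

def simple_random_sample(frame_size, sample_size, seed=None):
    """Pick labels 1..frame_size by reading two-digit random numbers."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        label = rng.randint(0, 99)             # a two-digit random number, 00-99
        if label < 1 or label > frame_size:
            continue                           # ignore numbers not on the list
        if label in chosen:
            continue                           # ignore repeats
        chosen.append(label)
    return chosen

sample = simple_random_sample(frame_size=80, sample_size=5, seed=2024)
print([f"{label:02d}" for label in sample])
```

In practice, `random.sample(range(1, 81), 5)` draws the same kind of sample in one call.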
Stratified Random Sample
• Population is divided into similar groups of individuals
• These are called strata
• Then an SRS is completed in each stratum
• These are combined for the overall sample
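A minimal sketch of stratified sampling in Python. The two strata ("juniors" and "seniors"), their sizes, and the function name `stratified_sample` are hypothetical and chosen only to show the mechanics: an SRS inside each stratum, then the pieces combined.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take an SRS of size per_stratum from every stratum and combine them."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        combined.extend(rng.sample(members, per_stratum))  # SRS within one stratum
    return combined

# Hypothetical population split into two strata of 40 students each
strata = {
    "juniors": [f"J{i:02d}" for i in range(1, 41)],
    "seniors": [f"S{i:02d}" for i in range(1, 41)],
}
stratified = stratified_sample(strata, per_stratum=3, seed=7)
print(stratified)
```

Because the draw happens separately in each stratum, every stratum is guaranteed to appear in the overall sample.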
Probability Random Sample
• A sample is chosen by chance
• Each possible sample has a probability of being chosen
• We have to know this probability
What is the population? What is the sample? Which random sample was used?
A company packaging snack foods maintains quality control by randomly selecting 10 cases from each day’s production and weighing the bags. Then they open one bag from each case and inspect the contents.
• Population: All snack foods produced at the company
• Sample: 10 cases from each day’s production
• Random Sample: SRS
What is the population? What is the sample? Which random sample was used?
Dairy inspectors visit farms unannounced and take samples of the milk to test for contamination. If the milk is found to contain dirt, antibiotics, or other foreign matter, the milk will be destroyed and the farm re-inspected until purity is restored.
• Population: All milk at the dairy (in the tank)
• Sample: Sample from the milk tank
• Random Sample: SRS
Terminology
• Factor: What is being manipulated
• Response: What is being measured
• Experimental Units: Individuals on which the experiment is done
• Subjects: Human experimental units
• Treatment: Specific experimental condition applied
• Control group: Group that receives no treatment or a placebo
• Placebo: A treatment known to have no effect
Analyzing Experiments: Aspirin Study
• Factor: Aspirin
• Response: Number of heart attacks
• Subjects/Units: 1000 male volunteers
• Treatment: Aspirin
• Levels: Low dose and none (placebo)
• Blinding: Patients do not know which pill they are taking
• Control: A group will take a placebo pill
• Randomization: The men will be randomly assigned to either the treatment group or the placebo group
• Replication: Each treatment will be replicated 500 times
Displaying Data
Types of Variables
• Categorical: places an individual into one of several groups or categories
  ◦ Ex: Eye color, favorite food
• Quantitative: takes a range of numeric values
  ◦ Ex: Height, weight, income
  ◦ Discrete: a finite (or countable) set of possible values. Ex: Number of goals in soccer
  ◦ Continuous: infinitely many possible values within an interval. Ex: Height of males at Enloe
What kind of variable?
• Gender
• Telephone area code
• Amount of electricity used
• Zip code
• Ticket sales at a Miley Cyrus concert
• Number of chicken eggs hatched on Nov. 17, 2006 at 3:00 am
Answer choices: Categorical, Quantitative (C), Quantitative (D)
Does it make sense to average the values?
Every graph I ever make will always
• Have a title
• Have axes labeled
• Have units identified
• Have a legend (for categorical data)
Graphs for categorical data
• Bar Chart
  ◦ Bars never touch!
Graphs for categorical data
• Pie Chart
  ◦ Used for comparing parts to a whole
Create a graph for the following…
A survey was conducted of 1000 individuals regarding their favorite color. The results are as follows:
• Red 367
• Yellow 100
• Green 68
• Blue 159
• Purple 200
• Grey 26
• Pink 80
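Before drawing a bar or pie chart, it helps to turn the counts into percentages of the whole. The sketch below does that in Python and prints a rough text bar chart; the variable names and the choice of one `#` per 2 percentage points are arbitrary.

```python
# Survey counts from the slide
counts = {
    "Red": 367, "Yellow": 100, "Green": 68, "Blue": 159,
    "Purple": 200, "Grey": 26, "Pink": 80,
}

total = sum(counts.values())   # number of people surveyed
percentages = {color: 100 * n / total for color, n in counts.items()}

for color, pct in percentages.items():
    bar = "#" * round(pct / 2)              # crude bar: one '#' per 2 percentage points
    print(f"{color:<7}{pct:5.1f}%  {bar}")
```

The percentages add up to 100%, which is what makes a pie chart appropriate for this data.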
Data Representation of Favorite color survey
[Figure: pie chart and bar chart of the survey results. Red 36.70%, Yellow 10.00%, Green 6.80%, Blue 15.90%, Purple 20.00%, Grey 2.60%, Pink 8.00%]
Graphs for Quantitative Variables
• Dot Plot
  ◦ Useful for small sets of data
• Stem and Leaf Plot
  ◦ Useful for small sets of data
  ◦ More information than a dot plot
• Histograms
• Box Plots
  ◦ More about these tomorrow!
Creating a Stem and Leaf plot
• Sort the data
• Identify the min and max values to establish what kind of stems and leaves to use
• If leaves become too long, split them
• Create a legend
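These steps can be sketched in Python using the end-of-class pulse data that appears later in these slides. The function name `stem_and_leaf` is an assumption, the tens digits are used as stems and the ones digit as the leaf, and this simple version does not split long rows of leaves.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the lines of a stem-and-leaf plot (stem = tens, leaf = ones)."""
    rows = defaultdict(list)
    for value in sorted(data):                 # step 1: sort the data
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    lines = []
    # step 2: the min and max stems fix the range; empty stems are still shown
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    lines.append("Key: 5 | 4 means 54")        # the legend
    return lines

end_pulse = [54, 55, 58, 58, 63, 64, 65, 67, 68, 68, 69, 70,
             70, 70, 72, 74, 76, 76, 79, 80, 80, 85, 87, 109]
plot_lines = stem_and_leaf(end_pulse)
print("\n".join(plot_lines))
```

Keeping the empty stem for the 90s shows the gap between 87 and 109, which is part of what the plot is meant to reveal.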
Back to back stem and leaf plots
• Used for comparing two similar sets of data
• Stems are in the middle; the leaves expand to the left for one data set and to the right for the other data set
Histograms
• Group nearby values and display frequencies
[Figure: National SAT scores 2007]
How to construct Histograms
• Determine the bin size
  ◦ Divide the range into equal sections
  ◦ Use a minimum of 5 bins
• Create a frequency table
• Draw the graph
Wake County 2008 SAT scores
1633 1590 1607 1622 1304 1394 1324 1766 1514 1412
1680 1544 1378 1662 1531 1646 1604 1472 1568 1541
1. Sort the data
2. Identify the range of the data
3. Identify a bin size that makes sense and will produce at least 5 bins
Relative Frequency Table
Score        Count
1300-1399
1400-1499
1500-1599
1600-1699
1700-1799
Use this table to help draw your histogram!
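One way to check a hand-built table is to let Python count the scores in each bin. The sketch below assumes a bin width of 100 starting at 1300, which matches the bins listed on the slide; the variable names are arbitrary.

```python
# Wake County 2008 SAT scores from the slide
scores = [1633, 1590, 1607, 1622, 1304, 1394, 1324, 1766, 1514, 1412,
          1680, 1544, 1378, 1662, 1531, 1646, 1604, 1472, 1568, 1541]

bin_width = 100
first_bin = (min(scores) // bin_width) * bin_width   # lower edge of the lowest bin
last_bin = (max(scores) // bin_width) * bin_width    # lower edge of the highest bin

# Start every bin at zero so empty bins would still appear in the table
bin_counts = {low: 0 for low in range(first_bin, last_bin + bin_width, bin_width)}
for score in scores:
    bin_counts[(score // bin_width) * bin_width] += 1

for low, count in bin_counts.items():
    share = count / len(scores)                      # relative frequency
    print(f"{low}-{low + bin_width - 1}: count {count}, relative frequency {share:.0%}")
```

The counts give the bar heights for the histogram; dividing by the total gives the relative frequencies.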
Histogram of 2008 Wake County SAT Scores
[Figure: histogram. Horizontal axis: SAT Scores, with bins 1300-1399, 1400-1499, 1500-1599, 1600-1699, 1700-1799. Vertical axis: Number of Students, from 0 to 8]
Graphs can be MISLEADING!
[Figure: Number of deaths in Iraq, as published by AOL News in March of 2006]
Describing Data
Describing Data
• Shape
  ◦ Mound, symmetrical, skewed, single peak, multiple peaks
• Outlier
  ◦ Any observation that appears to not belong with the others
• Center
  ◦ The middle of the data
• Spread
  ◦ Min value to max value (including or excluding outliers)
Describing Graphs (Shape)
• Symmetric: The right and left sides of the histogram are approximately mirror images
• Skewed right: The right tail is longer (the right side may have outliers)
• Skewed left: The left tail is longer (the left side may have outliers)
• Bi-modal: There are 2 peaks
• Uniform: There are roughly the same number of observations for each value
Measures of center
• Median
  ◦ Exact middle of a sorted set of data
• Mean
  ◦ Arithmetic average of all of the observations in a data set
Ex: 1, 2, 3, 4, 5, 6, 7, 8, 9
• What is the median?
  ◦ 5
• What is the mean?
• What if 10 is added to the data set?
  ◦ What are the median and mean?
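The answers to this example can be checked with Python's built-in `statistics` module. The variable names are arbitrary.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
median_before = median(data)        # middle value of 9 sorted observations
mean_before = mean(data)            # 45 / 9

data_with_10 = data + [10]
median_after = median(data_with_10) # average of the two middle values, 5 and 6
mean_after = mean(data_with_10)     # 55 / 10

print(median_before, mean_before)
print(median_after, mean_after)
```

For 1 through 9 both measures equal 5; after 10 is added both equal 5.5, because the data set is still symmetric.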
Resistant measures
• Def: A measure is resistant if it is not easily influenced by extreme observations
  ◦ Is the median a resistant measure? Yes
  ◦ Is the mean a resistant measure? No
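A quick way to see resistance is to replace one value with an extreme one and compare the two measures. In the sketch below, the value 900 is a made-up extreme observation used only for illustration.

```python
from statistics import mean, median

original = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 900]   # hypothetical: 9 replaced by 900

median_shift = median(with_extreme) - median(original)
mean_shift = mean(with_extreme) - mean(original)

print("median moves by", median_shift)   # the middle value is still 5
print("mean moves by", mean_shift)       # the mean jumps from 5 to 104
```

The median does not move at all, while the mean is pulled far toward the extreme value.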
Measures of spread
• Standard deviation
  ◦ Find this in your calculator under 1-variable stats!
• Quartiles
• IQR (Interquartile Range)
  ◦ Q3 - Q1
  ◦ These are found in your 5-number summary!
• Range
  ◦ = Max - Min
5-number summary
• Min
• Q1: quartile 1, median of the lower half
• Median (Q2)
• Q3: quartile 3, median of the upper half
• Max
Components of a box plot
• 5-number summary
  ◦ Min, Q1, Median, Q3, Max
• Outliers
  ◦ Below Q1 - 1.5(IQR)
  ◦ Above Q3 + 1.5(IQR)
[Figure: box plot labeled Min, Q1, Med, Q3, Max]
Where’s the data?
[Figure: box plot annotated with 25% and 50%]
What about outliers?
[Figure: box plot with outliers. Labels: Min, smallest observation that is not an outlier, Q1, Med, Q3, largest observation that is not an outlier, Max]
Standard Deviation and Normal Distributions
Standard deviation
• Gives a measure of how far the data varies from the mean “on average”
• Is only used if the mean is the chosen measure of center
• Is the standard deviation a resistant measure?
  ◦ No!
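To make "how far the data varies from the mean on average" concrete, the sketch below computes a standard deviation step by step for the beginning-of-class pulse data in these slides. It assumes the sample version (dividing by n - 1) and compares the result with Python's `statistics.stdev`, which uses the same divisor.

```python
import math
import statistics

begin_pulse = [50, 50, 55, 60, 62, 66, 72, 72, 76, 76, 76, 79,
               80, 80, 81, 82, 85, 87, 91, 96, 108, 110, 110]

n = len(begin_pulse)
x_bar = sum(begin_pulse) / n                              # the mean
squared_devs = [(x - x_bar) ** 2 for x in begin_pulse]    # squared distance from the mean
sample_sd = math.sqrt(sum(squared_devs) / (n - 1))        # assumption: sample SD, divisor n - 1

print(round(sample_sd, 2), round(statistics.stdev(begin_pulse), 2))
```

Because every deviation is squared, a single extreme value contributes heavily to the sum, which is why the standard deviation is not resistant.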
Beginning pulse in class (n=23)
50 50 55 60 62 66 72 72
76 76 76 79 80 80 81 82
85 87 91 96 108 110 110
Min = 50, Q1 = 66, Median = 79, Q3 = 87, Max = 110
End pulse in class (n=24)
54 55 58 58 63 64 65 67
68 68 69 70 70 70 72 74
76 76 79 80 80 85 87 109
Min = 54, Q1 = 64.5, Median = 70, Q3 = 77.5, Max = 109
Outliers
• Interquartile range (IQR) = Q3 - Q1
• An observation is an outlier if it lies more than 1.5(IQR) above Q3 or more than 1.5(IQR) below Q1
End Class Data
Q1 = 64.5, Q3 = 77.5
IQR = 77.5 - 64.5 = 13
1.5(13) = 19.5
Q1 - 1.5(IQR) = 64.5 - 19.5 = 45
Q3 + 1.5(IQR) = 77.5 + 19.5 = 97
Outliers
54 55 58 58 63 64 65 67
68 68 69 70 70 70 72 74
76 76 79 80 80 85 87 109
• Any observation below 45 or above 97 will be an outlier
• 109 is an outlier
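The 5-number summary and the 1.5(IQR) rule can be sketched in Python. The function names are assumptions, and the quartiles are computed as the medians of the lower and upper halves (leaving out the median itself when n is odd), following the definition used in these slides; other software may use slightly different quartile rules.

```python
from statistics import median

def five_number_summary(data):
    """Min, Q1, Median, Q3, Max with quartiles as medians of the halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]   # skip the median when n is odd
    return min(values), median(lower), median(values), median(upper), max(values)

def find_outliers(data):
    """Observations beyond 1.5(IQR) below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

end_pulse = [54, 55, 58, 58, 63, 64, 65, 67, 68, 68, 69, 70,
             70, 70, 72, 74, 76, 76, 79, 80, 80, 85, 87, 109]
begin_pulse = [50, 50, 55, 60, 62, 66, 72, 72, 76, 76, 76, 79,
               80, 80, 81, 82, 85, 87, 91, 96, 108, 110, 110]

print(five_number_summary(end_pulse), find_outliers(end_pulse))
print(five_number_summary(begin_pulse), find_outliers(begin_pulse))
```

For the end-of-class data this reproduces the fences of 45 and 97 and flags 109. For the beginning-of-class data the fences work out to 34.5 and 118.5, so no observation is flagged.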
What is Normal?
• A bell-shaped curve
• Standard Normal distribution is when…
  ◦ Mean = 0
  ◦ Standard Deviation = 1
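68-95-99.7 Rule
• The normal curve can give us an idea of how extreme a value is based on how far away from the mean it is.
[Figure: normal curve with the horizontal axis marked from -3 to 3 standard deviations around the mean (0). 68% of the area lies within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3]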
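The three percentages in the rule are rounded areas under the standard normal curve. A short check in Python, using `statistics.NormalDist` (available in Python 3.8 and later); the variable names are arbitrary.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # mean 0, standard deviation 1

within = {}
for k in (1, 2, 3):
    # area between -k and +k standard deviations from the mean
    within[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD of the mean: {within[k]:.1%}")
```

The areas come out to roughly 68%, 95%, and 99.7%, so a value more than 3 standard deviations from the mean is very unusual under a normal model.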
Homework
• p. 65 #12, 13
• Make a graph (box plot for #12 and histogram for #13)
• Describe the shape
• Find any outliers
• Find the mean and median
• Find the range, standard deviation, and IQR