- Slides: 50
Sections 1-1 & 1-2: Statistical Thinking
Definitions. Statistics: the collection and numerical analysis of information.
Definitions. Variable: the characteristic being measured or observed. Data: the information that has been collected. Note: the word "data" is actually plural.
Definitions. Population: the whole group you want to make conclusions about. Census: collection of data from the entire population. Sample: a smaller subgroup of the population. Statistics: collection of data from a sample. Example: to study the population of all people in the U.S., we may pick a sample of 1000 people to survey.
Types of Statistics. Descriptive statistics: we analyze data and look at trends within the group we have studied. Inferential statistics: we use the information about the sample to make generalizations about the whole population. We will start with descriptive statistics in chapters 2 and 3, and then transition to inferential statistics after that.
Interpreting Statistics. To properly interpret statistical information, one must consider: • Context of the data • Source of the data • Sampling method • Conclusions • Statistical significance • Practical implications
Context. You must think about what the numerical values stand for and what situation they refer to. Example: A set of numerical data may refer to heart rates of males, measured in beats per minute, collected from men ages 25-40.
Source. You must identify the source of the data to determine if it is likely to be objective or biased. Example: Data from drug companies that stand to profit may be unreliable.
Sampling Method. The way a sample was chosen can affect the results, especially if it was not random. Example: Voluntary response samples, in which people choose whether or not to participate, are not usually valid because people with strong opinions are more likely to respond.
Conclusions. When stating conclusions, use easy-to-understand language and do NOT speculate on the reasons for results. Example: A study found that people who drink coffee are more likely to develop cancer. The report should not speculate as to why this is without data to back it up. Do not assume that the coffee causes the cancer, but also do not speculate as to what else might be happening.
Statistical Significance. This refers to how clearly the data indicate a particular conclusion. Could the result have happened by chance? Example: A study finds that 56% of those surveyed prefer bananas over apples. Is 56% enough to assume that most people prefer bananas? Probably not. Even if most people prefer apples, I could easily have chosen a sample with slightly more banana eaters by chance.
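The banana example can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the slide does not give a sample size, so the sizes 25 and 1000 are hypothetical, and the population is assumed to be split exactly 50/50 between bananas and apples. The helper name `prob_at_least` is made up for this example. It computes the chance of seeing at least 56% banana fans in the sample purely by luck.

```python
from math import comb


def prob_at_least(n: int, k: int) -> float:
    """P(X >= k) when X counts banana fans among n people drawn from a
    population assumed to be split exactly 50/50 (binomial model)."""
    favorable = sum(comb(n, i) for i in range(k, n + 1))
    return favorable / 2 ** n


# Hypothetical sample sizes; the slide does not state how many were surveyed.
small = prob_at_least(25, 14)      # 14 of 25 people is 56%
large = prob_at_least(1000, 560)   # 560 of 1000 people is 56%

print(f"n = 25:   chance of 56% or more by luck = {small:.3f}")
print(f"n = 1000: chance of 56% or more by luck = {large:.6f}")
```

Under these assumptions, 56% in a small sample happens by chance quite often, while the same percentage in a large sample would be very unlikely without a real preference. This is why the sample size matters when judging significance.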
Practical Implications. You should also think about why the results of a study really matter. Who cares? Example: A study that correlates coffee drinking with a higher risk of cancer has practical significance in terms of health consequences. This matters to both doctors and patients.
Section 1-3: Types of Data
There will be two types of data that we will discuss in this class: qualitative and quantitative.
Definitions. Qualitative (or categorical) data: data that can be separated into different categories that are not numerical. Example: gender, color, brands of shoes (Nike, Adidas, Skechers), etc. Note: Social Security numbers and zip codes are qualitative data. Yes, they are numbers, but technically they are just labels for people and cities.
Definitions. Quantitative data: numerical information. Examples: weights, heights, number of cars on the street, etc. Parameter: a numerical measurement referring to the entire population. Statistic: a numerical measurement referring to a sample of the population.
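To make the parameter/statistic distinction concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the population is 10,000 made-up heights generated by the computer, and the sample size of 100 is arbitrary. The same calculation (a mean) is called a parameter when it uses the whole population and a statistic when it uses only the sample.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: heights (in inches) of 10,000 people.
population = [random.gauss(67, 4) for _ in range(10_000)]

# A sample: 100 people chosen at random from that population.
sample = random.sample(population, 100)

parameter = mean(population)  # describes the entire population
statistic = mean(sample)      # describes only the sample

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

In practice the parameter is usually unknown, because a census is too expensive, and the statistic is used to estimate it.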
Definitions. Discrete data: numerical data where the number of possible values is either finite or "countable" (0, 1, 2, 3, ...). Example: The number of eggs that hens lay, the number of cars on the street, the cost of college textbooks, etc. Usually whole numbers or money.
Definitions. Continuous data: numerical data with infinitely many possible values that correspond to some continuous scale covering a range of values without gaps, interruptions, or jumps. Example: The amount of milk that a cow produces is 2.343115 gallons per day, the weight of the supermodel is 124.567 lbs, etc. Distance, weight, and height are ALWAYS continuous.
The Exception. It seems like money should be continuous because it uses decimals: $1.25. But actually, it is considered discrete because you can’t go beyond 2 decimal places. Think of it as counting whole pennies. Money values are discrete.
Levels of Measurement. Another way to classify data is to use levels of measurement. These classifications refer to the types of numerical calculations that can be done with the data.
Definitions. Nominal level of measurement: used for data that consist of names, labels, or categories and cannot be put in increasing order. Example: survey responses of yes/no/undecided, eye color.
Definitions. Ordinal level of measurement: used for data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. Example: Course grades A, B, C, D, or F.
Definitions. Interval level of measurement: used for data that can be put in order and whose differences are meaningful. However, there is no natural zero starting point (where none of the quantity is present). Example: Years 1000, 2000, 1776, and 1492. Subtracting tells you something: the number of years that have passed. But there is no natural starting point at 0, because you also have years before that (B.C.).
Definitions. Ratio level of measurement: used for numerical data that also have a natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are meaningful. Example: Prices of college textbooks ($0 represents no cost).
Summary: Levels of Measurement. Nominal: categories only. Ordinal: categories with some order, but you can’t calculate differences. Interval: differences, but no natural starting point where none is present. Ratio: differences and a natural starting point, so you can calculate ratios.
Section 1-4: Critical Thinking
Common Misuses of Statistics. Bad Samples: Some samples are biased. You have to have a good sample! Small Samples: Samples of only a few people can be very biased. Misleading Conclusions: Making misleading conclusions or confusing correlation with causation. The following slides discuss cases of these. See the book for more.
Definitions. Voluntary response sample: a voluntary survey, in which people decide whether or not to respond. In this case, valid conclusions can be made only about the specific group of people who agree to participate. This is an example of a bad sample. Example: Most restaurants have a comment card that customers can fill out. This is an example of a voluntary response sample. Typically only the customers who are extremely happy or extremely upset volunteer to fill these out. All other customers ignore them.
Distorted Percentages. • Incorrectly calculated percentages • Incorrect phrasing of increases and decreases. Example: The number of people wearing seat belts has doubled, so the researcher says it has increased by 200%. But actually, doubling is increasing by 100%, so increasing by 200% means the amount TRIPLED.
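The seat belt arithmetic can be worked through directly. This is a small sketch, not part of the original slides: the counts (500 and 1000 people) are hypothetical, and the function names `percent_change` and `apply_percent_increase` are made up for illustration.

```python
def percent_change(old: float, new: float) -> float:
    """Percent increase (or decrease, if negative) from old to new."""
    return (new - old) / old * 100


def apply_percent_increase(old: float, percent: float) -> float:
    """The value you get after increasing old by the given percent."""
    return old * (1 + percent / 100)


# Hypothetical counts: seat belt wearers go from 500 to 1000 (doubling).
print(percent_change(500, 1000))         # doubling is a 100% increase
print(apply_percent_increase(500, 200))  # a 200% increase gives 1500, triple the start
```

The key step is that the percent change is measured from the original amount: the increase (500) divided by the starting value (500) is 100%, not 200%.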
Loaded Questions. Generic: “Should special interest groups have the right to assemble?” Specific: “Should groups such as PETA have the right to protest?” Mentioning a specific group in the question can evoke a gut reaction, one way or the other, that sometimes changes the answer a person would have otherwise given. Always be careful how you word things! It can make a huge difference.
Misleading Conclusions. Think about that study which found that people who drink coffee are more likely to develop cancer. This does not necessarily mean that coffee causes cancer. (It actually turns out that there are other factors which were not considered that were causing the cancer.) There is a correlation between coffee drinking and cancer. Do not confuse this with causation.
Section 1-5: Collecting Sample Data
Data Collection. • There are many different ways to collect data, some better than others. • Sometimes the best method depends on what you are studying. • The most common methods for collecting data are observational studies and experiments.
Definitions. Observational study: observing or measuring specific characteristics without changing anything. Example: Observing a school classroom to see how children act at different age levels. Example: Conducting a survey about people’s current opinions.
Types of Observational Studies. • Cross-sectional: collect data all at one time. Example: Observing several different classrooms to see how children at different age levels act. • Retrospective: look back at old information. Example: Looking through old school records to learn about students of different age levels. • Longitudinal, or prospective: follow a group, collecting data several times as time progresses. Example: Following a group of students throughout childhood, observing them once a year for 10 years.
Definitions. Experiment: apply some treatment and then observe its effects on the subjects (change something, and see what happens). Example: Give a group of people a new drug to see if it helps their symptoms.
Types of Experiments. • Blind: the subject doesn’t know whether they are getting the treatment or a placebo. Example: Some subjects are given headache medicine, and some are given a sugar pill to control for the placebo effect (when subjects think they feel better but are receiving no treatment). • Double-blind: neither the researcher nor the subject knows who is getting the treatment vs. the placebo. Example: Neither the doctor nor the patient is told who was given the actual headache medicine until after the study, to keep the doctor from unintentionally acting differently with the placebo group.
Choosing Samples. Both observational studies and experiments usually study a sample of the population. It is important to choose your sample carefully, so that it will represent the population.
Key Concepts. If I want to know how students feel about the cafeteria on Parkland’s campus, I am NOT going to ask only three students and make conclusions about the whole campus. I would instead have to get a “good” sample of several students throughout the whole campus, and then make some conclusions. Note: “Good” means I wouldn’t ask only males, only English majors, or only the students in the cafeteria. I would try to get a variety!
Key Concepts. Sample data must be collected in an appropriate way, such as through a process of random selection. If sample data are not collected randomly, some sort of bias or pattern may unintentionally skew your data.
Key Concepts. The whole point of statistics is to make conclusions about a large group based on information collected from a smaller portion of that group, the sample. It is actually somewhat surprising how small a sample can be and still accurately represent the population. We will deal with exact sizes later in the semester. The key idea now is that to accurately represent the population, a sample must be chosen in the proper way. There are several methods to choose from.
6 Types of Sampling:
Simple Random Sampling. Every member of the population has an equal chance of being chosen, and every possible sample of the same size n has the same chance of being chosen. Example: Drawing names out of a hat.
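Drawing names out of a hat can be imitated on a computer. The following is a minimal sketch using Python's standard `random.sample`, which picks n distinct items so that every group of size n is equally likely. The roster of 30 students and the helper name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n distinct members; every possible group of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable (optional)
    return rng.sample(population, n)


# Hypothetical class roster standing in for the names in the hat.
names = [f"student_{i}" for i in range(1, 31)]
print(simple_random_sample(names, 5, seed=42))
```

No one can be drawn twice, just as a slip of paper leaves the hat once it is picked.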
Random Sampling. Similar to simple random sampling, except that it allows subjects to be grouped. Example: You have a group of people, each of whom has brought one sibling. You write the names of the pairs of siblings on pieces of paper, and then choose a pair of siblings by drawing out of a hat. Note: Simple random sampling is a special case of random sampling. Simple random sampling has the extra requirement that subjects have an equal chance of being paired with every other subject.
Systematic Sampling. Select some starting point and then select every kth element in the population. Example: Choose every 10th person in the phone book.
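A short sketch of the "every kth element" idea follows. The slide only says to select some starting point; choosing that starting point at random among the first k entries is an assumption made here, and the phone book is a hypothetical list of 100 numbered entries.

```python
import random


def systematic_sample(population, k, seed=None):
    """Take every kth member, starting from a random position in the first k.
    (The random start is an assumption; any fixed start also fits the definition.)"""
    rng = random.Random(seed)
    start = rng.randrange(k)      # an index from 0 to k - 1
    return population[start::k]   # slice: start, start + k, start + 2k, ...


# Hypothetical phone book with 100 numbered entries.
phone_book = list(range(1, 101))
print(systematic_sample(phone_book, 10, seed=0))
```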
Convenience Sampling. Use results that are easy to get. Example: Stand in front of your house and talk to the people who happen to walk by. Is this type of sampling good? Absolutely not.
Stratified Sampling. Divide the population into categories, and then pick a random sample from each category. Example: Pick 20 men and 20 women.
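The "20 men and 20 women" example might be sketched as below. The data are made up (100 people labeled "M" and 100 labeled "F"), and `stratified_sample` is a hypothetical helper: it first sorts everyone into categories (strata), then draws a simple random sample from each one.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, n_per_stratum, seed=None):
    """Group members by key(member), then randomly pick n from every group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[key(member)].append(member)
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}


# Hypothetical population: 100 men and 100 women, stored as (label, id) pairs.
people = [("M", i) for i in range(100)] + [("F", i) for i in range(100)]
picked = stratified_sample(people, key=lambda p: p[0], n_per_stratum=20, seed=1)
print({label: len(group) for label, group in picked.items()})
```

Every category is guaranteed to appear in the result, which a simple random sample does not promise.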
Cluster Sampling. Divide the population into sections and randomly select sections, then interview EVERYONE in the selected sections. Example: Divide the state into counties, and randomly pick 5 counties. Survey everyone in those 5 counties. Note: With stratified sampling, you look at a few from each group. With cluster sampling, you look at entire clusters.
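For contrast with stratified sampling, here is a sketch of the county example. The state with 20 counties of 50 residents each is invented for illustration, and `cluster_sample` is a hypothetical helper. The randomness applies to which counties are chosen; inside a chosen county, everyone is included.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep EVERY member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical state: 20 counties with 50 residents each.
counties = {
    f"county_{c}": [f"resident_{c}_{i}" for i in range(50)]
    for c in range(1, 21)
}
surveyed = cluster_sample(counties, 5, seed=3)
print({name: len(residents) for name, residents in surveyed.items()})
```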
Examples
1. In conducting research for the Boston evening news, a reporter for NBC interviews 15 people as they leave IRS audits.
2. A psychologist at New York University surveys all students from each of 20 randomly selected classes.
3. The Dutchess County Commissioner of Jurors obtains a list of 42,763 car owners and constructs a pool of jurors by selecting every 100th name on the list.
4. A marketing expert for MTV is planning a survey in which 500 people will be randomly selected from each age group of 10-19, 20-29, and so on.
5. An instructor writes the name of each student on a dart board and then throws darts blindfolded at the board to select students.
Solutions
1. In conducting research for the Boston evening news, a reporter for NBC interviews 15 people as they leave IRS audits. Answer: Convenience
2. A psychologist at New York University surveys all students from each of 20 randomly selected classes. Answer: Cluster
3. The Dutchess County Commissioner of Jurors obtains a list of 42,763 car owners and constructs a pool of jurors by selecting every 100th name on the list. Answer: Systematic
4. A marketing expert for MTV is planning a survey in which 500 people will be randomly selected from each age group of 10-19, 20-29, and so on. Answer: Stratified
5. An instructor writes the name of each student on a dart board and then throws darts blindfolded at the board to select students. Answer: Simple Random