Chapter 7 An Overview of Statistical Inference: Learning from Data

  • Slides: 21
Chapter 7 An Overview of Statistical Inference – Learning from Data Created by Kathy Fritz

Statistical Inference What You Can Learn from Data

With the increasing popularity of online dating services, the truthfulness of information in the personal profiles by users is a topic of interest. A study was designed to investigate misrepresentation of personal characteristics. The researchers hoped to answer three questions:
1. What proportion of online daters believe they have misrepresented themselves in an online profile?
2. What proportion of online daters believe that others frequently misrepresent themselves?
3. Are people who place a greater importance on developing a long-term, face-to-face relationship more honest in their online profiles?
The first two of these questions are estimation problems because they involve using sample data to learn something about a population characteristic. The third question is a hypothesis testing problem because it involves determining if sample data support a claim about the population of online daters.

Learning from Sample Data
When you obtain information from a sample selected from some population, it is usually because
• you want to learn something about characteristics of the population. An estimation problem involves using sample data to estimate the value of a population characteristic.
OR
• you want to use sample data to decide whether there is support for some claim or statement about the population. A hypothesis testing problem involves using sample data to test a claim about a population.
Methods for estimation and hypothesis testing are called statistical inference methods because they involve generalizing (making an inference) from a sample to the population from which the sample was selected.

Learning from Data When There Are Two or More Populations
Sometimes sample data are obtained from two or more populations of interest, and the goal is to learn about differences between the populations. Consider the following example: College students spend a lot of time online, but do members of Facebook spend more time online than non-members? Data were collected from two samples of college students, one consisting of Facebook members and the other consisting of non-members. One of the variables studied was the amount of time spent on the Internet in a typical day. Based on the resulting data, it was concluded that there was no support for the claim that the mean time spent online for Facebook members was greater than the mean time for non-members.
This study involves generalizing from samples, and it is a hypothesis testing problem because it involves testing a claim about the difference between two groups.

Learning from Experimental Data
Statistical inference methods are also used to learn from experiment data. When data are obtained from an experiment, it is usually because
• you want to learn about the effect of the different experimental conditions (treatments) on the measured response. This is an estimation problem because it involves using sample data to estimate a characteristic of the treatments, such as the mean response for a treatment.
OR
• you want to determine if experiment data provide support for a claim about how the effects of two or more treatments differ. This is a hypothesis testing problem because it involves testing a claim (hypothesis) about treatment effects.

Do U Smoke After Txt?
Researchers in New Zealand investigated whether mobile phone text messaging could be used to help people stop smoking. An experiment was designed to compare two treatments. Subjects for the experiment were 1705 smokers who were older than 15 years, owned a mobile phone, and wanted to quit smoking. People in the first group received personalized text messages providing support and advice on stopping smoking. The second group was a control group, and people in this group did not receive any of these text messages. After 6 weeks, each person participating in the study was contacted and asked if he or she had smoked during the previous week.
Data from the experiment were used to estimate the difference in the proportion who had quit for those who received the text messages and those who did not. Researchers estimated that the proportion of those who successfully quit smoking was greater by 0.15 for those who received text messages.

Statistical Inference Involves Risk
The risks associated with statistical inference arise because you are attempting to draw conclusions on the basis of data that provide partial rather than complete information.
In estimation problems . . .
RISK: the estimates may be inaccurate.
Understand that the method used to produce the estimates and accompanying measures of accuracy might mislead

Statistical Inference Involves Risk
The risks associated with statistical inference arise because you are attempting to draw conclusions on the basis of data that provide partial rather than complete information.
In hypothesis testing situations . . .
RISK: an inaccurate conclusion.
Understand how likely it is that the method used to decide whether or not a claim is supported might lead to an incorrect decision.

Variability in Data
Suppose we wanted to estimate the mean length of fish in a large lake. We could catch a sample of 20 fish from the lake. One sample may have a symmetric distribution like this. Another sample may have a skewed distribution like this . . . or like this.
When there is variability in the population, you need to consider whether this partial picture (the sample) is representative of the population. This sample-to-sample variability should be considered when you assess the risk associated with drawing conclusions about the population from sample data.
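The fish example can be illustrated with a small simulation. This is only a sketch: the population of fish lengths is made up (simulated from a normal distribution with an assumed mean of 30 cm and standard deviation of 5 cm), and the names `population` and `sample_means` are hypothetical. It draws several samples of 20 fish and shows that each sample gives a somewhat different mean.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (without replacement) and return each sample mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population of fish lengths in cm (simulated, not real data).
pop_rng = random.Random(42)
population = [pop_rng.gauss(30.0, 5.0) for _ in range(10_000)]

# Five different samples of 20 fish from the same lake.
means = sample_means(population, sample_size=20, num_samples=5)
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean length = {m:.1f} cm")

print(f"population mean = {statistics.mean(population):.1f} cm")
```

The sample means differ from one another and from the population mean even though every sample comes from the same population. That spread is the sample-to-sample variability described above.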

Variability in Data
An experiment might be designed to determine if noise level has an effect on the time required to perform a task requiring concentration. There are 20 individuals available to serve as subjects in this experiment with two treatment conditions (quiet environment and noisy environment). The response variable is the time required to complete the task.
If noise level has NO effect on completion time, the time observed for each of the 20 subjects would be the same whether they are in the quiet group or the noisy group. Any observed differences in the completion times for the two treatments would NOT be due to noise level, but to person-to-person variability and the random assignment of subjects to treatments.
You must understand how differences might result from variability in the response and the random assignment to treatment groups in order to distinguish them from differences created by a treatment effect.
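The "no effect" scenario can be sketched in code. The 20 completion times below are simulated placeholders rather than data from any study, and `difference_under_no_effect` is a hypothetical helper. Each subject's time is fixed, and only the random assignment to the quiet or noisy group changes from run to run.

```python
import random
import statistics


def difference_under_no_effect(times, group_size, rng):
    """Randomly split fixed times into two groups; return mean(noisy) - mean(quiet)."""
    shuffled = times[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)        # random assignment of subjects to treatments
    quiet = shuffled[:group_size]
    noisy = shuffled[group_size:]
    return statistics.mean(noisy) - statistics.mean(quiet)


rng = random.Random(1)

# Hypothetical completion times (seconds) for 20 subjects. Under "no effect",
# each subject's time is the same whichever group they land in.
completion_times = [rng.gauss(60.0, 8.0) for _ in range(20)]

# Repeat the random assignment many times and record the observed difference.
diffs = [difference_under_no_effect(completion_times, 10, rng) for _ in range(1000)]

print(f"smallest difference: {min(diffs):.2f} s")
print(f"largest difference:  {max(diffs):.2f} s")
print(f"average difference:  {statistics.mean(diffs):.2f} s")
```

Even with no treatment effect, individual assignments produce nonzero differences between the group means. A difference observed in a real experiment needs to be judged against this chance variation before it is attributed to the treatment.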

Selecting an Appropriate Method Four Key Questions

Four Key Questions
In the following chapters, you will encounter different types of inference problems. The answers to the following questions will lead you to a suggested method to use.
Question Type (Q): Is the question you are trying to answer an estimation problem or a hypothesis testing problem? You will choose different methods depending on the answer to this question.
Study Type (S): Does the situation involve generalizing from a sample to learn about the population (an observational study or survey) OR does it involve generalizing from an experiment to learn about treatment effects? The answer to this question affects the choice of the method as well as the type of conclusion that can be drawn.

Four Key Questions Continued . . .
Type of Data (T): What type of data will be used to answer the question? Is the data set univariate (one variable) or bivariate (two variables)? Are the data categorical or numerical?
Univariate versus Bivariate
Identify whether these examples involve univariate or bivariate data. Explain your choice.
• The study of deception in online dating profiles investigated whether people who place a greater importance on developing a long-term face-to-face relationship are more honest in their online profiles.
• A study was performed to learn how the proportion with a TV in the bedroom differed for children in two age groups.

Four Key Questions Continued . . .
Type of Data (T): What type of data will be used to answer the question? Is the data set univariate (one variable) or bivariate (two variables)? Are the data categorical or numerical?
Categorical versus Numerical
If you have a single variable and the data are categorical, the question of interest is probably about a population proportion. If the data are numerical, the question of interest is probably about a population mean.
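As a small illustration of the categorical versus numerical distinction, the sketch below summarizes one categorical variable with a sample proportion and one numerical variable with a sample mean. The data values and the helper name `sample_proportion` are made up for illustration only.

```python
from statistics import mean


def sample_proportion(values, category):
    """Fraction of observations that fall in the given category."""
    return sum(1 for v in values if v == category) / len(values)


# Hypothetical categorical data: answers to a yes/no survey question.
responses = ["yes", "no", "no", "yes", "no"]

# Hypothetical numerical data: hours spent online in a typical day.
hours_online = [2.5, 4.0, 3.0, 1.5]

# Categorical data: summarize with a proportion.
print(sample_proportion(responses, "yes"))  # 0.4

# Numerical data: summarize with a mean.
print(mean(hours_online))  # 2.75
```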

Four Key Questions Continued . . .
Number of Samples or Treatments (N): How many samples are there? OR if the data are from an experiment, how many treatments are being compared?
For situations that involve sample data, different methods are used depending on whether there are one, two, or more than two samples. Also, you may choose a different method to analyze data from an experiment with only two treatments than you would for an experiment with more than two treatments.

QSTN
Think of this as the word QUESTION without the vowels.
Q Question Type: Estimation or hypothesis testing?
S Study Type: Sample data or experiment data?
T Type of Data: Univariate or bivariate? Categorical or numerical?
N Number of Samples or Treatments: How many samples or treatments?

Answering Four Key Questions to Identify An Appropriate Method
You will be able to refer to this table in the following chapters to identify an appropriate method to use.
Q Question Type | S Study Type | T Type of Data | N Number | Method to Consider | Chapter
Estimation | Sample | Univariate Categorical | 1 | One Sample z Confidence Interval for a Proportion | 9
Hypothesis Test | Sample | Univariate Categorical | 1 | One Sample z Test for a Proportion | 10
Estimation | Sample | Univariate Categorical | 2 | Two Sample z Confidence Interval for a Difference in Proportions | 11
Hypothesis Test | Sample | Univariate Categorical | 2 | Two Sample z Test for a Difference in Proportions | 11
Estimation | Sample | Univariate Numerical | 1 | One Sample t Confidence Interval for a Mean | 12
Hypothesis Test | Sample | Univariate Numerical | 1 | One Sample t Test for a Mean | 12
Estimation | Sample | Univariate Numerical | 2 | Two Sample t Confidence Interval for a Difference in Means | 13
Hypothesis Test | Sample | Univariate Numerical | 2 | Two Sample t Test for a Difference in Means | 13
Hypothesis Test | Sample | Univariate Numerical | More than 2 | ANOVA F Test | 17 online
Estimation | Sample | Univariate Numerical | More than 2 | Multiple Comparisons | 17 online
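One way to picture how the QSTN answers select a method is as a lookup keyed on the four answers. The sketch below encodes only the ten rows listed in the table. The names `METHODS` and `suggest_method` and the lowercase key spellings are illustrative choices, not part of the original slides.

```python
# Keys are (Q, S, T, N) answers; values are (method, chapter) from the table.
METHODS = {
    ("estimation", "sample", "univariate categorical", "1"):
        ("One Sample z Confidence Interval for a Proportion", "9"),
    ("hypothesis test", "sample", "univariate categorical", "1"):
        ("One Sample z Test for a Proportion", "10"),
    ("estimation", "sample", "univariate categorical", "2"):
        ("Two Sample z Confidence Interval for a Difference in Proportions", "11"),
    ("hypothesis test", "sample", "univariate categorical", "2"):
        ("Two Sample z Test for a Difference in Proportions", "11"),
    ("estimation", "sample", "univariate numerical", "1"):
        ("One Sample t Confidence Interval for a Mean", "12"),
    ("hypothesis test", "sample", "univariate numerical", "1"):
        ("One Sample t Test for a Mean", "12"),
    ("estimation", "sample", "univariate numerical", "2"):
        ("Two Sample t Confidence Interval for a Difference in Means", "13"),
    ("hypothesis test", "sample", "univariate numerical", "2"):
        ("Two Sample t Test for a Difference in Means", "13"),
    ("hypothesis test", "sample", "univariate numerical", "more than 2"):
        ("ANOVA F Test", "17 online"),
    ("estimation", "sample", "univariate numerical", "more than 2"):
        ("Multiple Comparisons", "17 online"),
}


def suggest_method(question, study, data, number):
    """Return (method, chapter) for the QSTN answers, or None if not in the table."""
    key = (question.lower(), study.lower(), data.lower(), number.lower())
    return METHODS.get(key)


# Example: a hypothesis test about means from two samples.
print(suggest_method("Hypothesis Test", "Sample", "Univariate Numerical", "2"))
```

Combinations that the table does not list, such as experiment data or bivariate data, return `None` in this sketch instead of a guessed method.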

A Five-Step Process for Statistical Inference Estimation Problems Hypothesis Testing Problems

A Five-Step Process for Estimation Problems (EMC³)
Step | What is this step?
E | Estimate: Explain what population characteristic you plan to estimate.
M | Method: Select a potential method using QSTN.
C | Check: Check to make sure that the method is appropriate. It is important to verify that any conditions are met before proceeding.
C | Calculate: Sample data are used to perform any necessary calculations.
C | Communicate Results: This is a critical step in the process. You will answer the questions of interest, explain what you have learned from the data, and acknowledge potential risk.

A Five-Step Process for Hypothesis Testing Problems (HMC³)
Step | What is this step?
H | Hypotheses: Define the hypotheses that will be tested.
M | Method: Select a potential method using QSTN.
C | Check: Check to make sure that the method is appropriate. It is important to verify that any conditions are met before proceeding.
C | Calculate: Sample data are used to perform any necessary calculations.
C | Communicate Results: This is a critical step in the process. You will answer the questions of interest, explain what you have learned from the data, and acknowledge potential risk.