Module 1 Concepts of Statistics Copyright Cengage Learning

Section 1.1 Branches of Statistics

LO 1.1 A Distinguish between descriptive and inferential statistics.

Key Concepts: The Branches of Statistics (1 of 3)
Summary
Statistics is the science of collecting, describing, and interpreting data. There are two branches of statistics: descriptive and inferential. This section focuses on the difference between descriptive and inferential statistics.

Key Concepts: The Branches of Statistics (2 of 3)
You will learn about two branches of statistics in this class: descriptive and inferential.
Figure 1

Key Concepts: The Branches of Statistics (3 of 3)
Descriptive statistics is the set of techniques for summarizing data.
Inferential statistics is the set of techniques that enable us to make inferences about a population based on a sample.

Step-by-step Example: Online Harassment (1 of 3)
Scenario
In a recent study conducted by Pew Research Center, it was determined that young adults are far more likely to experience online harassment, and women are more likely to experience some of the most severe forms. It is estimated that 25% of all women ages 18–24 have been sexually harassed online, while the corresponding estimate for men is only 13% (“Online Harassment,” Pew Research Center [2014]: 2–10).
Determine whether the scenario is an example of descriptive or inferential statistics.

Step-by-step Example: Online Harassment (2 of 3)
What You Need to Do
Determine whether the scenario is an example of descriptive or inferential statistics.
Step 1 Identify the purpose of the study.
The Pew study likely addressed several questions. The one that concerns us is: what proportion of women ages 18–24 have been victims of online sexual harassment, and how does that compare to the proportion of men ages 18–24 who have been sexually harassed online?

Step-by-step Example: Online Harassment (3 of 3)
Step 2 Based on the purpose, determine whether the study uses descriptive or inferential statistics.
The purpose addresses all men and women in this age group. Pew cannot possibly interview all of them about sexual harassment, so it must draw conclusions about the whole group from the people it did survey. This is an example of inferential statistics. The population of interest is people ages 18–24.
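The contrast between the two branches can be sketched in a few lines of Python. This is only an illustration: the sample size and counts below are hypothetical (the scenario reports percentages only), and the interval uses a common normal approximation that this module does not cover.

```python
import math

# Hypothetical sample: 400 women ages 18-24, 100 of whom report harassment.
# These counts are invented for illustration; they are not Pew's data.
sample_size = 400
reported = 100

# Descriptive statistics: summarize the sample itself.
sample_proportion = reported / sample_size

# Inferential statistics: use the sample to say something about the population.
# Rough 95% interval from the normal approximation (z = 1.96), an assumption here.
standard_error = math.sqrt(sample_proportion * (1 - sample_proportion) / sample_size)
margin = 1.96 * standard_error
interval = (sample_proportion - margin, sample_proportion + margin)

print(f"Sample proportion (descriptive): {sample_proportion:.2f}")
print(f"Approximate 95% interval for the population (inferential): "
      f"{interval[0]:.3f} to {interval[1]:.3f}")
```

The first number describes only the people surveyed; the interval is a statement about everyone in the age group.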

Section 1.2 Population and Sample

LO 1.2 A Given a study scenario, identify the population and sample.

Procedures: Given a Study Scenario, Identify the Population and Sample (1 of 2)
There are two steps to identifying the population and sample, given a study scenario.
Step 1 Determine the whole of interest (the population).
The population is the entire collection of individuals or objects that you want to learn about. The population can consist of people or things.

Procedures: Given a Study Scenario, Identify the Population and Sample (2 of 2)
Step 2 Determine the individuals from whom the data came (the sample).
The sample is the part of the population that was actually studied, that is, the individuals or objects from which data were actually obtained. It is a subset of the population.

Step-By-Step Example: Checking Account Balances (1 of 2)
Scenario
A local bank is interested in identifying the average daily balance in the checking accounts of its customers. It randomly selects 150 customers from an alphabetical list of all customers who have a checking account at the bank.
What You Need to Do
Identify the population of interest and sample.

Step-By-Step Example: Checking Account Balances (2 of 2)
Step 1 Determine the whole of interest (the population).
The population is all customers at this bank who have a checking account. This group is the population because it is the group we are interested in learning something about.
Step 2 Determine the individuals from whom the data came (the sample).
The sample is the 150 randomly selected customers who have a checking account because they are the group selected from the population and examined.
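The relationship between the bank's population and its sample can be sketched in Python. The customer list and its size (2,000) are hypothetical, since the scenario does not say how many customers the bank has; `random.sample` is used here as one way to draw a simple random sample without replacement.

```python
import random

# Hypothetical customer list; the bank's real list is not given in the scenario.
population = [f"customer_{i:04d}" for i in range(1, 2001)]

random.seed(1)  # fixed seed so the example is repeatable
sample = random.sample(population, k=150)  # 150 customers, no repeats

print(len(population))                  # size of the population
print(len(sample))                      # size of the sample
print(set(sample) <= set(population))   # the sample is a subset of the population
```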

Written Example: Anti-Doping Initiative in Rio (1 of 2)
Scenario
Due to recent developments regarding use of performance-enhancing drugs by some Russian athletes, the International Olympic Committee decided to randomly test 100 of the athletes after their arrival in Rio de Janeiro for the 2016 Summer Olympics.
Given
• The International Olympic Committee decided to randomly test 100 of the athletes.

Written Example: Anti-Doping Initiative in Rio (2 of 2)
Problem
Identify the population of interest and sample.
Solution
• The population of interest is all athletes competing in the Rio Olympics.
• The sample is the 100 randomly selected athletes who were tested.

Section 1.3 Variable Types

Key Concepts: Categorical Versus Numerical Data and Variables (2 of 2)
A variable is any characteristic whose value may change from one individual to another.
• A variable is categorical (or qualitative) if the individual observations are categorical responses. The possible observations are usually expressed with words.
• A variable is numerical (or quantitative) if each observation is a number and that number has quantitative meaning.

Written Example: Salt Intake and High Blood Pressure (1 of 4)
Scenario
Salt-free diets are often prescribed for people with high blood pressure. A group of 150 patients with high blood pressure volunteered to participate in a study to determine the effects of salt intake on blood pressure. Patients were randomly assigned to one of the following daily salt intake groups, with the number corresponding to the group recorded:
1. 0–375 mg
2. 376–750 mg
3. 751–1125 mg
4. 1126–1500 mg

Written Example: Salt Intake and High Blood Pressure (2 of 4)
After the groups had followed the prescribed diets for 6 months, the reduction in blood pressure was recorded for each patient in each group, and the average reduction in blood pressure in each group was determined.
Given
• Patients were randomly assigned to one of the four daily salt intake groups.
• The reduction in blood pressure was recorded for each patient in each group.

Written Example: Salt Intake and High Blood Pressure (3 of 4)
Problem 1
Identify the variable “salt intake group” as categorical or numerical.
Solution
The variable “salt intake group” is categorical (qualitative) because even though its values are recorded as numbers (1 through 4), those numbers are just a convenient way of labeling the categories.

Written Example: Salt Intake and High Blood Pressure (4 of 4)
Problem 2
Identify the variable “blood pressure” as categorical or numerical.
Solution
Blood pressure is numerical (quantitative) because each observation is a measured number with quantitative meaning, unlike the group labels in Problem 1.
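A small Python sketch shows why the two variables are treated differently. The patient records are hypothetical, and the unit (mmHg) is an assumption, since the scenario gives neither. Averaging makes sense for the numerical variable; for the categorical variable, the sensible summary is a count per category.

```python
from statistics import mean

# Hypothetical records: (salt intake group label, reduction in blood pressure, mmHg).
records = [(1, 12.5), (1, 9.0), (2, 8.0), (3, 5.5), (4, 2.0), (4, 3.5)]

# The labels 1-4 only name the categories listed in the scenario.
group_names = {1: "0-375 mg", 2: "376-750 mg", 3: "751-1125 mg", 4: "1126-1500 mg"}

# Collect the numerical observations for each category.
reductions_by_group = {}
for group, reduction in records:
    reductions_by_group.setdefault(group, []).append(reduction)

# Numerical variable: an average reduction is meaningful.
for group, reductions in sorted(reductions_by_group.items()):
    print(group_names[group], mean(reductions))

# Categorical variable: count how many patients fall in each category.
counts = {group_names[g]: len(v) for g, v in sorted(reductions_by_group.items())}
print(counts)
```

Taking the mean of the labels themselves (1, 1, 2, 3, 4, 4) would produce a number, but it would not describe anything about salt intake.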

Key Concepts: Discrete and Continuous Variables (1 of 2)
Summary
Numerical variables can be classified into two types, discrete and continuous, depending on their possible values. This section focuses on the difference between discrete and continuous variables.

Key Concepts: Discrete and Continuous Variables (2 of 2)
The distinction between discrete and continuous data will be important when you select an appropriate graphical display and later on when you consider probability models. Based on this distinction, a numerical variable is one of two types:
1. A discrete variable is a numerical variable with possible values corresponding to isolated points on the number line. Think about counting.
2. A continuous variable is a numerical variable with possible values from an entire interval on the number line. Think about measuring.

Procedures (1 of 2)
Distinguishing Between Discrete and Continuous Random Variables
There are two steps to distinguishing between discrete and continuous random variables.
Step 1 Determine whether possible values are the result of counting or represent isolated points on a number line.
If the variable meets either of these conditions, the variable is discrete. The number of students in a class (counted) and the size of a randomly selected adult woman’s shoe (5, 5½, 6, 6½, etc.) are discrete random variables.

Procedures (2 of 2)
Step 2 Otherwise, determine if the variable takes any value on some interval.
The height of an adult female (measured and rounded to the nearest inch, say) and the length of a randomly selected adult woman’s foot (measured and rounded to the nearest shoe size) are continuous. The recorded values are rounded, but the underlying quantity can take any value on an interval.

Step-by-step Example: Corn Yield Rate in Michigan (1 of 3)
Scenario
Several counties in Michigan were randomly selected, and information about the yield of the sweet corn crop in each county was collected. The yield of corn per acre harvested was recorded. Corn yield is measured in bushels.
What You Need to Do
Determine whether the variable, yield of corn, is discrete or continuous.

Step-by-step Example: Corn Yield Rate in Michigan (2 of 3)
Step 1 Determine whether possible values are the result of counting or represent isolated points on a number line.
One could count the number of bushels of corn harvested in each field, but there might be fractions of a bushel left in any one field. Further, the yields will not be recorded for each field; rather, the data set will consist of the average yield for each county. Corn yield is not discrete.
If we were interested in the number of ears of corn in each acre, then we would have a discrete variable, because we count the number of ears of corn.

Step-by-step Example: Corn Yield Rate in Michigan (3 of 3)
Step 2 Otherwise, determine if the variable takes any value on some interval.
The variable, yield of corn, is a continuous variable because its possible values form an entire interval on the number line and it is a measured quantity; it is possible to have fractions of a bushel. The possible outcomes cannot be listed and counted one by one. Further, when the total number of bushels in a county is divided by the number of acres of corn, the result could have any number of decimal places.
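The division described in Step 2 can be illustrated with a short Python sketch. The county names, bushel totals, acreages, and ear count are all hypothetical; the point is only that yield per acre is not restricted to whole numbers, while a count is.

```python
# Hypothetical county totals: (bushels harvested, acres harvested).
counties = {"County A": (1_250_000, 9_800), "County B": (843_500, 7_100)}

for name, (bushels, acres) in counties.items():
    yield_per_acre = bushels / acres  # continuous: can land anywhere on an interval
    print(name, yield_per_acre)

# By contrast, a count such as the number of ears of corn is always a whole number.
ears_counted = 31_402  # hypothetical count
print(isinstance(ears_counted, int))
```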

Written Example: Memory Test (1 of 2)
Scenario
All students who enroll in a memory course are given a pretest before the course begins. At the completion of the course, 20 randomly selected students are given a post-test. Scores on both tests are based on the number of correct answers out of a total of 20 questions asked.
Given
• Scores on both tests are based on the number of correct answers out of a total of 20 questions asked.

Written Example: Memory Test (2 of 2)
Problem
Identify whether the difference in pre- and post-scores is a discrete or continuous variable.
Solution
The difference in pre- and post-scores is a discrete variable. Students can answer a question either correctly or incorrectly, so the number of correct answers on either test is discrete (0, 1, 2, . . . , 20). Therefore, the score, which is determined by the difference in the number of questions answered correctly, is also discrete. The difference of two whole numbers is also a whole number.
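The solution's reasoning can be checked by enumerating every possible difference. This Python sketch assumes the difference is computed as post-test score minus pretest score, which the scenario does not state explicitly.

```python
# Every possible score is a whole number from 0 to 20.
possible_scores = range(0, 21)

# Assumed definition: difference = post-test score minus pretest score.
possible_differences = sorted(
    {post - pre for pre in possible_scores for post in possible_scores}
)

print(possible_differences[0], possible_differences[-1])  # -20 20
print(len(possible_differences))                           # 41
```

The differences are the 41 whole numbers from -20 to 20: isolated points on the number line, so the variable is discrete.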

Section 1.4 Parameter Versus Statistic

Key Concepts: Parameters and Statistics (2 of 3)
The statistics calculated from the sample are used to answer questions about and estimate values of population characteristics, called parameters.
• A parameter is a number that describes the entire population.
• A statistic is a number that describes a sample.
Easy way to remember:
• Statistics come from samples (both start with “s”).
• Parameters describe populations (both start with “p”).

Key Concepts: Parameters and Statistics (3 of 3)
Symbols: Here is a summary of symbols that are commonly used to represent parameters and statistics for various characteristics of the population and sample. Most often parameters will be represented by Greek letters, while statistics will be represented by Latin (Roman) letters. Note one exception in the table.

Written Example: Free Throws (1 of 3)
Scenario
Stephen Curry, who plays basketball for the Golden State Warriors, topped the 2015–2016 leaderboard for his free throw percentage. During that season, his free throw percentage was 90.8%. You observe Curry’s first 10 free throws during a scheduled practice and find that he made 90%, or 9 out of 10 shots.

Written Example: Free Throws (2 of 3)
Given
• Curry made 90.8% of free throws in the 2015–2016 season.
• Curry made 90% of the first 10 free throws in a scheduled practice.
Problem
Determine which value is a parameter and which is a statistic.

Written Example: Free Throws (3 of 3)
Solution
The value 90.8% is a parameter because it describes Stephen Curry’s free throw percentage for all games in which he played during the 2015–2016 season.
The value 90% is a statistic because it describes Curry’s free throw percentage for a sample of 10 free throw shots taken.
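The two values in this example can be laid side by side in Python. The variable names are illustrative only; the numbers come from the scenario.

```python
# Parameter: describes every free throw in the 2015-2016 season.
season_percentage = 90.8

# Statistic: computed from the 10 observed practice shots only.
made, attempted = 9, 10
sample_percentage = 100 * made / attempted

print(sample_percentage)
print(round(sample_percentage - season_percentage, 1))  # gap between statistic and parameter
```

A different set of 10 shots could give 80% or 100%; the statistic varies from sample to sample, while the season figure is fixed.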

Section 1.5 Types of Data

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (1 of 10)
Summary
This section focuses on identifying the level of measurement (nominal, ordinal, interval, or ratio) given a description of a variable.

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (2 of 10)
Nominal data are categorical (qualitative) data that are neither measured nor ordered, but for which subjects are allocated to distinct categories.
• The data values are usually expressed in words, such as a name or a label.
• Mathematically, we can think of this level of data as equal or not equal (=, ≠).
• We can distinguish only whether the data are the same or different.
• Doing any math with the data makes no sense.

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (3 of 10)
Ordinal data are categorical (qualitative) data that have a natural ordering.
• The data values are usually expressed in words, such as a rating or a rank.
• The order of the values is important, but the differences between the values are not known or are not meaningful.
• Mathematically, this level of data has the same properties as nominal data, but with the additional property of less than or greater than (<, >).

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (4 of 10)
Interval data are numeric data for which we know not only the order, but also the exact differences between the values.
• There is no absolute zero, or natural zero starting point, where nothing exists.
• Mathematically, this level of data has the same properties as ordinal data, but with the additional property of being able to add and subtract (+, −).

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (5 of 10)
Ratio data are numeric data that have an order, for which we know the exact difference between values, and that also have an absolute zero.
• Mathematically, this level of data has the same properties as interval data, but with the additional property of being able to multiply and divide (×, ÷).

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (6 of 10)
Here are some examples of questions that yield the different levels of measurement:
1. Nominal Data:
• What is your favorite food?
• Which school do you attend?
• What is your ZIP code?
• What is your marital status?
Nominal scales are typically used to code categorical responses into numeric variables.

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (7 of 10)
2. Ordinal Data:
• On a scale of 1–5, with 5 representing complete satisfaction, how satisfied are you with the service you received?
• On a scale of 1–10, how would you rate your level of pain, with 1 being minimal or no pain and 10 being extreme pain?
• Which of the following brackets describes your annual income: ($0–$20,000), ($20,000–$40,000), ($40,000–$60,000), ($60,000+)?

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (8 of 10)
• Which of the following is the size of T-shirt that you wear: small, medium, or large?
Ordinal scales typically measure non-numeric concepts such as satisfaction and discomfort. They also arise from “bracket” questions, letter grades, and Likert scale responses (used for satisfaction or agreement questions).

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (9 of 10)
3. Interval Data:
• What is the current temperature in degrees Fahrenheit (°F)?
• At which point in time did dinosaurs roam the earth?
• What is your IQ?
All of these questions are answered with numeric values defined on an interval: degrees, year, or IQ score. With interval data, we can add and subtract, but cannot multiply or divide.

Key Concepts: Identifying the Level of Measurement (Nominal, Ordinal, Interval, or Ratio) (10 of 10)
4. Ratio Data:
• How much do you weigh?
• How tall are you?
• How old are you?
• How many siblings do you have?
These questions are measured by numeric values where zero means the absence of the characteristic you are measuring. For example, if your weight is 0 pounds, you weigh nothing. These data values can be meaningfully added, subtracted, multiplied, and divided.
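One way to see why division is meaningful for ratio data but not for interval data is to change units and check whether a ratio survives. The Python sketch below uses the standard Fahrenheit-to-Celsius and pound-to-kilogram conversions; the specific temperatures and weights are arbitrary illustrative values.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def pounds_to_kilograms(lb):
    return lb * 0.45359237

# Interval data: 80 °F looks like "twice" 40 °F,
# but the ratio changes when the unit changes.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print(round(ratio_f, 2), round(ratio_c, 2))

# Ratio data: zero means "none", so the ratio is the same in any unit.
ratio_lb = 200 / 100
ratio_kg = pounds_to_kilograms(200) / pounds_to_kilograms(100)
print(round(ratio_lb, 2), round(ratio_kg, 2))
```

The temperature ratio depends on where each scale places its zero, so "twice as warm" has no fixed meaning; "twice as heavy" does.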

Step-by-step Example: Do You Agree? (1 of 3)
Scenario
A survey posed this statement: “Taxpayer dollars are being used responsibly on public transportation.” The respondents were asked to select a number on a scale from 1 to 5, from the following options: 1 = strongly agree, 2 = agree, 3 = no opinion, 4 = disagree, and 5 = strongly disagree.

Step-by-step Example: Do You Agree? (2 of 3)
What You Need to Do
Determine whether this question would yield nominal, ordinal, interval, or ratio data.
Step 1 Determine if the possible values are categorical without a natural order.
No, the variable does have an order: there are different degrees of agreeing.

Step-by-step Example: Do You Agree? (3 of 3)
Step 2 If not, determine if the values have a natural order but the increments are not constant.
This question would yield ordinal data. The order of the values is important, but it is not known if the difference between “strongly agree” and “agree” is the same as the difference between “agree” and “no opinion.” This is an example of Likert response data.
Step 3 If not, decide if zero means the absence of the characteristic being measured.
This step is not needed. We have already decided that the level of agreement is ordinal.
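The three steps can be written as a small decision function. This Python sketch is one possible encoding, not part of the original procedure: the function name and its three yes/no parameters are hypothetical, and the answers supplied for each example variable are judgments based on the key concepts in this section.

```python
def level_of_measurement(has_natural_order, constant_increments, zero_means_absence):
    """Classify a variable by answering the three step-by-step questions in order."""
    if not has_natural_order:
        return "nominal"    # Step 1: categories with no natural order
    if not constant_increments:
        return "ordinal"    # Step 2: ordered, but increments are not constant
    if not zero_means_absence:
        return "interval"   # Step 3: zero does not mean "none"
    return "ratio"

# (has_natural_order, constant_increments, zero_means_absence) for each variable
examples = {
    "ZIP code": (False, False, False),
    "agreement on a 1-5 scale": (True, False, False),
    "temperature in degrees Fahrenheit": (True, True, False),
    "weight in pounds": (True, True, True),
}

for variable, answers in examples.items():
    print(variable, "->", level_of_measurement(*answers))
```

As in the worked example, the function stops at the first step that settles the question, so Step 3 is never reached for the agreement scale.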

Written Example: Designer Clothes (1 of 2)
Scenario
An online survey asked, “Do you make a point of buying designer clothes?”
Given
• The question: “Do you make a point of buying designer clothes?”

Written Example: Designer Clothes (2 of 2)
Problem
Would this question yield nominal, ordinal, interval, or ratio data?
Solution
This question would yield nominal data. Nominal data are categorical data in which responses are allocated to distinct categories. The responses to the question would be “yes” and “no,” which are categorical data; there is no order associated with the responses.