Unit 1 Topic 1

Statistics: the science of collecting, analyzing, and drawing conclusions from data

Descriptive statistics: the methods of organizing and summarizing data

Inferential statistics: making generalizations from a sample to a population

Population: the entire collection of individuals or objects about which information is desired

Sample: a subset of the population, selected for study in some prescribed manner

Variable: any characteristic whose value may change from one individual to another

Data: observations on a single variable or simultaneously on two or more variables

Types of variables

Categorical variables (also called qualitative or nominal variables)
• identify basic differentiating characteristics of the population

Numerical variables (also called quantitative variables)
• observations or measurements take on numerical values
• it makes sense to average these values
• two types: discrete and continuous

Identify the following variables:
1. the income of adults in your city: Numerical
2. the color of M&M candies selected at random from a bag: Categorical
3. the number of speeding tickets each student in AP Statistics has received: Numerical
4. the area code of an individual: Categorical
5. the birth weights of female babies born at a large hospital over the course of a year: Numerical

Discrete (numerical)
• listable set of values
• usually counts of items

Continuous (numerical)
• data can take on any value in the domain of the variable
• usually measurements of something

Identify the following variables:
1. The number of suitcases lost by an airline. Discrete. The number of suitcases lost must be a whole number.
2. The height of corn plants. Continuous. The height of corn plants can take on infinitely many values.
3. The number of ears of corn produced. Discrete. The number of ears of corn must be a whole number.
4. The number of green M&M's in a bag. Discrete. The number of green M&M's must be a whole number.
5. The time it takes for a car battery to die. Continuous. The amount of time can take on infinitely many values (any decimal is possible).
6. The production of tomatoes by weight. Continuous. The weight of the tomatoes can take on infinitely many values.

Classification by the number of variables
• Univariate: data that describes a single characteristic of the population
• Bivariate: data that describes two characteristics of the population
• Multivariate: data that describes more than two characteristics (beyond the scope of this course)

Graphs for categorical data

Bar Graph
• Used for categorical data
• Bars do not touch
• Categorical variable is typically on the horizontal axis
• To describe: comment on which occurred the most often or least often
• May make a double bar graph or segmented bar graph for bivariate categorical data sets
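As a rough illustration of what a bar graph summarizes, the sketch below tallies a categorical variable and prints one text bar per category. The birth-month values are made up for illustration, and `frequency_table` and `text_bar_graph` are hypothetical helper names that do not come from the slides.

```python
from collections import Counter

def frequency_table(observations):
    """Count how often each category occurs, with categories in sorted order."""
    counts = Counter(observations)
    return dict(sorted(counts.items()))

def text_bar_graph(table):
    """Draw one row per category; the bar length is the category's frequency."""
    width = max(len(str(category)) for category in table)
    rows = []
    for category, count in table.items():
        rows.append(f"{str(category):<{width}} | {'#' * count} ({count})")
    return "\n".join(rows)

# Hypothetical survey answers, invented for this sketch.
months = ["Jan", "Mar", "Mar", "Jul", "Jan", "Mar", "Oct"]
table = frequency_table(months)
print(text_bar_graph(table))

# Describing the graph: which category occurred most often?
most_common = max(table, key=table.get)
print("Most common:", most_common)
```

Sorting here is alphabetical only because it is simple; for months, calendar order would usually read better.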

Using class survey data:
• graph birth month
• graph gender & handedness

Pie (Circle) graph
• Used for categorical data
• To make:
  – Proportion × 360° gives the angle for each part
  – Using a protractor, mark off each part
• To describe: comment on which occurred the most often or least often
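The angle calculation for a pie graph can be sketched as follows. The candy colors are made-up data, and `pie_angles` is a hypothetical helper name used only for this illustration.

```python
from collections import Counter

def pie_angles(observations):
    """Return {category: central angle in degrees} for a pie graph."""
    counts = Counter(observations)
    total = sum(counts.values())
    # Each part's angle is its proportion of the data times 360 degrees.
    return {category: count / total * 360 for category, count in counts.items()}

# Hypothetical M&M colors, invented for this sketch.
colors = ["red", "blue", "blue", "green", "red", "blue", "brown", "blue"]
angles = pie_angles(colors)
for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles always add up to 360°, which is a quick check before picking up the protractor.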

Graphs for numerical data

Dotplot
• Used with numerical data (either discrete or continuous)
• Made by putting dots (or X's) on a number line
• Can make comparative dotplots by using the same axis for multiple groups
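A minimal text version of a dotplot, assuming whole-number data so that each value gets its own column on the number line. The sibling counts are made up, and `text_dotplot` is a hypothetical helper name.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot for whole-number data: one dot per observation."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    width = max(len(str(v)) for v in axis) + 1  # column width with one space of padding
    rows = []
    # Build from the tallest stack down so the dots pile up from the number line.
    for level in range(max(counts.values()), 0, -1):
        rows.append("".join(("." if counts[v] >= level else " ").rjust(width) for v in axis))
    # The last row is the number line itself.
    rows.append("".join(str(v).rjust(width) for v in axis))
    return "\n".join(rows)

# Hypothetical number of siblings for eight students.
siblings = [0, 1, 1, 2, 2, 2, 3, 5]
plot = text_dotplot(siblings)
print(plot)
```

The empty column above 4 shows up as a gap, which is one of the features worth commenting on when describing a graph.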

Distribution Activity...
Types (shapes) of Distributions:
1 - Symmetrical
2 - Uniform
3 - Skewed (left or right)
4 - Bimodal (multi-modal)

Types (shapes) of Distributions

Symmetrical
• refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle
• bell-shaped is a special type: it has a center mound with two sloping tails

Uniform
• refers to data in which every class has equal or approximately equal frequency

Skewed (left or right)
• refers to data in which one side (tail) is longer than the other side
• the direction of skewness is on the side of the longer tail

Uni-modal
• refers to data in which one class has the largest frequency (one peak)

Bimodal (multi-modal)
• refers to data in which two (or more) classes have the largest frequency and are separated by at least one other class

How to describe a numerical, univariate graph

What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?

1. Center
• discuss where the middle of the data falls
• three measures of central tendency: mean, median, & mode
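The three measures of center can be computed with Python's standard `statistics` module. This is a small sketch on made-up exam scores; the data are an assumption and do not come from the slides.

```python
import statistics

# Hypothetical exam scores, invented for this sketch.
scores = [72, 85, 85, 90, 68, 77, 85, 94, 60]

mean = statistics.mean(scores)        # arithmetic average
median = statistics.median(scores)    # middle value of the sorted data
modes = statistics.multimode(scores)  # list of the most frequent value(s)

print(f"mean = {mean:.1f}, median = {median}, mode(s) = {modes}")
```

`multimode` returns a list, so a bimodal data set would simply show two values.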

What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?

2. Spread
• discuss how spread out the data is
• refers to the variability of the data: range, standard deviation, IQR
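The three measures of spread can be sketched as below. The scores are made up, and `quartiles` is a hypothetical helper. It assumes one common classroom convention, where Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); other software may use a different rule and give slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # When n is odd, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

# Hypothetical exam scores, invented for this sketch.
scores = [60, 68, 72, 77, 85, 85, 85, 90, 94]

data_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1

print(f"range = {data_range}, standard deviation = {std_dev:.2f}, IQR = {iqr}")
```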

What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?

3. Shape
• refers to the overall shape of the distribution
• symmetrical, uniform, skewed, or bimodal

What strikes you as the most distinctive difference among the distributions of exam scores in class K?

4. Unusual occurrences
• outliers: values that lie away from the rest of the data
• gaps
• clusters
• anything else unusual

5. In context
• You must write your answer in reference to the specifics in the problem, using correct statistical vocabulary and using complete sentences!

More graphs for numerical data

Stemplots (stem & leaf plots)
• Used with univariate, numerical data
• Must have a key so that we know how to read the numbers
• Can split stems when you have a long list of leaves
• Can have a comparative stemplot with two groups

Would a stemplot be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Would a stemplot be a good graph for the number of pairs of shoes owned by AP Stat students? Why or why not?

Example: The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?
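One way to answer the question is to use the tenths digit as the stem and the hundredths digit as the leaf. The sketch below does that for the eight prices listed above; `stemplot` is a hypothetical helper name, and the choice of stem and leaf is an assumption rather than something the slides state.

```python
from collections import defaultdict

def stemplot(prices):
    """Text stemplot for prices under 1.00: stem = tenths digit, leaf = hundredths digit."""
    leaves_by_stem = defaultdict(list)
    for price in prices:
        hundredths = round(price * 100)  # work in whole hundredths to avoid float surprises
        leaves_by_stem[hundredths // 10].append(hundredths % 10)
    lines = []
    # Include every stem from smallest to largest, even empty ones, so gaps stay visible.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(leaves_by_stem.get(stem, [])))
        lines.append(f"{stem} | {leaves}".rstrip())
    lines.append("Key: 2 | 1 means 0.21 per ounce")
    return "\n".join(lines)

prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]
plot = stemplot(prices)
print(plot)
```

The empty stem for the 0.40s makes the gap below 0.54 easy to see.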

Example: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney movies:
223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9
Total tobacco exposure time (in seconds) for other studios' movies:
205 162 6 1 117 5 91 155 24 55 17
Make a comparative stemplot.
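A possible comparative (back-to-back) stemplot for these two groups is sketched below. It assumes the hundreds digit is the stem and the tens digit is the leaf, with the units digit dropped; that choice, and the helper names `leaves_by_stem` and `comparative_stemplot`, are assumptions made for illustration.

```python
def leaves_by_stem(values):
    """Map each stem (hundreds digit) to its sorted leaves (tens digits); units are dropped."""
    table = {}
    for value in values:
        table.setdefault(value // 100, []).append(value % 100 // 10)
    return {stem: sorted(leaves) for stem, leaves in table.items()}

def comparative_stemplot(left, right):
    """Back-to-back stemplot: left group's leaves on the left, right group's on the right."""
    left_table, right_table = leaves_by_stem(left), leaves_by_stem(right)
    all_stems = set(left_table) | set(right_table)
    rows = []
    for stem in range(min(all_stems), max(all_stems) + 1):
        # Left-hand leaves are reversed so both sides grow outward from the stem.
        left_leaves = "".join(str(d) for d in reversed(left_table.get(stem, [])))
        right_leaves = "".join(str(d) for d in right_table.get(stem, []))
        rows.append((left_leaves, stem, right_leaves))
    width = max(len(row[0]) for row in rows)
    lines = [f"{lhs:>{width}} | {stem} | {rhs}".rstrip() for lhs, stem, rhs in rows]
    lines.append("Key: | 1 | 6 means 160 to 169 seconds (units digit dropped)")
    return "\n".join(lines)

disney = [223, 176, 548, 37, 158, 51, 299, 37, 11, 165, 74, 9, 2, 6, 23, 206, 9]
other = [205, 162, 6, 1, 117, 5, 91, 155, 24, 55, 17]
plot = comparative_stemplot(disney, other)
print("Disney on the left, other studios on the right")
print(plot)
```

Sharing one column of stems is what makes the two groups directly comparable.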

Histograms
• Used with numerical data
• Bars touch on histograms
• Two types
  – Discrete: bars are centered over discrete values
  – Continuous: bars cover a class (interval) of values
• For comparative histograms, use two separate graphs with the same scale on the horizontal axis

Would a histogram be a good graph for the fastest speed driven by AP Stat students? Why or why not?
Would a histogram be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
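For the continuous case, the counting behind a histogram can be sketched as below, using the Disney exposure times from the G-rated movies example. The class width of 100 seconds and the helper name `class_frequencies` are assumptions chosen for illustration.

```python
def class_frequencies(values, start, width, classes):
    """Count the values in each equal-width class [start + i*width, start + (i+1)*width)."""
    counts = [0] * classes
    for value in values:
        index = int((value - start) // width)
        if not 0 <= index < classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    return counts

disney = [223, 176, 548, 37, 158, 51, 299, 37, 11, 165, 74, 9, 2, 6, 23, 206, 9]
counts = class_frequencies(disney, start=0, width=100, classes=6)

# Each row stands for one bar; the classes share borders, so the bars would touch.
for i, count in enumerate(counts):
    low = i * 100
    print(f"{low:>3} to <{low + 100:<3} | {'#' * count}")
```

Each class includes its lower endpoint and excludes its upper endpoint, so no value is counted twice.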

Cumulative Relative Frequency Plot (Ogive)
• Used to answer questions about percentiles.
• Percentiles are the percent of individuals that are at or below a certain value.
• Quartiles are located every 25% of the data. The first quartile (Q1) is the 25th percentile, while the third quartile (Q3) is the 75th percentile. What is the special name for Q2?
• Interquartile Range (IQR) is the range of the middle half (50%) of the data. IQR = Q3 - Q1
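The heights plotted on an ogive are running totals of the class frequencies divided by the number of observations. The sketch below shows that calculation and a direct "percent at or below" lookup, reusing the Disney exposure times from the G-rated movies example with classes of width 100 seconds. The class counts, the cutoff of 37 seconds, and both helper names are assumptions chosen for illustration.

```python
from itertools import accumulate

def cumulative_relative_frequencies(counts):
    """Running total of class counts, divided by the overall total."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

def percent_at_or_below(values, cutoff):
    """Percent of observations that are at or below the cutoff."""
    return 100 * sum(1 for value in values if value <= cutoff) / len(values)

disney = [223, 176, 548, 37, 158, 51, 299, 37, 11, 165, 74, 9, 2, 6, 23, 206, 9]
class_counts = [10, 3, 3, 0, 0, 1]  # classes 0 to <100, 100 to <200, ..., 500 to <600

heights = cumulative_relative_frequencies(class_counts)
for upper, height in zip(range(100, 700, 100), heights):
    print(f"below {upper} seconds: {height:.2f}")

pct = percent_at_or_below(disney, 37)
print(f"{pct:.1f}% of the Disney movies had 37 seconds or less of exposure")
```

The last height is always 1.00, because every observation falls at or below the top of the final class.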