Inferential Statistics Paf 203 Data Analysis and Modeling for Public Affairs

REVIEW
• What is statistics?
• What is the difference between a population and a sample?
• What is a parameter? A statistic?
• What are the measures of central tendency?
• What are the measures of dispersion?
• What is descriptive statistics?
• What is inferential statistics?

Statistics Quotations
• "Statistics is like a bikini: what it reveals is suggestive, what it conceals is vital." (Aaron Levenstein)
• "Statistics is like a miniskirt: it covers up essentials but gives you the ideas." (A Paris banker)
• "There are three kinds of lies: lies, damned lies, and statistics." (Mark Twain)

Review
The word statistics can be viewed in two contexts:
• Singular sense: statistics as a science
• Plural sense: statistics as actual numbers derived from the data

Review
• Statistics is the science of designing studies, gathering data, and then classifying, summarizing, interpreting, and presenting these data to explain, make inferences, and support the decisions that are reached.
• This definition points out four stages in a statistical investigation, namely:
1) Collection of data
2) Presentation of data
3) Analysis of data
4) Interpretation of data (to draw valid conclusions)

Review
• A population is the complete collection of measurements, objects, or individuals under study.
• A sample is a portion or subset taken from the population.
• A parameter is a number that describes a population characteristic.
• A statistic is a number that describes a sample characteristic.

Two Broad Categories of Statistics
• Descriptive Statistics
• Inferential Statistics

Descriptive Statistics
• Used to describe a mass of data in a clear, concise, and informative way
• Deals with the methods of organizing, summarizing, and presenting data

Example
The National Statistics Office (NSO) presented the Philippine population by age group and gender using a graph.

Inferential Statistics
Concerned with making generalizations (drawing conclusions) about the characteristics of a larger set (population) where only a part (sample) is examined

Inferential Statistics
[Diagram: a larger set (N units/observations) and a smaller set (n units/observations) taken from it, linked by inferences and generalizations]

Example
A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.

METHODS OF DRAWING CONCLUSIONS
Deductive Method
• It draws conclusions from general to specific.
• It assumes that any part of the population will bear the observed characteristics of the population.
• Hence, conclusions are stated with certainty.
[Diagram: Population → Inference → Sample]

ILLUSTRATION
Statement 1: All UPLB students are intelligent.
Statement 2: Pedro is a UPLB student.
Conclusion: Pedro is intelligent.

METHODS OF DRAWING CONCLUSIONS
Inductive Method
• It draws conclusions from specific to general.
• It assumes that the characteristics observed in a part of the population are likely to hold true for the whole population.
• Hence, conclusions are subject to uncertainty.
[Diagram: Sample → Inference → Population]

ILLUSTRATION
Statement 1: Pedro is a UPLB student.
Statement 2: Pedro is intelligent.
Conclusion: All UPLB students are intelligent.

INFERENTIAL STATISTICS
It makes use of the inductive method of drawing conclusions.
[Diagram labels: Population, Sampling Process, Sample, Data, Inferences/Generalization (Subject to Uncertainty)]

The Necessary Steps of Inferential Statistics
1. Specify the question to be answered and identify the population of interest.
2. Describe how to select the sample.
3. Select the sample and analyze the sample information using descriptive statistics.
4. Use the information from step 3 to make an inference about the population.
5. Determine the reliability of the inference.

What do we need to know?
• Random variable and its behavior
• Sampling, where a sample is drawn from a much larger body of measurements called the population

Some definitions
• Inferential statistics is generalizing about a particular characteristic of a population using information generated from a sample.
• A variable is a characteristic that changes or varies over time or across the different individuals or objects under consideration.
• A population is the set of all measurements of interest.
• A sample is a subset of measurements selected from a population of interest.

(cont.) Some definitions
• A sampling distribution is a theoretical, probabilistic distribution of all possible sample outcomes (with constant sample size n) for the statistic that is to be generalized to the population.
• Collecting data: It is possible to gather data from an entire population; this is called a census. Usually, data gathered from experiments and observations come from samples.
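
The definition of a sampling distribution can be made concrete with a very small population. The sketch below is only an illustration: the five population values and the sample size are made up, and samples are assumed to be drawn without replacement. It lists every possible sample of size n and tabulates how often each value of the sample mean occurs.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 5 measurements (illustrative values only).
population = [2, 4, 6, 8, 10]
n = 2  # constant sample size

# Enumerate every possible sample of size n (without replacement)
# and record the sample mean of each one.
sample_means = [mean(s) for s in combinations(population, n)]

# Sampling distribution of the mean: each possible value of the sample mean
# with its relative frequency across all possible samples.
counts = Counter(sample_means)
sampling_distribution = {m: c / len(sample_means) for m, c in sorted(counts.items())}

for value, probability in sampling_distribution.items():
    print(value, probability)
```

In this toy case the average of all possible sample means equals the population mean, which is why the sample mean is a natural statistic for generalizing to the population.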

(cont.) Some definitions
• A sample should be representative of the population, but there are many ways that samples can be selected. It is helpful to categorize them into nonprobability and probability samples.
• A nonprobability sample is one where the judgment of the experimenter, the method in which data are collected, or other factors could affect the results of the sample.

Variable
• A variable is a characteristic that changes or varies over time or across the different individuals or objects under consideration.
• Broad classification of variables:
  QUANTITATIVE (DISCRETE or CONTINUOUS)
  QUALITATIVE

Types of Variable
Qualitative
• assumes values that are not numerical but can be categorized
• categories may be identified by either nonnumerical descriptions or by numeric codes

Types of Variable
Quantitative
• indicates the quantity or amount of a characteristic
• data are always numeric
• can be discrete or continuous

Types of Quantitative Variables
• Discrete: variable with a finite or countable number of possible values
• Continuous: variable that assumes any value in a given interval or continuum of values

RANDOM VARIABLE
A rule or function that assigns exactly one real number to every possible outcome of a random experiment.
Note: The domain of the function is the sample space S and the co-domain is the set of real numbers, R.
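
A small sketch may help show a random variable as a function on the sample space. The experiment (tossing a coin twice) and the variable X (number of heads) are illustrative choices, not taken from the slides.

```python
from itertools import product

# Random experiment (illustrative): toss a coin twice.
# The sample space S holds every possible outcome.
sample_space = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: assigns to each outcome its number of heads."""
    return outcome.count("H")

# Every outcome in S is mapped to exactly one real number.
values = {outcome: X(outcome) for outcome in sample_space}
for outcome, value in values.items():
    print(outcome, value)
```

Here X is a discrete random variable because it can take only the distinct values 0, 1, and 2.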

TYPES OF RANDOM VARIABLES
Discrete random variables take on a finite set of distinct possible values or a countably infinite number of possible values.

TYPES OF RANDOM VARIABLES
Continuous random variables take on any value within a specified interval or continuum of values.

Basic Concepts in Sampling
• SAMPLING: the process of selecting a sample
• PARAMETER: descriptive measure of the population
• STATISTIC: descriptive measure of the sample
• INFERENTIAL STATISTICS: concerned with making generalizations about parameters using statistics

WHY DO WE USE SAMPLES?
1. Reduced Cost
2. Greater Speed or Timeliness
3. Greater Efficiency and Accuracy
4. Greater Scope
5. Convenience
6. Necessity
7. Ethical Considerations

TWO TYPES OF SAMPLES
• Probability Samples
• Non-Probability Samples

Non-Probability Samples
• Samples are obtained haphazardly, selected purposively, or taken as volunteers.
• The probabilities of selection are unknown.
• They should not be used for statistical inference.
• They result from the use of judgment sampling, accidental sampling, purposive sampling, and the like.

Three commonly employed nonprobability samples are judgment samples, voluntary samples, and convenience samples.
1. Judgment samples: sample selection is sometimes based on the opinion of one or more persons who feel qualified to identify items for a sample as being characteristic of the population. Example: a political campaign manager intuitively picks certain voting districts as reliable places to measure the public's opinion of her candidate. The poll that is taken from these districts is a judgment sample based on the campaign manager's expertise and experience.

2. Voluntary samples: sometimes questions are posed to the public by publishing them in print media or by broadcasting them over radio or television. Respondents dial one number to answer yes and another to answer no. Such polls produce voluntary samples and attract only those who are interested in the subject matter.

3. Convenience samples: often people want to take an easy sample. For example, a surveyor will stand in one location and ask passersby their question or questions, or a student working on a project will ask the entire class to fill out a survey questionnaire. A sample taken at the convenience of the surveyor is called a convenience sample.

Probability Samples
• Samples are obtained using some objective chance mechanism, thus involving randomization.
• They require the use of a complete listing of the elements of the universe called the sampling frame.
• The probabilities of selection are known.
• They are generally referred to as random samples.
• They allow drawing of valid generalizations about the universe/population.

Probability Sample
• A probability sample is one in which the chance of selection of each item in the population is known before the sample is picked.

Types of probability samples
1. Simple random sample: a probability sample chosen in such a way that all possible groupings of a given size have an equal chance of being picked, and each item in the population has an equal chance of being selected.

(cont.) Types of probability samples
2. Systematic samples: Suppose we have a list of 1,000 registered voters in a community and we want to pick a probability sample of 50. We can use a random number table to pick one of the first 20 voters (1,000/50 = 20) on our list. If the table gave us the number 16, then the 16th voter on the list would be the first to be selected. We would then pick every 20th name after this random start (the 36th voter, 56th voter, etc.) to produce a systematic sample.
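
The voter example can be sketched in a few lines of Python. This is a minimal illustration rather than a production routine: the function name `systematic_sample` and the numbered voter list are hypothetical, and the frame size is assumed to be an exact multiple of the sample size, as in the 1,000-voter example.

```python
import random

def systematic_sample(frame, n, start=None, rng=random):
    """Systematic sample of size n from a list-like sampling frame.

    Assumes len(frame) is a multiple of n, as in the 1,000-voter example.
    """
    k = len(frame) // n                  # sampling interval, k = N/n
    if start is None:
        start = rng.randint(1, k)        # random start among the first k units
    # Take the unit at the random start, then every k-th unit after it.
    return [frame[pos - 1] for pos in range(start, len(frame) + 1, k)]

voters = list(range(1, 1001))            # hypothetical list: voter numbers 1..1000
sample = systematic_sample(voters, 50, start=16)
print(sample[:3], len(sample))           # first picks and the sample size
```

Passing `start=16` reproduces the random start used in the example; leaving it out lets the function draw its own random start between 1 and k.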

(cont.) Types of probability samples
3. Stratified samples: If the population is divided into relatively homogeneous groups, or strata, and a sample is drawn from each group to produce an overall sample, this overall sample is known as a stratified sample. Stratified sampling is usually performed when there is large variation within the population and the researcher has some prior knowledge of the structure of the population that can be used to establish the strata. The sample results from each stratum are weighted and combined with the sample results of the other strata to provide the overall estimate.
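
A rough sketch of how stratum results might be weighted into one overall estimate is given below. Everything in it is assumed for illustration: the two strata, their simulated values, and the sample sizes. The weights are taken to be each stratum's share of the population, N_h / N, which is one common choice.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical population: one measurement per unit, grouped by stratum.
strata = {
    "urban": [rng.gauss(50, 5) for _ in range(600)],
    "rural": [rng.gauss(30, 5) for _ in range(400)],
}
sample_sizes = {"urban": 30, "rural": 20}  # assumed allocation, proportional to stratum size

# Independent simple random sample (without replacement) from each stratum.
samples = {h: rng.sample(units, sample_sizes[h]) for h, units in strata.items()}

# Weight each stratum's sample mean by its population share, N_h / N,
# and add the weighted means to get the overall estimate.
N = sum(len(units) for units in strata.values())
estimate = sum(len(strata[h]) / N * mean(samples[h]) for h in strata)
print(round(estimate, 2))
```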

(cont.) Types of probability samples
4. Cluster samples: a cluster sample is one in which the individual units to be sampled are actually groups or clusters of items. It is assumed that the individual items within each cluster are representative of the population. Example: consumer surveys in big cities employ cluster sampling. Surveyors divide the city into blocks, each block containing a cluster of households to be surveyed. A number of clusters are selected for the sample, and the households in the selected clusters are surveyed.

METHODS OF PROBABILITY SAMPLING
• Simple Random Sampling
• Stratified Random Sampling
• Systematic Random Sampling
• Cluster Sampling

SIMPLE RANDOM SAMPLING (SRS)
• Most basic method of drawing a probability sample
• Assigns equal probabilities of selection to each possible sample
• Results in a simple random sample

TYPES OF SIMPLE RANDOM SAMPLE (SRSWOR)
SRS Without Replacement (SRSWOR): does not allow repetitions of selected units in the sample

TYPES OF SIMPLE RANDOM SAMPLE (SRSWR)
SRS With Replacement (SRSWR): allows repetitions of selected units in the sample
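
The difference between the two types of simple random sample can be seen with Python's standard `random` module. The ten numbered units below are a made-up population used only for illustration.

```python
import random

rng = random.Random(1)
population = list(range(1, 11))          # hypothetical units labelled 1..10

# SRSWOR: a selected unit cannot be selected again.
srswor = rng.sample(population, 5)

# SRSWR: each draw is made from the full population, so repeats are possible.
srswr = rng.choices(population, k=5)

print(srswor)
print(srswr)
```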

STRATIFIED RANDOM SAMPLING
• The universe is divided into L mutually exclusive sub-universes called strata.
• Independent simple random samples are obtained from each stratum.

ILLUSTRATION Stratified Random Sample

Advantages of Stratification
1. It gives a better cross-section of the population.
2. It simplifies the administration of the survey/data gathering.
3. The nature of the population dictates some inherent stratification.
4. It allows one to draw inferences for various subdivisions of the population.
5. Generally, it increases the precision of the estimates.

SYSTEMATIC SAMPLING
• Adopts a skipping pattern in the selection of sample units
• Gives a better cross-section if the listing is linear in trend, but has a high risk of bias if there is periodicity in the listing of units in the sampling frame
• Allows the simultaneous listing and selection of samples in one operation

ILLUSTRATION Systematic Sample
[Diagram: population and systematic sample]
Determine the sampling interval, k = N/n

CLUSTER SAMPLING
• It considers a universe divided into N mutually exclusive sub-groups called clusters.
• A random sample of n clusters is selected and their elements are completely enumerated.
• It has simpler frame requirements.
• It is administratively convenient to implement.
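
The two-step logic of cluster sampling (select clusters at random, then enumerate everything inside them) might look like the sketch below. The city blocks, household labels, and the number of clusters chosen are all hypothetical.

```python
import random

rng = random.Random(2)

# Hypothetical city: 20 blocks (clusters), each holding 5 to 10 households.
blocks = {
    b: [f"block{b}-household{h}" for h in range(1, rng.randint(5, 10) + 1)]
    for b in range(1, 21)
}

n_clusters = 4
# Step 1: select a random sample of n clusters from the N clusters.
chosen = rng.sample(sorted(blocks), n_clusters)

# Step 2: complete enumeration, so every household in a chosen block is surveyed.
surveyed = [household for b in chosen for household in blocks[b]]
print(chosen, len(surveyed))
```

Only a list of blocks is needed as the sampling frame; households are listed only inside the selected blocks, which is why the frame requirements are simpler.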

ILLUSTRATION Population Cluster Sample

What is a sampling error?
• If we want to make a judgment about a population from a sample, we want the sample results to be as typical of the population as possible. This is difficult to achieve, so we have to live with sampling errors. Errors can also come from the coding and recoding of data. Results obtained from a biased sample are worthless.

How to determine a sample size
• Use the formula n = PQ / (d/2)², where P is the proportion of the target population that is based on prior information, Q is (1 − P), and d is the degree of error that is defined by the investigator.
• Example: with P = 50% and d = 3%, n = (50%)(50%) / (3%/2)² = 1,111, or about 1,200.
• If N is known, then adjust to n* = n / (1 + n/N).
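
The arithmetic in the example can be checked with a short function. The sketch assumes the formula implied by the worked example, n = PQ / (d/2)², with P and d written as decimals; the function name `sample_size` and the population size of 10,000 used in the second call are hypothetical.

```python
def sample_size(p, d, N=None):
    """Sample size n = P*Q / (d/2)^2, optionally adjusted for a known N.

    p: prior proportion (decimal), d: degree of error (decimal),
    N: population size, if known (applies n* = n / (1 + n/N)).
    Returns an unrounded value; round up in practice.
    """
    q = 1 - p
    n = p * q / (d / 2) ** 2
    if N is not None:
        n = n / (1 + n / N)
    return n

print(sample_size(0.50, 0.03))           # slide example, roughly 1,111
print(sample_size(0.50, 0.03, N=10000))  # adjusted for a hypothetical N = 10,000
```

With a known population of 10,000 the adjustment brings the required sample down from about 1,111 to about 1,000.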

Concept of Hypothesis testing
• The objective of hypothesis testing is to determine whether or not the sample data support some belief or hypothesis about the population. In hypothesis testing, we make assumptions about the unknown parameters.

Hypothesis testing has five steps:
1. Formulating the hypothesis
2. Selecting the statistical analysis model to be used
3. Setting the criteria for rejecting the null hypothesis
4. Analysis
5. Making a decision

Formulating the hypothesis
• There are two types of hypotheses: the null hypothesis (Ho) and the alternative hypothesis (Ha).
• The alternative hypothesis is the hypothesis that the researcher wants to prove. The purpose of the test is to determine whether or not the evidence provided by the sample is enough to establish that the null hypothesis is not true. If there is enough such evidence, then we will say that there is evidence to support the alternative hypothesis.

Examples of types of hypothesis
a. Hypothesis concerning the value of the population mean
b. Hypothesis concerning the value of the difference in the means of two populations
c. Hypothesis concerning the relationship between two nominal-scale variables

There are also three possible forms of the alternative hypothesis, where µ0 is the value stated in the null hypothesis:
• Ha: µ ≠ µ0 (that is, µ < µ0 or µ > µ0)
• Ha: µ > µ0
• Ha: µ < µ0

1. Selecting the statistical model to be used:
• Having specified the null and alternative hypotheses, we then select the appropriate test statistic or statistical model to be used. The choice of statistic depends on a number of factors: (1) the nature of the hypothesis problem, (2) the level of measurement used, and (3) the assumption of normality.
• Some of the most frequently used statistical tests are:
  • the t test
  • the Z test
  • the Chi square test

2. Setting the criteria for rejecting the null hypothesis.
• This involves two things: (1) selecting a significance level and (2) determining the area of rejection.
• The level of significance is the probability of rejecting the null hypothesis when it is true. This is called the Type I error or the α error. (The Type II error or the β error is accepting the null hypothesis when it is not true.)
• We select the level of significance before we compute the test statistic, and we need to select a level that we think is reasonable. The decision as to which significance level to use depends on the questions involved.
• Social scientists routinely accept a probability of 0.05 of wrongly rejecting the null hypothesis. If a statistical test would lead to significant policy recommendations, then you may wish to reduce the risk of being in error and specify a significance level of 0.01 or even 0.001.

TWO TYPES OF ERROR
• A TYPE I ERROR is committed when we reject a true null hypothesis.
• A TYPE II ERROR is committed when we accept (fail to reject) a false null hypothesis.

PROBABILITY OF COMMITTING ERRORS
The PROBABILITY OF A TYPE I ERROR is usually denoted by α. It is also known as the level of significance of a statistical test.

PROBABILITY OF COMMITTING ERRORS
The PROBABILITY OF A TYPE II ERROR is usually denoted by β.

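DECISION MATRIX
Decision                     Ho is true          Ho is false
Reject Ho                    Type I error (α)    Correct decision
Accept (do not reject) Ho    Correct decision    Type II error (β)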

Area of Rejection
• Based on the significance level we choose, we then delineate our region of acceptance and region of rejection. The region of rejection is also called the critical region. Outcomes falling here mean we reject the null hypothesis.
• Our critical region will also depend on whether we are doing a right-tailed test, a left-tailed test, or a two-tailed test. If our alternative hypothesis involves the > sign, we use the right-tailed test. When our alternative hypothesis involves the < sign, we use the left-tailed test. When our alternative hypothesis involves the ≠ sign, we use the two-tailed test.
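
As a rough illustration of how the tail of the test changes the critical region, the sketch below checks whether a test statistic falls in the rejection region. It assumes a Z test (standard normal sampling distribution); the function name `in_critical_region` and the example values are hypothetical.

```python
from statistics import NormalDist

def in_critical_region(z_stat, alpha, tail):
    """Return True if z_stat falls in the critical region of a Z test."""
    std_normal = NormalDist()            # mean 0, standard deviation 1
    if tail == "right":                  # alternative hypothesis uses >
        return z_stat > std_normal.inv_cdf(1 - alpha)
    if tail == "left":                   # alternative hypothesis uses <
        return z_stat < std_normal.inv_cdf(alpha)
    if tail == "two":                    # alternative hypothesis uses ≠
        # alpha is split equally between the two tails
        return abs(z_stat) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left', or 'two'")

print(in_critical_region(1.7, 0.05, "right"))
print(in_critical_region(1.7, 0.05, "two"))
```

The same statistic of 1.7 is rejected in a right-tailed test at the 0.05 level but not in a two-tailed test, because the two-tailed critical value is further from zero.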

3. Analysis
• The analysis part is the process of computing our test statistic based on the assumptions we made and the data we have.

4. Making a decision
• In assessing the null hypothesis, we can accept the null hypothesis or reject it in favor of the alternative hypothesis. Our decision will be based on the value of the test statistic we obtain in the analysis stage.
• If the value of the test statistic is located in the critical region, we reject the null hypothesis in favor of the alternative hypothesis. Our findings may be taken as conclusive even if there is a probability that we may be in error.
• If the test statistic is located in the acceptance region, we accept the null hypothesis. Our findings are not conclusive. We simply do not have enough evidence to prove our alternative hypothesis.

Example:
• When the judge pronounces the defendant "guilty", the defendant serves the sentence imposed even if there is a probability that he is not actually guilty. But when the judge hands down a verdict of not guilty, it is usually not because it has been proven beyond reasonable doubt that the defendant is not guilty. There is simply not enough evidence to prove that the defendant is guilty.

Example of hypothesis testing: The t distribution
• Compare the academic achievement of the foreign students and the total student population.
• Given: student body GPA = 2.0, variance is unknown; foreign student GPA = 2.58, s = 1.23, n = 30
• Step 1. Make assumptions: random sampling; the sampling distribution is normal.
• Step 2. State the hypotheses: Ho: µ = 2.00, H1: µ ≠ 2.00.
• Step 3. Select the sampling distribution and establish the critical region: use the t statistic and set the probability of error α = 0.01, a two-tailed test with degrees of freedom = (n − 1) = 29.

(cont.) t distribution
• Step 3. Critical region: t(critical) = ±2.756
• Step 4. Computing the test statistic:
  t = (x̄ − µ) / (s / √(n − 1))
  t = (2.58 − 2.00) / (1.23 / √29)
  t = 0.58 / 0.23
  t = +2.52
• Step 5. Making a decision: do not reject Ho, because the test statistic (+2.52) does not fall in the critical region (beyond ±2.756). The difference between the sample mean (2.58) and the population mean (2.0) is no greater than would be expected if only random chance were operating.
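
The computation in Step 4 can be reproduced directly. The sketch follows the slide's formula, which divides s by √(n − 1), and takes the critical value ±2.756 from the slide rather than computing it; the variable names are illustrative.

```python
import math

# Values given in the example.
x_bar, mu, s, n = 2.58, 2.00, 1.23, 30
t_critical = 2.756                       # from the slide: two-tailed, alpha = 0.01, df = 29

standard_error = s / math.sqrt(n - 1)    # the slide's formula uses n - 1
t = (x_bar - mu) / standard_error

# Reject Ho only if t falls beyond the critical value in either tail.
reject_h0 = abs(t) > t_critical
print(round(t, 2), reject_h0)
```

Without rounding the denominator to 0.23, the statistic comes out near 2.54 rather than 2.52; either way it is inside ±2.756, so the decision not to reject Ho is unchanged.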

Introduction to the Chi square test of independence
• The Chi square (χ²) test of independence is a very general test that is used to evaluate whether or not frequencies which have been empirically obtained differ significantly from those which would be expected if no relationship between the variables existed.

Chi square test of independence
• Suppose we want to look at the relationship between religious affiliation and political affiliation. Suppose that, for this purpose, we selected a random sample of 100 Iglesia ni Cristo (INC) members and 100 Jesus is Lord (JIL) members. We asked each of them how they voted during the last Presidential elections, and put the results in a bivariate table.

          INC           JIL           Total
LAMMP     80 (80%)      40 (40%)      120
LAKAS     20 (20%)      60 (60%)      80
Total     100 (100%)    100 (100%)    200

Chi square test of independence
• If we examine the percentages, we see that of the 100 INC members, a larger proportion (80%) voted LAMMP, while of the 100 JIL members, a larger proportion (60%) voted LAKAS. It seems that there was a strong tendency for INC members to vote LAMMP and JIL members to vote LAKAS. Can we conclude, based on these sample results, that there is a relationship between religious affiliation and political affiliation?
• The Chi-square test of independence is a technique for testing the level of statistical significance attained by a bivariate relationship in a cross tabulation. It can apply to any level of measurement, as the relationship can always be put in a bivariate or contingency table.
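
As a sketch of how the test would proceed on this table, the code below computes expected frequencies under independence and the χ² statistic from the observed counts (LAMMP: 80 INC and 40 JIL; LAKAS: 20 INC and 60 JIL). The 0.05 significance level is an assumption made here for illustration; the slides do not state which level was used for this example.

```python
# Observed frequencies from the bivariate table.
observed = {
    "LAMMP": {"INC": 80, "JIL": 40},
    "LAKAS": {"INC": 20, "JIL": 60},
}

row_totals = {r: sum(cols.values()) for r, cols in observed.items()}
col_totals = {c: sum(observed[r][c] for r in observed) for c in ("INC", "JIL")}
n = sum(row_totals.values())

# Expected frequency under independence: (row total * column total) / n.
chi_square = 0.0
for r in observed:
    for c in col_totals:
        expected = row_totals[r] * col_totals[c] / n
        chi_square += (observed[r][c] - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_value = 3.841                   # chi-square critical value for df = 1, alpha = 0.05 (assumed level)

print(round(chi_square, 2), df, chi_square > critical_value)
```

Under these assumptions the statistic (about 33.3) is far beyond the critical value, which would point to a statistically significant relationship between the two variables in this sample.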

END – PART 1