

Introduction
• Introduction to Sampling
• Types of Samples: Random and Non-Random
• How Confident and Precise Do You Need to Be?
• How Large a Sample Do You Need?
• Where to Find a Sampling Statistician?

Sampling
• Is it possible to collect data from the entire population (a census)?
  – If so, we can talk about what is true for the entire population
  – Often we cannot, because of time or cost
  – If not, we can use a smaller subset: a SAMPLE

Concepts
• Population
  – the total set of units
• Sample
  – a subset of the population
• Sampling Frame
  – the list from which you select your sample

More Sampling Concepts
• Sample Design
  – the method of sampling (probability or non-probability)
• Parameter
  – a characteristic of the population
• Statistic
  – a characteristic of a sample

Random Sample
• A random sample allows us to make estimates about the larger population based on what we learn from the subset
• It works like a lottery: everyone has an equal chance of being selected
• Advantages:
  – eliminates selection bias
  – able to generalize to the population
  – cost-effective

Types of Random Samples
• Simple random sample
• Random interval sample
• Stratified random sample
• Random cluster sample
• Multi-stage random sample
• Combination random sample

Simple Random Sample
• The simplest type
• Establish a sample size, then randomly select units until the sample size is reached
• Uses a random number table to select units
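The selection step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the pseudo-random generator stands in for the random number table, and the function name, the numbered household frame, and the seed are all hypothetical.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size units without replacement; every unit is equally likely."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # stands in for the random number table
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: 500 numbered households.
frame = list(range(1, 501))
sample = simple_random_sample(frame, sample_size=20, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the draw repeatable for checking; in practice the seed would be left unset.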

Random Interval Sample
• Used when there is a sequential population that is not already enumerated and would be difficult or time-consuming to enumerate
• Uses a random number table to select intervals
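One possible reading of this procedure is sketched below in Python: walk through the sequence and, before each pick, draw a fresh random interval to count ahead. The slide does not spell out the mechanics, so the `max_interval` parameter, the function name, and the file labels are assumptions made for illustration.

```python
import random

def random_interval_sample(units, max_interval, seed=None):
    """Walk a sequence once, counting a freshly drawn random interval before each pick."""
    rng = random.Random(seed)  # stands in for the random number table
    selected = []
    countdown = rng.randint(1, max_interval)  # units to count before the first pick
    for unit in units:
        countdown -= 1
        if countdown == 0:
            selected.append(unit)
            countdown = rng.randint(1, max_interval)  # draw the next interval
    return selected

# Hypothetical sequential population that is never listed in full: a run of case files.
files = (f"file-{i:03d}" for i in range(1, 201))
picked = random_interval_sample(files, max_interval=10, seed=7)
print(len(picked), picked[:5])
```

Because the population is consumed as a stream, no complete numbered list is needed; the average interval (here about 5.5) controls the sampling rate.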

Stratified Random Sample
• Use when specific groups must be included that a simple random sample might otherwise miss
  – these groups are usually a small proportion of the population

Stratified Random Sample
[Diagram: the total population is divided into sub-populations, and a simple random sample is drawn from each sub-population.]
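A minimal Python sketch of stratified sampling follows. It assumes an equal number of units is drawn from each sub-population, which is only one possible allocation; the function name, the urban/rural population, and the sample sizes are hypothetical.

```python
import random
from collections import defaultdict

def stratified_random_sample(units, stratum_of, per_stratum, seed=None):
    """Split units into strata, then draw a simple random sample within each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        k = min(per_stratum, len(members))  # a small stratum may have fewer units
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: rural households are only 5% of the total.
population = [("urban", i) for i in range(950)] + [("rural", i) for i in range(50)]
sample = stratified_random_sample(
    population, stratum_of=lambda unit: unit[0], per_stratum=25, seed=3
)
print({name: len(members) for name, members in sample.items()})
```

Sampling each stratum separately guarantees the small group is represented, which a simple random sample of 50 from this population would not.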

Random Cluster Sample
• Another form of random sampling
• A cluster is any naturally occurring aggregate of the units to be sampled. Cluster sampling is used when:
  – you do not have a complete list of everyone in the population of interest but do have a list of the clusters in which they occur, or
  – you have a complete list of everyone, but they are so widely distributed that it would be too time-consuming and expensive to send data collectors out to a simple random sample
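The idea can be sketched in Python as follows: choose whole clusters at random and include every unit found in them. This is an illustrative sketch only; the function name and the village and household labels are hypothetical.

```python
import random

def random_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every unit inside the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # only the cluster list is needed
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: 12 villages with 30 households each.
villages = {
    f"village-{v:02d}": [f"v{v:02d}-household-{h:02d}" for h in range(30)]
    for v in range(12)
}
sample = random_cluster_sample(villages, n_clusters=3, seed=5)
print(sorted(sample))
```

Data collectors then visit only the three selected villages rather than households scattered across all twelve.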

Multi-stage Random Sample
• Combines two or more forms of random sampling
• Most commonly, it begins with random cluster sampling and then applies simple random sampling or stratified random sampling
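A two-stage version of the most common case (random cluster sampling followed by simple random sampling within each chosen cluster) might look like the Python sketch below. The function name, the school and student labels, and the sizes are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: simple random sample inside each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical clusters: 20 schools with 100 students each.
schools = {
    f"school-{s:02d}": [f"s{s:02d}-student-{k:03d}" for k in range(100)]
    for s in range(20)
}
sample = two_stage_sample(schools, n_clusters=4, n_per_cluster=10, seed=11)
print({name: len(students) for name, students in sample.items()})
```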

Combination Random Samples
• More than one random sampling technique is used

Drawback of Random Cluster and Multi-stage Random Sampling
• May not yield an accurate representation of the population

Summary of Random Sampling Process
Step  Process
1     Obtain a complete listing of the entire population
2     Assign each case a number
3     Randomly select the sample using a random numbers table
4     When no numbered listing exists or one is not practical to create:
      • take a random start
      • select every nth case

Non-Random Samples
• Can be more focused
• Can make sure a small sample is representative
• Cannot make inferences to a larger population

Types of Non-random Samples
• Convenience: whoever is easiest to contact or whatever is easiest to observe
• Snowball: ask people who else you should interview
• Purposeful (judgment): set criteria to achieve a specific mix of participants

Forms of Purposeful Samples
• Typical cases (median)
• Maximum variation (heterogeneity)
• Quota
• Extreme case
• Confirming and disconfirming cases

Bias and Non-random Sampling
• Were people selected in a biased way?
• Are they substantially different from the rest of the population?
• Collect some data to show that the people selected are fairly similar to the larger population (e.g., demographics)

Combinations: Random and Non-Random
• Example:
  – Non-randomly select two schools from the poorest communities and two from the wealthiest communities
  – Select a random sample of students from these four schools

Possibility of Error
• Is the sample different from the population?
• Statistics: data derived from random samples

How Confident Do You Wish to Be?
• Confidence level
  – e.g., 90% (you are 90% certain that your sample results accurately estimate the population as a whole)
  – the higher the confidence level, the larger the sample needed

Confidence Standard
• The standard is 95%
  – 19 of 20 samples would have found similar results
  – we are 95% certain that the population parameter lies somewhere between the lower and upper limits of the confidence interval calculated from the sample
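The "19 of 20 samples" reading can be checked with a small simulation. The Python sketch below is illustrative only: it assumes a population in which the true proportion is 48%, draws many samples of 400, and uses the usual normal-approximation interval for a proportion with z = 1.96. All of these numbers and names are assumptions, not values from the slides.

```python
import math
import random

def interval_covers(true_p, n, rng, z=1.96):
    """Draw one sample of size n and report whether its interval contains true_p."""
    hits = sum(rng.random() < true_p for _ in range(n))
    p_hat = hits / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # normal approximation
    return p_hat - margin <= true_p <= p_hat + margin

rng = random.Random(42)
trials = 2000
covered = sum(interval_covers(0.48, 400, rng) for _ in range(trials))
coverage = covered / trials
print(f"share of samples whose interval contains the true value: {coverage:.3f}")
```

The printed share should land near 0.95, which is what a 95% confidence level promises about the procedure over repeated samples.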

Confidence Interval
• Sometimes called sampling error, margin of error, or precision
• Example:
  – a poll reports 48% for and 52% against, with a margin of error of +/- 3%
  – this actually means 45% to 51% for and 49% to 55% against
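The arithmetic behind the poll example can be sketched in Python. The margin formula used here is the standard normal approximation for a proportion, which the slides do not state, and the sample size of 1,067 respondents is an assumed value chosen because it gives roughly a 3% margin at 95% confidence.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion (z=1.96 for 95%)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat, margin):
    """Lower and upper limits around the sample estimate."""
    return (p_hat - margin, p_hat + margin)

n = 1067  # assumed number of respondents; not given in the poll example
moe = margin_of_error(0.48, n)
low, high = confidence_interval(0.48, 0.03)  # the slide's stated +/- 3%
print(f"computed margin: {moe:.3f}")
print(f"share in favour: {low:.2f} to {high:.2f}")
```

Since the interval for "for" spans 45% to 51%, the poll cannot rule out a majority in favour even though the point estimate is 48%.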

Sample Size
• By increasing the sample size, you increase accuracy and decrease the margin of error
• The larger the margin of error, the less precise your results will be
• The smaller the population, the smaller the sample size needed for a given confidence level and margin of error, but the larger the needed ratio of sample size to population size
• The standard to aim for is a 95% confidence level and a margin of error of +/- 5%
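The relationship between population size and sample size can be illustrated with the standard finite population correction. The formula below is a common textbook adjustment and is not taken from the slides; the starting value of 384 is the large-population sample size for 95% confidence and a +/- 5% margin, and the population sizes are hypothetical.

```python
import math

def adjusted_sample_size(n_large, population_size):
    """Finite population correction: n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n_large / (1 + (n_large - 1) / population_size))

n_large = 384  # large-population size for 95% confidence, +/- 5%
populations = [100, 1000, 10000]  # hypothetical population sizes
sizes = [adjusted_sample_size(n_large, N) for N in populations]
ratios = [n / N for n, N in zip(sizes, populations)]
for N, n, ratio in zip(populations, sizes, ratios):
    print(f"population {N:>6}: sample {n:>4} ({ratio:.1%} of the population)")
```

The needed sample shrinks as the population shrinks, while the share of the population that must be sampled grows.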

Sample Sizes for Large Populations

Precision    Confidence level
(+/-)        99%       95%      90%
1%           16,576    9,604    6,765
2%           4,144     2,401    1,691
3%           1,848     1,067    752
5%           666       384      271
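Figures like these can be approximated with the standard sample size formula for a proportion, n = z² · p(1 − p) / e², using the most conservative assumption p = 0.5. The slides do not say how the table was produced, so the Python sketch below is an assumption: with the rounded z values shown it matches the 95% and 90% columns, while the 99% column depends on how z is rounded and comes out close to, but not exactly on, the tabulated values.

```python
# Conventional rounded z values for each confidence level (assumed, not from the slides).
Z_VALUES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def sample_size_large_population(margin, confidence, p=0.5):
    """n = z^2 * p * (1 - p) / margin^2, rounded to the nearest whole unit."""
    z = Z_VALUES[confidence]
    return round(z * z * p * (1 - p) / margin ** 2)

for margin in (0.01, 0.02, 0.03, 0.05):
    row = [sample_size_large_population(margin, level) for level in (0.99, 0.95, 0.90)]
    print(f"+/- {margin:.0%}: {row}")
```

Halving the margin of error roughly quadruples the required sample, which is why the 1% row is so much larger than the 2% row.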

Summary of Sampling Size
• Accuracy and precision can be improved by increasing the sample size
• The standard to aim for is a 95% confidence level and a margin of error of +/- 5%
• The larger the margin of error, the less precise the results will be
• The smaller the population, the larger the needed ratio of the sample size to the population size

Where to Find a Sampling Statistician
• American Statistical Association (ASA) directory of statistical consultants
  – http://www.amstat.org/consultantdirectory/index.cfm
• Alliance of Statistics Consultants
  – http://www.statisticstutors.com/#statistical-analysis
• HyperStat Online
  – http://davidmlane.com/hyperstat/consultants.html