Chapter 1 Data Collection Where Why and How

Chapter 1 Data Collection -- Where, Why and How BUS 304 – Data Collection

Content
1. Concept of Statistics, tools for data collection
2. Populations and Samples, Four Sampling Techniques
3. Data Types and Measurement Levels

A decision process
• Data Collection (Chapter 1)
  - risks of using a cell phone while driving
  - public reaction
  - police department's ability to enforce
• Data Presentation and Characterization (Chapters 2 and 3)
  - list of risks
  - tables
  - graphs
  - data measurements
  - etc.
• Making Inference and Making Decisions (Chapter 4 and after, plus Decision Models)
  - infer consequences
  - evaluate probabilities
  - evaluate alternatives
  - evaluate costs

Basic Concepts
• Business Statistics
  - A collection of tools and techniques used to convert data into meaningful information in a business environment
• Descriptive Statistics
  - Tools that collect, present, and describe data
• Inferential Statistics
  - Tools that draw conclusions and/or make decisions about a population based only on sample data
  - Estimation: how many people in the population will buy the product, based on the sample's response to it?
  - Hypothesis testing: should I change my marketing strategy?
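The estimation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey responses and the population size of 50,000 are made up, and `estimate_buyers` is a hypothetical helper name, not something from the course materials.

```python
def estimate_buyers(sample_responses, population_size):
    """Scale the share of 'yes' answers in a sample up to the whole population."""
    share = sum(sample_responses) / len(sample_responses)
    return share, share * population_size


# Hypothetical survey answers: 1 = would buy, 0 = would not (made-up data).
responses = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

# Descriptive step: summarize the sample. Inferential step: project to the population.
share, buyers = estimate_buyers(responses, population_size=50_000)
print(f"Sample share of buyers: {share:.0%}")
print(f"Estimated buyers in the population: {buyers:,.0f}")
```

The projection is only as good as the sample: a biased or very small sample gives a misleading estimate, which is why the sampling techniques later in the chapter matter.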

Data Collection Methods
• Experiments
  - Evaluating the reliability and gas consumption of a hybrid car
• Telephone Surveys
  - Market research with existing customers, after-sales queries, etc.
• Written Questionnaires and Surveys
  - Surveys mailed to existing customers
  - Written surveys distributed on the street
• Direct Observation and Personal Interviews
  - Widely used in consulting and IT development

Survey: Have you ever been
• asked to fill out a written survey?
• called to answer a phone survey?
What questions did they ask?
Why did they ask those questions?
What will they do with the answers to those questions?
Do you always answer all the questions?
Do you always provide the “truth”?

Survey Design Steps
• Define the issue
  - What are the purpose and objectives of the survey?
• Define the population of interest
  - Whom do you want to ask?
• Formulate survey questions
  - Make questions clear and unambiguous
  - Use universally accepted definitions
  - Limit the number of questions
• Pretest the survey
  - Pilot test with a small group of participants
  - Assess clarity and length
• Determine the sample size and sampling method
• Select the sample and administer the survey

Exercise
• What could be some potential problems with the following survey questions?
  - Do you agree with most other reasonably minded people that the city should spend more money on neighborhood parks?
  - To what extent would you support paying a small increase in your property taxes if it would allow poor and disadvantaged children to have food and shelter?
  - How much money do you make at your current job?
  - After trying the new product, please provide a rating from 1 to 10 to indicate how you like its taste and freshness?
  - Do you agree that the ambiance was divine?
  - Do you agree that the service was impeccable?
Some of these problems look obvious, but they are not always obvious when you are the one writing the questions -- pretesting is very important!

Data Collection Bias
Bias is generally inevitable. We try to reduce it, but some usually remains.
• Interviewer Bias
• Nonresponse Bias
• Selection Bias
• Observer Bias
• Measurement Error

Population and Samples
• Statistical research always starts with a question:
  - What is the average starting salary for a business major nationwide? At CSUSM?
  - Are house prices in San Diego unaffordable?
  - Are college textbooks too expensive?
  - Is Dr. Fang a nice person?
  - Will Obama win the presidential election?
  - Lots of others…
• Population: all the items that are of interest
  - Exercise: determine the population for each question above
• Sample: a subset of the population
• How to tell them apart? Check whether the data cover all the items of interest
[Figure: the population drawn as the letters a through z; the sample is a subset of them: b, c, g, i, n, o, r, u, y]
3/11/2021 BUS 304 – Data Collection

Sampling
• Techniques for selecting only part of the population to conduct the study
• You definitely lose some accuracy in the answer
• But sometimes it is more reasonable to use a sample than the whole population
  - Less time consuming
  - Lower cost
  - Sometimes the study is destructive, e.g. car durability tests, matches

Sampling Techniques
• Non-Statistical Sampling
  - Samples are selected at convenience
  - Results will be subject to bias
  - Examples:
    - Ask a friend, a neighbor, etc.
    - Judges
• Statistical Sampling
  - Use probability theory to guide the selection
  - Ensure that the sample is very likely (or at least with measurable odds) to represent the population
  - Sampling error can be estimated (as we will learn later in the semester)

Four Statistical Sampling Techniques (1)
The four techniques: Simple Random, Systematic, Stratified, Cluster
• Simple Random Sampling
  - The most basic statistical sampling method
  - Select at random
  - Dice, cards, random number generator (calculator, Excel)
• Exercise:
  - Use the random number generator in Excel to select a sample of ten NBA players and find their average weight
    - “NBA Roster” file
    - Tutorial: RNG PPT
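The exercise uses Excel; the same selection can be sketched in Python. This is a minimal sketch under stated assumptions: the “NBA Roster” file is not reproduced here, so `build_roster` is a hypothetical helper that invents 450 player IDs with made-up weights purely so the example runs.

```python
import random


def build_roster(rng, size=450):
    """Hypothetical stand-in for the "NBA Roster" file: made-up IDs and weights (lb)."""
    return {f"player_{i:03d}": rng.randint(170, 290) for i in range(1, size + 1)}


rng = random.Random(304)  # fixed seed so the sketch is repeatable
roster = build_roster(rng)

# Simple random sample: every player has the same chance of being selected,
# and sampling is without replacement, so nobody is picked twice.
sample_ids = rng.sample(sorted(roster), k=10)
sample_mean = sum(roster[p] for p in sample_ids) / len(sample_ids)
population_mean = sum(roster.values()) / len(roster)

print("Sampled players:", sample_ids)
print(f"Sample average weight: {sample_mean:.1f} lb")
print(f"Population average weight: {population_mean:.1f} lb")
```

Rerunning with a different seed gives a different sample and a slightly different sample average; that variation is the accuracy lost by not measuring the whole population.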

Four Statistical Sampling Techniques (2)
• Systematic Sampling
  - A simplified version of simple random sampling
  - Select a random start, then proceed at equal spacing (the interval)
  - Question: how do we determine the interval so that everyone has a chance to be selected?
  - Formula: Interval = Population size / Sample size

Systematic sampling exercise
• Use the systematic sampling technique to select 10 NBA players and find their average weight.
• Think: how many random numbers do you need to generate?
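A minimal Python sketch of systematic sampling follows. It assumes a hypothetical roster of 450 made-up players (the real “NBA Roster” file is not reproduced here) and uses integer division for the interval, which is a simplification: when the population size is not a multiple of the sample size, the last few items can never be selected.

```python
import random


def systematic_sample(items, sample_size, rng):
    """Pick one random start, then take every k-th item after it."""
    interval = len(items) // sample_size   # Interval = population size / sample size
    start = rng.randrange(interval)        # random start within the first interval
    return [items[start + i * interval] for i in range(sample_size)]


def build_roster(rng, size=450):
    """Hypothetical roster: (player id, made-up weight in lb) pairs."""
    return [(f"player_{i:03d}", rng.randint(170, 290)) for i in range(1, size + 1)]


rng = random.Random(304)
roster = build_roster(rng)

sample = systematic_sample(roster, 10, rng)
average = sum(weight for _, weight in sample) / len(sample)
print([player for player, _ in sample])
print(f"Average weight: {average:.1f} lb")
```

In this sketch the only random draw inside `systematic_sample` is the starting position; every other pick follows from the interval.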

Four Statistical Sampling Techniques (3)
• Stratified Sampling
  - Divide the population into subgroups
  - Use simple random sampling (or systematic sampling) to select from each subgroup
  - Combine the selections to form one big sample
  - Think: what is the benefit of using stratified sampling?
    - More representative

Stratified sampling exercise
• Use the stratified sampling technique to select a sample of 10 NBA players, including 2 Power Forwards, 2 Small Forwards, 2 Shooting Guards, 2 Point Guards, and 2 Centers.
• Find the average weight.
• Why do we want to control the proportion for each position?
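The stratified procedure can be sketched in Python as below. The roster is hypothetical (made-up IDs and weights, positions assigned in rotation), and the five-position list, including Small Forward, is an assumption chosen so that two players per position add up to a sample of 10.

```python
import random
from collections import defaultdict

POSITIONS = ["Point Guard", "Shooting Guard", "Small Forward", "Power Forward", "Center"]


def build_roster(rng, size=450):
    """Hypothetical roster: made-up weights, positions assigned in rotation."""
    return [
        {"id": f"player_{i:03d}", "position": POSITIONS[i % 5], "weight": rng.randint(170, 290)}
        for i in range(size)
    ]


def stratified_sample(players, per_stratum, rng):
    """Group players by position, then draw a simple random sample from each group."""
    strata = defaultdict(list)
    for player in players:
        strata[player["position"]].append(player)
    sample = []
    for position in sorted(strata):
        sample.extend(rng.sample(strata[position], per_stratum))
    return sample


rng = random.Random(304)
roster = build_roster(rng)
sample = stratified_sample(roster, per_stratum=2, rng=rng)
average = sum(p["weight"] for p in sample) / len(sample)
print([(p["id"], p["position"]) for p in sample])
print(f"Average weight: {average:.1f} lb")
```

Because every position is guaranteed two players, the sample cannot end up with, say, ten centers, which is what makes it more representative when weight differs by position.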

Four Statistical Sampling Techniques (4)
• Cluster Sampling
  - Divide the population into subgroups, called “clusters”
  - Randomly select some subgroups (not all!)
  - In each selected subgroup, use a random sampling technique to select a subsample
  - Combine the subsamples to form one aggregate sample
  - Think: when do we use cluster sampling? (e.g. market research: select towns first)

Cluster sampling exercise
• Use each NBA team as a cluster
• Randomly select 5 teams to conduct the study
• In each of the selected teams, select 2 players
• Combine them into an aggregate sample of ten
• Think: how many times do you need to use the random number generator?
• Discuss the difference between the cluster sampling technique and the stratified sampling technique.
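A Python sketch of the two-stage cluster procedure is shown below. The league is hypothetical: 30 teams of 15 players each, with made-up IDs and weights, assumed only so the code runs.

```python
import random


def build_league(rng, n_teams=30, players_per_team=15):
    """Hypothetical league: team name -> list of (player id, made-up weight in lb)."""
    return {
        f"team_{t:02d}": [
            (f"t{t:02d}_p{p:02d}", rng.randint(170, 290))
            for p in range(1, players_per_team + 1)
        ]
        for t in range(1, n_teams + 1)
    }


def cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: randomly pick some clusters. Stage 2: randomly pick items within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], per_cluster))
    return chosen, sample


rng = random.Random(304)
league = build_league(rng)
chosen_teams, sample = cluster_sample(league, n_clusters=5, per_cluster=2, rng=rng)
average = sum(weight for _, weight in sample) / len(sample)
print("Teams studied:", chosen_teams)
print(f"Average weight of the {len(sample)} sampled players: {average:.1f} lb")
```

Unlike the stratified approach, which draws from every subgroup, this one ignores the 25 unselected teams entirely; that is where the cost saving comes from.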

Compare different techniques
• Simple random sampling and systematic sampling:
  - Need to know the population size
  - Do not take the composition of the population into account
• Stratified sampling:
  - Uses information about the population’s composition to control the sample
  - The sample can be more representative of the population
• Cluster sampling:
  - Generally used when the population is geographically distributed
  - Divide the population into several geographical areas
  - Randomly select some areas (not all) to study, which saves cost
• Sometimes a combination of techniques can be used.

Discussion
• Which sampling techniques should be used for (or are used in) the following studies? Discuss the potential bias of each technique.
  1. NBC wants to conduct an opinion poll on Hillary Clinton’s chance of being elected president in 2008.
  2. CSUSM wants to collect opinions about how junior faculty members teach their classes.
  3. Policemen want to detect drunk drivers to prevent potential accidents.
  4. Oscar judges determine the best pictures of the year.
  5. Fans vote for the NBA all-star team.
  6. American citizens vote for president.

Data Source
• Primary Source
  - Observations
  - Surveys
  - Experiments
• Secondary Source
  - Books & CDs
  - Newspapers, magazines
  - Internet
Difference: whether you collect the data yourself or not.

Think
• For the NBA players’ weight experiment, is the data source primary or secondary? Why?
• Is the data collected from students’ evaluations primary or secondary? Why?

Discussion
• What are the benefits of using primary data?
• What are the benefits of using secondary data?

Data Types
• Quantitative Data
  - Numerical data (all numbers)
  - E.g. the number of hours that students work at a paying job
• Qualitative Data
  - Non-numerical data (e.g. recorded with non-numerical characters)
  - E.g. students judge the quality of education: very poor, poor, fair, good, or very good
    - Note: the answers are mostly recorded as 1, 2, 3, 4, 5, but each code indicates a quality level, so it should be translated back to its meaning and treated as qualitative data
    - “2” (poor) + “3” (fair) ≠ “5” (very good)
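The note about coded answers can be illustrated with a short Python sketch. The ten ratings below are made up, and the label mapping simply follows the 1-to-5 coding described above.

```python
from collections import Counter

# Code -> meaning, following the 1..5 coding of the quality ratings.
LABELS = {1: "very poor", 2: "poor", 3: "fair", 4: "good", 5: "very good"}

# Hypothetical survey answers as they might be recorded (made-up data).
codes = [2, 3, 5, 4, 4, 1, 3, 4, 5, 4]

# Arithmetic on the codes "works", but the result has no meaning:
# 2 + 3 equals 5 as numbers, yet "poor" plus "fair" is not "very good".
print(codes[0] + codes[1] == codes[2])

# Translate the codes back to their meaning and count how often each occurs.
counts = Counter(LABELS[c] for c in codes)
most_common_label, frequency = counts.most_common(1)[0]
for code in sorted(LABELS):
    print(f"{LABELS[code]:>9}: {counts[LABELS[code]]}")
print(f"Most frequent rating: {most_common_label} ({frequency} of {len(codes)})")
```

Counting by label keeps the analysis within what the data supports, instead of treating the codes as if they were measured quantities.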

Data Measurement Levels
• Ratio/Interval Data: measurements, numerical values (highest level, complete analysis)
• Ordinal Data: rankings, ordered categories (higher level, mid-level analysis)
• Nominal Data: categorical codes, ID numbers, category names (lowest level, basic analysis)

Nominal Data
• The lowest form of data, yet you encounter it all the time
• Mostly “qualitative”, non-numerical
  - Student names, addresses, majors, customer preferences, marital status, payment methods, etc.
• Sometimes numerical, but then normally used only to identify an individual; the numbers cannot be grouped or aggregated to provide more information
  - Student ID number
  - Bank account number
  - Social security number, etc.
• Nominal data are the most basic data measurement level. We generally cannot do much analysis with them
  - In designing a survey, you should try to avoid all “nominal data.”

Ordinal Data
• Many students confuse the name “ordinal” with “interval”
• Also called “rank data”: the values can be rank-ordered on the basis of some relationship among them
• Typical examples:
  - Income intervals: “under $20,000”, “$20,000 to $40,000”, “over $40,000”
  - GPA intervals: “<2.0”, “2.0 to 3.0”, “>3.0”
  - Professor ranking: “adjunct professor”, “assistant professor”, “associate professor”, “professor”
• Note: a ranking sometimes holds only within a certain context; extending it beyond that context may be controversial (e.g. social ranking in some countries)
• Such data allow the decision maker to equate two or more observations or to rank-order them

Ratio/Interval Data
• Numerical data values: temperature, grade, income, age, etc.
• If “0” ≠ nothing: Interval Data (temperature)
  - Can find the interval between two values: today is 2 °F warmer than yesterday
  - Cannot find a ratio: 80 °F is not twice as warm as 40 °F
• If “0” = nothing: Ratio Data (income, age, grade, etc.)
  - Can find both interval and ratio: I earn $15,000 a year more than you do, and I earn twice as much as he does
• Both can be used for certain mathematical and statistical analyses, e.g. averaging
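One way to see why ratios fail for interval data is to change the unit and check whether the ratio survives. The Python sketch below does this for the 80 °F versus 40 °F example and, for contrast, for two incomes; the income figures are made up for illustration.

```python
def fahrenheit_to_celsius(degrees_f):
    """Standard conversion: subtract 32, then scale by 5/9."""
    return (degrees_f - 32) * 5 / 9


hot_f, cold_f = 80.0, 40.0
hot_c, cold_c = fahrenheit_to_celsius(hot_f), fahrenheit_to_celsius(cold_f)

# Interval data: the "ratio" depends on the unit, so it carries no meaning.
ratio_f = hot_f / cold_f
ratio_c = hot_c / cold_c
print(f"Ratio in Fahrenheit: {ratio_f:.1f}")
print(f"Ratio in Celsius:    {ratio_c:.1f}")

# The interval itself converts consistently: a 40 F gap is always the same gap.
print(f"Gap: {hot_f - cold_f:.1f} F = {hot_c - cold_c:.1f} C")

# Ratio data: zero means "nothing", so the ratio is the same in any unit.
income_a_dollars, income_b_dollars = 30_000, 15_000   # hypothetical incomes
ratio_dollars = income_a_dollars / income_b_dollars
ratio_cents = (income_a_dollars * 100) / (income_b_dollars * 100)
print(f"Income ratio in dollars: {ratio_dollars:.1f}, in cents: {ratio_cents:.1f}")
```

The Fahrenheit and Celsius ratios disagree because each scale places its zero at an arbitrary point, while the income ratio is unchanged by switching from dollars to cents.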

A matrix used to determine data measurement level
• Rows: Nominal, Ordinal, Interval, Ratio
• Columns (what you can do with the data): Compare Difference, Put in Order, Find Interval (subtraction), Average and more
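As a rough aid, the relationship between levels and operations can be written as a lookup in Python. The mapping below follows the usual textbook convention that each level allows everything the lower levels allow plus one more capability; it is an assumption of this sketch, not a transcription of the slide's matrix, and the names `LEVELS`, `ADDED_OPERATION`, and `allowed_operations` are hypothetical.

```python
# Measurement levels from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Assumed convention: the capability each level adds on top of the levels below it.
ADDED_OPERATION = {
    "nominal": "compare for equality",
    "ordinal": "put in order",
    "interval": "find interval (subtraction) and average",
    "ratio": "find ratio (division)",
}


def allowed_operations(level):
    """Return every operation that is meaningful at the given level."""
    index = LEVELS.index(level)
    return [ADDED_OPERATION[name] for name in LEVELS[: index + 1]]


for level in LEVELS:
    print(f"{level:>8}: {', '.join(allowed_operations(level))}")
```

Reading the output from top to bottom gives a quick test for any variable: find the highest operation that still makes sense for it, and that row is its measurement level.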

Exercise
• Determine the data measurement level of each column (levels to assign: Nominal, Ratio, Ordinal)

Airline        Gender   First Class?   Luggage   Ticket Price
America West   Male     Yes            3         <$100
Other          Female   Yes            3         $100-$150
America West   Male     No             3         $150-$200
United         Male     No             1         $200-$250
America West   Female   No             2         $150-$200
United         Male     Yes            3         $150-$200
America West   Female   No             2         $200-$250
Other          Female   No             1         $150-$200
America West   Female   No             2         <$100

More exercises
• For each of the following variables, indicate the data measurement level:
  - marital status {single, married, divorced, other}
  - home ownership {own, rent, other}
  - product rating {1=excellent, 2=good, 3=fair, 4=poor, 5=very poor}
  - unemployment rates of CA
  - monthly sales
  - student gender

Data Measurement Level
• Identifying the data measurement level is the starting point of data analysis, presentation and characterization. It tells you what you can do with the data you have collected!

Summary
• Basic Concept of Statistics
• Data Collection Methods
• Sampling Techniques:
  - Concepts: population vs. sample
  - Four sampling techniques: processes, pros and cons
• Data sources: primary or secondary
  - Whether you collect the data or use someone else’s
• Data types: quantitative or qualitative
  - Whether the data are purely numerical
• Data Measurement Levels: Nominal / Ordinal / Interval / Ratio

Concepts checklist
• Cluster
• Cluster Sampling
• Data Measurement Levels
• Descriptive Statistics
• Experiment
• Inferential Statistics
• Interval Data
• Nominal Data
• Ordinal Data
• Phone Survey
• Population
• Primary Data Source
• Qualitative Data
• Quantitative Data
• Ratio Data
• Sample
• Sampling Techniques
• Secondary Data Source
• Statistics
• Stratified Random Sampling
• Stratum
• Systematic Random Sampling