PROCESSING OF DATA
• Preparing raw data
• Editing
  - field editing
  - office editing
• Coding
• Tabulation

TABULATION
• Tabulation is a part of the technical process of statistical analysis that consists of counting the number of cases that fall into various categories.
• It involves sorting, counting and summarizing the data.
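The three steps of sorting, counting and summarizing can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical coded answers to a single question; the `tabulate` helper is our own name, not a standard routine.

```python
from collections import Counter

# Hypothetical coded answers to one question (1 = Yes, 2 = No, 3 = Don't know).
responses = [1, 2, 1, 3, 1, 2, 1, 1, 2, 3]


def tabulate(values):
    """Count the cases in each category and express each count as a percentage."""
    counts = Counter(values)                  # counting
    total = sum(counts.values())
    table = []
    for category in sorted(counts):           # sorting the categories
        n = counts[category]
        table.append((category, n, round(100 * n / total, 1)))  # summarizing
    return table


for category, n, pct in tabulate(responses):
    print(f"{category:>8} {n:>5} {pct:>6}%")
```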

A FEW FACTORS TO BE CONSIDERED IN TABULATION
• A clear, brief and self-explanatory title is necessary for a table.
• All the units used should be clearly indicated.
• Stubs (row headings) and captions (column headings) should be clearly labelled.
• The body of the table must show all the relevant information described by the title and headings.
• As far as possible, large figures should be given in a shortened form (for example, rounded to thousands) so that the reader gets a clear picture.

• To specify certain aspects or to clarify certain points or issues, an appropriate footnote may be used.
• Head notes placed below the main title are also useful for giving any additional information needed.
• Data should be arranged systematically, that is, chronologically, alphabetically or geographically; this enables readers to grasp the relevant aspects easily.

• Items that need special emphasis should be specifically marked.
• Adequate spacing should be given between the columns and rows.

Data Tabulation
• Tabulation is a simple process of counting the number of observations (cases) that are classified into certain categories.
• One-way tabulation is the categorization of a single variable in the study.
• Cross tabulation treats two or more variables in the study simultaneously.

Advantages of one-way tabulation
• Indications of missing data: one-way tabulations can be used to determine the degree of non-response to individual questions.
• Locating blunders in data entry: if a specific range of codes has been established for the responses to a question, say 1 through 5, a one-way tabulation can reveal whether an invalid code, say 7 or 8, was entered.
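A one-way check for non-response and out-of-range codes might look like the minimal sketch below. The answers, the use of `None` to mark a non-response and the `audit` helper are hypothetical; the valid range of 1 through 5 follows the example in the text.

```python
# Hypothetical answers to a question coded 1 through 5; None marks a non-response.
answers = [3, 5, None, 2, 7, 1, None, 4, 8, 3]
VALID_CODES = range(1, 6)  # the established code range: 1 through 5


def audit(values, valid_codes):
    """Report non-response and any codes outside the established range."""
    missing = sum(1 for v in values if v is None)
    invalid = sorted({v for v in values if v is not None and v not in valid_codes})
    return {
        "missing": missing,
        "non_response_rate": missing / len(values),
        "invalid_codes": invalid,
    }


print(audit(answers, VALID_CODES))
```

Here the codes 7 and 8 are flagged for checking against the original questionnaires.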

• Determining valid percentages (percentages based only on the respondents who answered the question): used to profile sample respondents, establish characteristics that distinguish between groups (e.g., heavy users versus light users), and establish the percentage of respondents who respond differently to different situations (e.g., the percentage of people who purchase fast food from drive-thru windows versus those who use dine-in facilities).
• Summary statistics:
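The difference between raw and valid percentages can be shown with a short sketch, assuming hypothetical heavy/light-user answers where `None` marks a non-response. Raw percentages divide by all cases; valid percentages divide only by the cases that answered. The `percent_table` helper is illustrative.

```python
from collections import Counter

# Hypothetical answers: "heavy" or "light" user; None marks a non-response.
usage = ["heavy", "light", "light", None, "heavy", "light", None, "light"]


def percent_table(values):
    """Return (raw %, valid %) per category; raw uses all cases, valid only answered ones."""
    counts = Counter(values)
    n_all = len(values)
    n_valid = n_all - counts[None]
    table = {}
    for category, n in counts.items():
        if category is None:
            continue
        table[category] = (round(100 * n / n_all, 1), round(100 * n / n_valid, 1))
    # Non-response appears only in the raw percentages.
    table["(no answer)"] = (round(100 * counts[None] / n_all, 1), None)
    return table


for category, (raw, valid) in percent_table(usage).items():
    print(category, raw, valid)
```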

SORTING AND COUNTING OF DATA
INCOME (RS)    TALLY MARKS    FREQUENCY
1000 1111 12 1500 1111 11 5
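Sorting and counting with tally marks can be imitated in code. This sketch uses hypothetical income figures (not the slide's data) and assumes the common convention of drawing every fifth mark across the previous four, written here as `||||/`.

```python
from collections import Counter


def tally(n):
    """Render n as tally marks: '||||/' for each group of five, '|' for the rest."""
    groups, rest = divmod(n, 5)
    return " ".join(["||||/"] * groups + (["|" * rest] if rest else []))


# Hypothetical monthly incomes (Rs) of twelve respondents.
incomes = [1000, 1500, 1000, 2000, 1000, 1500, 1000, 1000, 1500, 1000, 2000, 1000]

counts = Counter(incomes)
print(f"{'INCOME (RS)':<12} {'TALLY MARKS':<15} FREQUENCY")
for income in sorted(counts):
    print(f"{income:<12} {tally(counts[income]):<15} {counts[income]}")
```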

KINDS OF TABULATION
• Simple or one-way tabulation
  - open question with only one response
  - multiple responses to open and multiple-choice questions
• Cross or two-way tabulation
• Higher-order tabulation

SIMPLE OR ONE-WAY TABULATION
• A single variable is counted
• A separate table may be prepared for each variable
• Suitable for dichotomous-scale or multiple-choice questions that allow only one answer
• Also called univariate tabulation
• Answers may be expressed as percentages

1. Open question with only one response

Number of children    Number of families    In percent
0                     10                    5
1                     30                    15
2                     70                    35
3                     60                    30
4                     14                    7
More than 4           16                    8
Total                 200                   100
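The percentage column of the number-of-children table can be recomputed from the counts. This sketch takes the count for four children as 14, the value consistent with the 7% shown and with a total of 200 families; each percentage is simply the count divided by the total, times 100.

```python
# Number of children -> number of families, from the one-response example.
# Four children is taken as 14 families, the count that matches the 7% shown.
families = {"0": 10, "1": 30, "2": 70, "3": 60, "4": 14, "More than 4": 16}

total = sum(families.values())
percent = {children: 100 * n / total for children, n in families.items()}

for children, n in families.items():
    print(f"{children:<12} {n:>4} {percent[children]:>6.1f}")
print(f"{'Total':<12} {total:>4} {sum(percent.values()):>6.1f}")
```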

Multiple responses to open and multiple-choice questions
• More than one answer is allowed
• The responses need not total 100%
• There may be duplication of answers
• The researcher will have to determine the number of additional or unduplicated responses
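A minimal sketch of a multiple-response count, assuming a hypothetical question ("Which newspapers do you read?") with answers A, B and C. Duplicate answers from the same respondent are counted once, and the percentages are shown two ways: on respondents (which can exceed 100% in total) and on responses (which totals 100%).

```python
from collections import Counter

# Hypothetical answers; each respondent may name more than one newspaper.
answers = [
    ["A", "B"],
    ["A"],
    ["B", "C"],
    ["A", "C"],
    ["C"],
]

# set() drops a repeated answer from the same respondent (unduplicated count).
mentions = Counter(paper for person in answers for paper in set(person))
respondents = len(answers)
total_mentions = sum(mentions.values())

for paper in sorted(mentions):
    n = mentions[paper]
    print(paper, n,
          f"{100 * n / respondents:.1f}% of respondents",
          f"{100 * n / total_mentions:.1f}% of responses")
```

With this data the "% of respondents" column adds up to 160%, which is why multiple-response percentages need not total 100%.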

Advantages
• Used to locate blunders and outliers
• Determines the empirical distribution of the variables
• Introduces a sense of accuracy
• The frequency distribution reveals the characteristics of the data
• Makes computing the mean, median and mode, and drawing graphs, easier
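Once a frequency distribution exists, the mean, median and mode follow directly. This sketch assumes a hypothetical frequency table and expands it back into individual observations before using Python's standard `statistics` module.

```python
import statistics

# Hypothetical frequency distribution: value -> frequency (20 cases in all).
freq = {1: 4, 2: 7, 3: 5, 4: 3, 5: 1}

# Expand the frequency table back into individual observations.
observations = [value for value, f in sorted(freq.items()) for _ in range(f)]

mean = statistics.mean(observations)
median = statistics.median(observations)
mode = statistics.mode(observations)
print(mean, median, mode)
```

The mean could equally be computed as the sum of value times frequency divided by the total frequency; expanding the table simply lets the library functions do the work.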

CROSS TABULATION
• Includes two or more variables that are treated simultaneously
• Multi-response questions are properly treated
• Studies the relationship among and between variables
• Variables are divided into groups and subgroups (dependent and independent)
• Also called multivariate tabulation

Percentage distribution of customers who faced problems in the bank

Alternatives                                                Percent
Never faced a problem while dealing with the bank staff     72
Faced problems at least once                                28
Total                                                       100

Total number of customers: 1500
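Because the total number of customers is given, the percentages in the bank table can be turned back into counts. This sketch assumes both percentages are based on all 1500 customers.

```python
total_customers = 1500
percent = {"Never faced a problem": 72, "Faced problems at least once": 28}

# Count = total x percent / 100, rounded to a whole number of customers.
counts = {k: round(total_customers * p / 100) for k, p in percent.items()}
print(counts)
```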

An example is a cross-tabulation of brand preference (brands of tea) against the region to which the respondent belongs. Assuming we have data on these two variables from a study, the cross-tabulation may look like this:

BRAND          Region-wise buyers (No.)
               North   South   East   West   Total
Brooke Bond      25      20     20     15      80
Lipton           10      15     20      5      50
Tata             15      15     10     30      70
Total            50      50     50     50     200
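A cross-tabulation is a count of each combination of two variables, plus row and column totals. The sketch below rebuilds hypothetical respondent-level (brand, region) records from the tea example's cell counts, since the raw survey data are not given, and then tabulates them. The `crosstab` helper is our own, not a library function.

```python
from collections import Counter


def crosstab(records, rows, cols):
    """Count (row, column) pairs and add row and column totals."""
    cells = Counter(records)
    table = {}
    for r in rows:
        table[r] = {c: cells[(r, c)] for c in cols}
        table[r]["Total"] = sum(table[r][c] for c in cols)
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols}
    table["Total"]["Total"] = sum(table["Total"][c] for c in cols)
    return table


BRANDS = ["Brooke Bond", "Lipton", "Tata"]
REGIONS = ["North", "South", "East", "West"]

# Respondent-level records (brand, region), rebuilt from the example's cell counts.
cell_counts = {
    "Brooke Bond": [25, 20, 20, 15],
    "Lipton": [10, 15, 20, 5],
    "Tata": [15, 15, 10, 30],
}
records = [
    (brand, region)
    for brand in BRANDS
    for region, n in zip(REGIONS, cell_counts[brand])
    for _ in range(n)
]

table = crosstab(records, BRANDS, REGIONS)
print(f"{'BRAND':<12}" + "".join(f"{c:>7}" for c in REGIONS + ["Total"]))
for r in BRANDS + ["Total"]:
    print(f"{r:<12}" + "".join(f"{table[r][c]:>7}" for c in REGIONS + ["Total"]))
```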

Calculating Percentages in a Cross Tabulation
• There are three different ways percentages can be calculated. In the above example, we can compute percentages row-wise, column-wise or on the total sample of 200.
• The general rule is to calculate percentages across the categories of the dependent variable, within each category of the independent variable. In the above example, "Brand" is the dependent variable and "Region" is the independent variable.
• Percentages must therefore be calculated across Brand categories, that is, column-wise (within each region).
• For example: "Out of 50 respondents from the Northern Region, 50% buy Brooke Bond, 20% buy Lipton, and 30% buy Tata Tea."
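The three ways of percentaging a two-way table differ only in the base used as the denominator. A minimal sketch on the tea example's counts; the `cross_percentages` function and its `how` parameter are illustrative names, not a library API.

```python
def cross_percentages(table, rows, cols, how="column"):
    """Percentages of a two-way table computed column-wise, row-wise or on the grand total."""
    if how not in ("column", "row", "total"):
        raise ValueError("how must be 'column', 'row' or 'total'")
    row_tot = {r: sum(table[r][c] for c in cols) for r in rows}
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    grand = sum(row_tot.values())
    result = {}
    for r in rows:
        result[r] = {}
        for c in cols:
            base = {"column": col_tot[c], "row": row_tot[r], "total": grand}[how]
            result[r][c] = 100 * table[r][c] / base
    return result


BRANDS = ["Brooke Bond", "Lipton", "Tata"]
REGIONS = ["North", "South", "East", "West"]
counts = {
    "Brooke Bond": {"North": 25, "South": 20, "East": 20, "West": 15},
    "Lipton": {"North": 10, "South": 15, "East": 20, "West": 5},
    "Tata": {"North": 15, "South": 15, "East": 10, "West": 30},
}

# Brand is the dependent variable, so percentages are taken within each region.
by_region = cross_percentages(counts, BRANDS, REGIONS, how="column")
print({b: by_region[b]["North"] for b in BRANDS})
# → {'Brooke Bond': 50.0, 'Lipton': 20.0, 'Tata': 30.0}
```

Switching `how` to `"row"` would instead answer "what share of Brooke Bond buyers live in the North" (25 of 80, or 31.25%), a different question from the one the dependent-variable rule asks.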

Percentages can be displayed in a separate table, or in brackets along with the number of respondents.

BRAND          Region-wise buyers: numbers and percentages
               North       South       East        West        Total
Brooke Bond    25 (50%)    20 (40%)    20 (40%)    15 (30%)     80 (40%)
Lipton         10 (20%)    15 (30%)    20 (40%)     5 (10%)     50 (25%)
Tata           15 (30%)    15 (30%)    10 (20%)    30 (60%)     70 (35%)
Total          50 (100%)   50 (100%)   50 (100%)   50 (100%)   200 (100%)
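The bracketed layout can be generated rather than typed by hand. This sketch formats each cell of the tea example as "count (column %)", with the Total column expressed as a share of all 200 respondents; `bracketed_rows` is a hypothetical helper name.

```python
def bracketed_rows(table, rows, cols):
    """Format each cell as 'count (column %)' and add a Total row and column."""
    col_tot = {c: sum(table[r][c] for r in rows) for c in cols}
    grand = sum(col_tot.values())
    out = []
    for r in rows:
        row_tot = sum(table[r][c] for c in cols)
        cells = [f"{table[r][c]} ({100 * table[r][c] / col_tot[c]:.0f}%)" for c in cols]
        cells.append(f"{row_tot} ({100 * row_tot / grand:.0f}%)")  # share of all respondents
        out.append([r] + cells)
    out.append(["Total"] + [f"{col_tot[c]} (100%)" for c in cols] + [f"{grand} (100%)"])
    return out


BRANDS = ["Brooke Bond", "Lipton", "Tata"]
REGIONS = ["North", "South", "East", "West"]
counts = {
    "Brooke Bond": {"North": 25, "South": 20, "East": 20, "West": 15},
    "Lipton": {"North": 10, "South": 15, "East": 20, "West": 5},
    "Tata": {"North": 15, "South": 15, "East": 10, "West": 30},
}

display = bracketed_rows(counts, BRANDS, REGIONS)
for row in display:
    print(f"{row[0]:<12}" + "".join(f"{cell:>12}" for cell in row[1:]))
```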

Selecting the factors for cross-tabulation
For example, a company wants to know whether there are some key factors related to the performance of its employees:
• Experience
• Gender
• Scores in aptitude tests
• Education
The factors are selected on request or by intuition.

PROOF OF RELATIONSHIP
• Introducing a third variable, and then a fourth variable
• May reveal particular needs of the population
• The correlation between two or more variables can be known
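Introducing a third variable means repeating the two-way table separately for each value of that variable. A minimal sketch, assuming hypothetical employee records of (experience, gender, performance) and using gender as the control variable; `layered_crosstab` is an illustrative name.

```python
from collections import Counter

# Hypothetical employee records: (experience, gender, performance).
employees = [
    ("High", "Male", "Good"), ("High", "Female", "Good"),
    ("Low", "Male", "Poor"), ("Low", "Female", "Good"),
    ("High", "Male", "Good"), ("Low", "Male", "Poor"),
    ("High", "Female", "Poor"), ("Low", "Female", "Good"),
]


def layered_crosstab(records, control_index, row_index, col_index):
    """Two-way counts of row x column variable, one sub-table per value of the control variable."""
    layers = {}
    for rec in records:
        layer = layers.setdefault(rec[control_index], Counter())
        layer[(rec[row_index], rec[col_index])] += 1
    return layers


by_gender = layered_crosstab(employees, control_index=1, row_index=0, col_index=2)
for gender, cells in by_gender.items():
    print(gender)
    for experience in ("High", "Low"):
        print(f"  {experience:<5}", {p: cells[(experience, p)] for p in ("Good", "Poor")})
```

In this made-up data, experience goes with good performance among men but not among women, the kind of pattern a single two-way table would hide.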

Lack of Causal Inference in Cross Tabulations
• Even if the cross-tabulation shows a significant association between the two variables, it does not necessarily mean that one of them (the independent) causes the other (the dependent).
• Causality or direct effect is more of an assumption made by the researcher based on his expectation or experience.
• The mere existence of a statistically significant association does not necessarily imply a cause-and-effect relationship between the (presumed) independent and the (presumed) dependent variable.

HIGHER ORDER TABULATION
• Incorporates several kinds of information, whether related or unrelated
• Also called composite tabulation
• E.g., demographic data may show several characteristics year-wise: population, density of population, education, occupation, religion and so on
• Provides extensive information
• Does not serve one specific purpose; it is used for general purposes

MANUAL TABULATION
• Suitable only for small surveys using a small number of questionnaires
• Used when the researcher does not intend to use complex statistical techniques for analysis

MECHANICAL TABULATION
• Involves machines
• Transforms all the entries in the questionnaire into numerical terms or codes, records the entries on punch cards by punching holes, sorts the cards (using sorting machines) and tabulates the data using a machine.
• The cards are verified by reading their punched holes; a verifier is used for verifying the data.
• The cards are then sorted and tabulated by sorting machines.

• Advantages
  - speeds up the work
  - facilitates cross-classifications and tabulations
  - can be used for periodic studies
  - avoids errors in sorting and counting
  - gives scope for adequate analysis
• Disadvantages
  - very high cost
  - may cause inconvenience due to dispatching of raw data

COMPUTER TABULATION
• Uses a computer
• Electronically records and stores the data and performs any calculations
• The data can be stored for future reference
• Multiple copies can be prepared