MTH 161: Introduction To Statistics Lecture 02 Dr. MUMTAZ AHMED


Objectives
Methods of Data Presentations
• Classification of Data
    • Bases of Classification
    • Types of Classification
• Tabulation of Data
    • Types of Tabulation
    • Constructing a Statistical Table
    • General Rules of Tabulation
• Table of Frequency Distributions
    • Frequency Distribution
    • Relative Frequency Distribution
    • Cumulative Frequency Distribution

Organizing Data
After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.
Raw Data: Data which is not organized is called raw data.
Ungrouped Data: Data in its original form is called ungrouped data.
Note: Raw data is also called ungrouped data.

Different Ways of Organizing Data
To get an understanding of the data, it is organized and arranged into a meaningful form. This is done by the following methods:
• Classification
• Tabulation (e.g. simple tables, frequency tables, stem and leaf plots etc.)
• Graphs (bar graph, pie chart, histogram, frequency ogive etc.)

Classification of Data
The process of arranging data into homogeneous groups or classes according to some common characteristic present in the data is called classification.
Example: When letters are sorted in a post office, they are first classified according to cities and then further arranged according to streets.

Bases of Classification
There are four important bases of classification:
• Qualitative Base
• Quantitative Base
• Geographical Base
• Chronological or Temporal Base

Bases of Classification
• Qualitative Base: When the data are classified according to some quality or attribute, such as sex, religion etc.
• Quantitative Base: When the data are classified by quantitative characteristics, such as height, weight, age, income etc.

Bases of Classification
• Geographical Base: When the data are classified by geographical region or location, such as states, provinces, cities, countries etc.
• Chronological or Temporal Base: When the data are classified or arranged by their time of occurrence, such as years, months, weeks, days etc. (e.g. time series data).

Types of Classification
There are three main types of classification:
• One-way Classification
• Two-way Classification
• Multi-way Classification

One-way Classification
If we classify observed data with respect to a single characteristic, the classification is known as one-way classification.
Example: The population of the world may be classified by religion as Muslim, Christian etc.

Two-way Classification
If we consider two characteristics at a time in order to classify the observed data, we are doing two-way classification.
Example: The population of the world may be classified by religion and sex.

Multi-way Classification
If we consider more than two characteristics at a time in order to classify the observed data, we are doing multi-way classification.
Example: The population of the world may be classified by religion, sex and literacy.

Tabulation of Data
• The process of placing classified data into tabular form is known as tabulation.
• A table is a systematic arrangement of statistical data in rows and columns.
• Rows are horizontal arrangements whereas columns are vertical arrangements.

Types of Tabulation
There are three types of tabulation:
• Simple or One-way Table
• Double or Two-way Table
• Complex or Multi-way Table

Simple or One-way Table
When the data are tabulated according to one characteristic, it is said to be simple tabulation or one-way tabulation.
Example: Tabulation of data on the population of the world classified by one characteristic, such as religion, is an example of simple tabulation.

Double or Two-way Table
When the data are tabulated according to two characteristics at a time, it is said to be double tabulation or two-way tabulation.
Example: Tabulation of data on the population of the world classified by two characteristics, such as religion and sex, is an example of double tabulation.

Complex or Multi-way Table
When the data are tabulated according to many characteristics (generally more than two), it is said to be complex tabulation.
Example: Tabulation of data on the population of the world classified by three characteristics, such as religion, sex and literacy, is an example of complex tabulation.
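The difference between one-way, two-way and multi-way tabulation can be sketched in a few lines of Python. The records below are hypothetical and invented purely for illustration, as are the variable names; the only idea taken from the slides is that the number of characteristics used as the counting key determines the type of table.

```python
from collections import Counter

# Hypothetical records (religion, sex, literacy); invented for illustration only.
people = [
    ("Muslim", "Male", "Literate"),
    ("Muslim", "Female", "Literate"),
    ("Christian", "Male", "Illiterate"),
    ("Muslim", "Female", "Illiterate"),
    ("Christian", "Female", "Literate"),
    ("Muslim", "Male", "Literate"),
]

# One-way: count by a single characteristic (religion).
one_way = Counter(religion for religion, _, _ in people)

# Two-way: count by two characteristics at a time (religion and sex).
two_way = Counter((religion, sex) for religion, sex, _ in people)

# Multi-way: count by all three characteristics at once.
multi_way = Counter(people)

# Print the two-way table, one cell per line.
for key, count in sorted(two_way.items()):
    print(key, count)
```

Each `Counter` corresponds to one table: its keys are the row and column captions, and its values are the entries in the body.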

Construction of Statistical Table
A statistical table has at least four major parts and some other minor parts.
• The Title
• The Box Head (column captions)
• The Stub (row captions)
• The Body
• Prefatory Notes
• Foot Notes
• Source Notes

General Sketch of Table
[Figure: layout of a statistical table. The title is at the top with the prefatory notes beneath it; the box head holds the row caption and the column captions; the stub entries run down the left side beside the body; the foot notes and source notes are at the bottom.]

General Sketch of Table: The Title
• A title is the main heading, written in capital letters and shown at the top of the table.
• It must explain the contents of the table and throw light on the table as a whole.
• Different parts of the heading can be separated by commas, and no full stop should be used in the title.

General Sketch of Table: The Box Head (Column Captions)
• The vertical headings and subheadings of the columns are called column captions.
• The space where these column headings are written is called the box head.
• Only the first letter of the box head is written in capitals; the remaining words must be written in small letters.

General Sketch of Table: The Stub (Row Captions)
• The horizontal headings and subheadings of the rows are called row captions.
• The space where these row headings are written is called the stub.

General Sketch of Table: The Body
• It is the main part of the table, which contains the numerical information classified with respect to row and column captions.

General Sketch of Table: Prefatory Notes
• A statement given below the title and enclosed in brackets; it usually describes the units of measurement.

General Sketch of Table: Foot Notes
• Foot notes appear immediately below the body of the table and provide additional explanation.

General Sketch of Table: Source Notes
• The source note is given at the end of the table and indicates the source from which the information has been taken.
• It includes information about the compiling agency, publication etc.

General Rules of Tabulation
• A table should be simple and attractive. A complex table may be broken into relatively simple tables.
• Headings for columns and rows should be proper and clear.
• Suitable approximations may be adopted and figures may be rounded off, but this should be mentioned in the prefatory note or in the foot note.
• The unit of measurement and the nature of the data should be well defined.

Organizing Data via Frequency Tables
One method for simplifying and organizing data is to construct a frequency distribution.
Frequency Distribution: The organization of a set of data in a table showing the distribution of the data into classes or groups, together with the number of observations in each class or group, is called a frequency distribution.
Class Frequency: The number of observations falling in a particular class is called the class frequency or simply the frequency, denoted by 'f'.
Grouped Data: Data presented in the form of a frequency distribution is called grouped data.

Why Use Frequency Distributions?
• A frequency distribution is a way to summarize data.
• A frequency distribution condenses the raw data into a more meaningful form.
• A frequency distribution allows for a quick visual interpretation of the data.
Frequency distributions can be drawn for qualitative data as well as quantitative data.

Frequency Distribution of Discrete Data
Example: Number of children in 20 families.
2 3 1 3 2 5 4 1 4 2 3 5 2 1 3 1 2 0
Construct an ungrouped (discrete) frequency distribution.

No. of Children   Tally      No. of Families (frequency) f
0                 |          1
1                 ||||       4
2                 ||||| |    6
3                 ||||       4
4                 ||         2
5                 |||        3
Total                        20

Interpretation: There is 1 family with no children, 4 families with 1 child, 6 families with 2 children, 4 families with 3 children, 2 families with 4 children and 3 families with 5 children.
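Tallying a discrete variable amounts to counting how often each distinct value occurs. The sketch below does this with Python's `collections.Counter`. It is illustrative only: it uses just the 18 values listed in the example above, although the slide states 20 families, so the counts it produces for 2 and 5 children are one lower than in the slide's table. The variable names are assumptions.

```python
from collections import Counter

# The 18 values listed in the example (the slide states 20 families).
children = [2, 3, 1, 3, 2, 5, 4, 1, 4, 2, 3, 5, 2, 1, 3, 1, 2, 0]

# Count how many families report each number of children.
freq = Counter(children)

# Print an ungrouped frequency table with one tally stroke per observation.
print("Children  Tally      f")
for value in sorted(freq):
    print(f"{value:<9} {'|' * freq[value]:<10} {freq[value]}")
print(f"Total                {sum(freq.values())}")
```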

Grouped Frequency Distribution
• Sometimes, when the data are continuous or cover a wide range of values, it becomes very burdensome to list every value, because the list would be too long.
• To remedy this situation, a grouped frequency distribution table is used.

Grouped Frequency Distribution for Continuous Data
Example (Temperature Data): The temperatures of 20 winter days in Pakistan are recorded below:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Construct a frequency distribution.
Note: Temperature is a continuous variable because it could be measured to any degree of precision desired.

Steps in Constructing Grouped Frequency Distribution
• Sort the raw data from low to high: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find the range: Range = maximum value - minimum value = 58 - 12 = 46
• Select the number of classes: 5 (usually between 5 and 20)
• Compute the class width: Class width = Range / number of classes = 46/5 = 9.2, rounded up to 10
• Determine the class limits: 11-20, 21-30, 31-40, 41-50, 51-60 (Note: the classes should cover the full data)
• Count the number of values in each class

Frequency Distribution of Grouped Data
Sorted Data: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Frequency Distribution (Temp Data)
Classes   Tally       Frequency (f)
11-20     |||         3
21-30     ||||| ||    7
31-40     ||||        4
41-50     ||||        4
51-60     ||          2
Total                 20
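The construction steps for the temperature example can be traced in a short Python sketch. It is a minimal illustration, not a general-purpose binning routine: the starting lower limit of 11 is taken from the slide rather than computed, the width is rounded up with `math.ceil`, and the variable names are assumptions.

```python
import math

# Raw temperature data from the example (20 winter days).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5                                  # chosen number of classes
data_range = max(temps) - min(temps)             # 58 - 12 = 46
width = math.ceil(data_range / num_classes)      # 9.2 rounded up to 10

first_lower = 11  # assumption: lower limit of the first class, as on the slide

# Class limits: (11, 20), (21, 30), ..., (51, 60).
classes = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
           for i in range(num_classes)]

# Count the observations falling within each pair of class limits.
freq = {(lo, hi): sum(lo <= t <= hi for t in temps) for lo, hi in classes}

for (lo, hi), f in freq.items():
    print(f"{lo}-{hi}: {f}")
```

Checking that the frequencies sum to the number of observations (20 here) is a quick way to confirm that the classes cover the full data.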

Frequency Distribution of Qualitative Data
Political Party Affiliations: Professor X asked his introductory statistics students to state their political party affiliations as PML-N (N), PPP (P), PTI and PML-Q (Q). The responses of the 30 students in a class are:
PPP N Q PTI N Q N PPP PTI N PTI PPP N Q N PTI Q PTI PPP PTI N PTI Q PPP
Construct a frequency distribution.

Party   Tally          Freq (f)
PTI     ||||| |||||    10
N       ||||| ||||     9
Q       ||||| |        6
P       |||||          5
Total                  30

Interpretation: Out of 30 students in the class, 10 are in favor of PTI, 9 are in favor of PML-N, 6 are in favor of PML-Q and 5 are in favor of PPP.

Relative Frequency Distribution
• Relative frequency is the ratio of the frequency to the total number of observations.
Relative frequency = Frequency / Number of observations
Example:
Relative frequency of students who favored PTI = 10/30 = 0.333 = 33.33%
Relative frequency of students who favored PML-N = 9/30 = 0.3 = 30%
Relative frequency of students who favored PML-Q = 6/30 = 0.2 = 20%
Relative frequency of students who favored PPP = 5/30 = 0.167 = 16.67%

Frequency Distribution of Qualitative Data
Party Affiliation Example:

Party   Freq (f)   Relative Freq
PTI     10         10/30 = 0.3333
N       9          9/30 = 0.30
Q       6          6/30 = 0.20
P       5          5/30 = 0.1667
Total   30         1

Interpretation: Out of 30 students in the class, 33.3% are in favor of PTI, 30% are in favor of PML-N, 20% are in favor of PML-Q and 16.7% are in favor of PPP.
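The relative frequency column can be reproduced by dividing each class frequency by the total. The sketch below starts from the frequencies in the party table (10, 9, 6, 5); the dictionary layout and variable names are assumptions made for illustration.

```python
# Class frequencies from the party affiliation table.
counts = {"PTI": 10, "PML-N": 9, "PML-Q": 6, "PPP": 5}

total = sum(counts.values())  # total number of observations (30)

# Relative frequency = frequency / number of observations.
relative = {party: f / total for party, f in counts.items()}

for party, r in relative.items():
    print(f"{party:<6} {counts[party]:>3}  {r:.4f}  {r:.1%}")
```

The relative frequencies always sum to 1 (up to floating-point rounding), which matches the "Total" row of the table.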

Cumulative Frequency Distribution
Cumulative Frequency: The total frequency of a variable from one end up to a certain value (usually the upper class boundary in grouped data), called the base, is known as the "less than" or "more than" cumulative frequency for that base.
Cumulative Frequency Distribution: A table showing cumulative frequencies is called a cumulative frequency distribution.

Cumulative Frequency Distribution
Constructing Class Boundaries: Take the difference between the lower limit of the second class and the upper limit of the first class (e.g. 21 - 20 = 1), then divide this difference by 2 (i.e. 1/2 = 0.5). Subtract the resulting number (0.5) from the lower class limit of each class and add it to the upper class limit of each class. The newly obtained classes are called class boundaries (C.B.).

Classes   Class Boundaries   Frequency (f)
11-20     10.5-20.5          3
21-30     20.5-30.5          7
31-40     30.5-40.5          4
41-50     40.5-50.5          4
51-60     50.5-60.5          2
Total                        20
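The class-boundary rule can be written out as a small calculation. The sketch below applies it to the class limits of the temperature example; the variable names are assumptions, and it presumes (as in the example) that the gap between consecutive classes is the same throughout.

```python
# Class limits from the temperature example.
class_limits = [(11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]

# Gap between the lower limit of the second class and the upper limit of the first.
gap = class_limits[1][0] - class_limits[0][1]   # 21 - 20 = 1
half_gap = gap / 2                              # 0.5

# Subtract half the gap from each lower limit and add it to each upper limit.
boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in class_limits]

for (lo, hi), (lb, ub) in zip(class_limits, boundaries):
    print(f"{lo}-{hi}  ->  {lb}-{ub}")
```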

Less than Cumulative Frequency Distribution of temperature data

Classes   Class Boundaries   Frequency (f)
11-20     10.5-20.5          3
21-30     20.5-30.5          7
31-40     30.5-40.5          4
41-50     40.5-50.5          4
51-60     50.5-60.5          2
Total                        20

Less than cumulative frequency distribution of Temp data
Class Boundaries   Cumulative Frequency
Less than 10.5     0
Less than 20.5     3
Less than 30.5     3 + 7 = 10
Less than 40.5     10 + 4 = 14
Less than 50.5     14 + 4 = 18
Less than 60.5     18 + 2 = 20

More than Cumulative Frequency Distribution of temperature data

Classes   Class Boundaries   Frequency (f)
11-20     10.5-20.5          3
21-30     20.5-30.5          7
31-40     30.5-40.5          4
41-50     40.5-50.5          4
51-60     50.5-60.5          2
Total                        20

More than cumulative frequency distribution of Temp data
Class Boundaries   Cumulative Frequency
More than 10.5     20
More than 20.5     20 - 3 = 17
More than 30.5     17 - 7 = 10
More than 40.5     10 - 4 = 6
More than 50.5     6 - 2 = 4
More than 60.5     0
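Both cumulative distributions follow from running totals of the class frequencies. The sketch below uses the frequencies from the grouped temperature table built from the 20 raw values (3, 7, 4, 4, 2) together with the class boundaries; `itertools.accumulate` produces the running sums. The variable names are assumptions.

```python
from itertools import accumulate

# Class boundaries and class frequencies for the temperature data.
boundaries = [10.5, 20.5, 30.5, 40.5, 50.5, 60.5]
freqs = [3, 7, 4, 4, 2]
total = sum(freqs)

# "Less than" cumulative frequency at each boundary: running sum from the low end.
less_than = [0] + list(accumulate(freqs))

# "More than" cumulative frequency at each boundary: whatever is not below it.
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>5}  less than: {lt:>2}  more than: {mt:>2}")
```

At every boundary the two cumulative frequencies add up to the total number of observations, which gives a simple consistency check on a hand-built table.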

Stem and Leaf Plot
Disadvantage of Frequency Table: An obvious disadvantage of using a frequency table is that the identity of individual observations is lost in the grouping process.
A stem and leaf plot provides the solution by offering a quick and clear way of sorting and displaying data simultaneously.

Stem and Leaf Plot
METHOD:
• Sort the data series.
• Separate the sorted data series into leading digits (the stem) and trailing digits (the leaves). For example, in 13 the leading digit (stem) is 1 and the trailing digit (leaf) is 3; in 21 the leading digit (stem) is 2 and the trailing digit (leaf) is 1.
• List all stems in a column from low to high.
• For each stem, list all associated leaves.

Stem and Leaf Plot
Example 1: Consider the temperature data again. The data sorted from low to high are shown below:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Here, use the 10's digit for the stem unit:

                  Stem   Leaf
13 is shown as    1      3
21 is shown as    2      1
35 is shown as    3      5

Stem and Leaf Plot
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Completed stem-and-leaf diagram
Stem   Leaf
1      2 3 7
2      1 4 4 6 7 7
3      0 2 5 7 8
4      1 3 4 6
5      3 8
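The stem-and-leaf method can be sketched in Python by splitting each sorted value into its tens digit (stem) and units digit (leaf). The sketch uses the 20 raw temperature values given earlier and assumes two-digit data, as in this example; the variable names are assumptions.

```python
from collections import defaultdict

# Raw temperature data from the example (20 winter days).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

# Map each stem (tens digit) to its list of leaves (units digits).
stems = defaultdict(list)
for value in sorted(temps):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# List stems from low to high, each followed by its leaves.
print("Stem | Leaf")
for stem in sorted(stems):
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in stems[stem])}")
```

Because the data are sorted before splitting, the leaves of each stem appear in ascending order, and every original observation can still be read back from the plot.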

Review
Let's review the main concepts:
Methods of Data Presentations
• Classification of Data
    • Bases of Classification
    • Types of Classification
• Tabulation of Data
    • Types of Tabulation
    • Constructing a Statistical Table
    • General Rules of Tabulation
• Table of Frequency Distributions
    • Frequency Distribution
    • Relative Frequency Distribution
    • Cumulative Frequency Distribution

Next Lecture
In the next lecture, we will study:
Graphical Methods of Data Presentations
• Graphs for qualitative data
    • Bar Charts
        • Simple Bar Chart
        • Multiple Bar Chart
        • Component Bar Chart
    • Pie Charts
• Graphs for quantitative data
    • Histograms
    • Frequency Polygon
    • Cumulative Frequency Polygon (Frequency Ogive)