Chapter 2 Graphical Descriptive Techniques


2.1 Introduction

Descriptive statistics involves the arrangement, summary, and presentation of data to enable meaningful interpretation and to support decision making.

Descriptive statistics methods make use of
• graphical techniques
• numerical descriptive measures.

The methods presented apply to both
• the entire population
• a sample of the population.

2.2 Types of data and information

A variable is a characteristic of a population or sample that is of interest to us. Examples:
• Cereal choice
• Capital expenditure
• The waiting time for medical services

Data are the actual values of variables.
• Interval data are numerical observations.
• Nominal data are categorical observations.
• Ordinal data are ordered categorical observations.

Types of data - examples

Interval data

  Age   Income
  55    75000
  42    68000
  ...   ...

  Weight gain
  +10
  +5
  ...

Nominal data

  Person   Marital status
  1        married
  2        single
  3        ...

  Computer   Brand
  1          IBM
  2          Dell
  3          IBM
  ...        ...

Types of data - examples

With nominal data, all we can do is calculate the proportion of observations that fall into each category.

  Brand     Count   Proportion
  IBM       25      50%
  Dell      11      22%
  Compaq     8      16%
  Other      6      12%
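The proportions in the brand table can be reproduced with a few lines of Python. This is only a sketch: the counts are taken from the slide, while the variable names are chosen here for illustration.

```python
from collections import Counter

# Brand counts as shown on the slide (50 computers in total).
counts = Counter({"IBM": 25, "Dell": 11, "Compaq": 8, "Other": 6})

total = sum(counts.values())

# Proportion of observations in each category: count / total.
proportions = {brand: n / total for brand, n in counts.items()}

for brand, share in proportions.items():
    print(f"{brand:<7} {counts[brand]:>3} {share:.0%}")
```

Counting and dividing by the total is the only arithmetic used, which is all that nominal data allow.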

Types of data - analysis

Knowing the type of data is necessary to properly select the technique to be used when analyzing the data.

Type of analysis allowed for each type of data:
• Interval data: arithmetic calculations
• Nominal data: counting the number of observations in each category
• Ordinal data: computations based on an ordering process

Cross-Sectional/Time-Series Data

Cross-sectional data are collected at a certain point in time.
• Marketing survey (observe preferences by gender, age)
• Test scores in a statistics course
• Starting salaries of an MBA program's graduates

Time-series data are collected over successive points in time.
• Weekly closing price of gold
• Amount of crude oil imported monthly

2.3 Graphical Techniques for Interval Data

Example 2.1: Providing information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company.
• Collect data
• Prepare a frequency distribution
• Draw a histogram

Example 2.1: Providing information

Collect data, then prepare a frequency distribution. How many classes to use?

  Number of observations   Number of classes
  Less than 50             5-7
  50 - 200                 7-9
  200 - 500                9-10
  500 - 1,000              10-11
  1,000 - 5,000            11-13
  5,000 - 50,000           13-17
  More than 50,000         17-20

Class width = [Range] / [Number of classes]
            = [Largest observation - Smallest observation] / [Number of classes]
            = [119.63 - 0] / [8] = 14.95, rounded to 15

(There are 200 data points.)
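The class-width calculation and the counting step behind a frequency distribution can be sketched in Python. The smallest and largest observations (0 and 119.63) and the choice of 8 classes come from the slide. The function names, the rule for which class a boundary value belongs to, and the short list of bills are assumptions made for illustration only.

```python
import math


def class_width(smallest, largest, n_classes):
    """Raw class width: the range divided by the number of classes."""
    return (largest - smallest) / n_classes


def frequency_distribution(values, lower, width, n_classes):
    """Count how many values fall into each of n_classes equal-width classes.

    Assumption: a class includes its lower limit and excludes its upper
    limit; anything at or above the last upper limit goes in the last class.
    """
    counts = [0] * n_classes
    for v in values:
        index = min(int((v - lower) // width), n_classes - 1)
        counts[index] += 1
    return counts


raw_width = class_width(0, 119.63, 8)  # 14.95375, shown as 14.95 on the slide
width = math.ceil(raw_width)           # rounded up to a convenient 15

# Hypothetical bills, NOT the 200 observations of Example 2.1.
sample_bills = [0.0, 8.5, 14.99, 15.0, 42.19, 75.3, 119.63]
counts = frequency_distribution(sample_bills, lower=0, width=width, n_classes=8)

for i, count in enumerate(counts):
    print(f"{i * width:>3} to {(i + 1) * width:>3}: {count}")
```

Rounding the width up rather than down guarantees that 8 classes still cover the largest observation.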

Example 2.1: Providing information

Draw a histogram.

[Figure: histogram of the monthly bills]

Example 2.1: Providing information

What information can we extract from this histogram?

[Figure: histogram of Bills (classes of width 15, from 0 to 120) against Frequency]

• About half of all the bills are small: 71 + 37 = 108
• A few bills are in the middle range: 13 + 9 + 10 = 32
• A relatively large number of bills are large: 18 + 28 + 14 = 60

Relative frequency

It is often preferable to show the relative frequency (proportion) of observations falling into each class, rather than the frequency itself.

Class relative frequency = Class frequency / Total number of observations

Relative frequencies should be used when
• the population relative frequencies are studied
• comparing two or more histograms
• the numbers of observations in the samples studied are different.
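As a small illustration of the formula, the class frequencies read off the Example 2.1 histogram (71, 37, 13, 9, 10, 18, 28, 14) can be converted to relative frequencies. The class limits assume the width of 15 used in that example, and the variable names are illustrative.

```python
# Class frequencies from the Example 2.1 histogram, in class order.
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]

# Assumed class limits: equal classes of width 15 starting at 0.
lower_limits = [15 * i for i in range(len(frequencies))]

total = sum(frequencies)

# Class relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

for low, f, r in zip(lower_limits, frequencies, relative):
    print(f"{low:>3} to {low + 15:>3}: frequency {f:>2}, relative frequency {r:.3f}")
```

Because every relative frequency is a share of the total, histograms built from samples of different sizes can be compared on the same vertical scale.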

Class width

It is generally best to use equal class widths, but sometimes unequal class widths are called for. Unequal class widths are used when the frequency associated with some classes is too low. Then,
• several classes are combined to form a wider and “more populated” class.
• it is possible to form an open-ended class at the higher end or lower end of the histogram.

Shapes of histograms

There are four typical shape characteristics.

Shapes of histograms

[Figure: a negatively skewed histogram and a positively skewed histogram]

Modal classes

A modal class is the one with the largest number of observations.

[Figure: a unimodal histogram, with the modal class marked]

Modal classes

[Figure: a bimodal histogram, with a modal class marked]

Bell shaped histograms
• Many statistical techniques require that the population be bell shaped.
• Drawing the histogram helps verify the shape of the population in question.

Interpreting histograms

Example 2.2: Selecting an investment
• An investor is considering investing in one of two investments.
• The returns on these investments were recorded.
• From the two histograms, how can the investor interpret the
    - expected returns
    - spread of the returns (the risk involved with each investment)?

Example 2.2 - Histograms

[Figure: histograms of Return on investment A and Return on investment B, with the center of each marked; returns axis from -15 to 75, frequency axis from 0 to 18]

Interpretation: The center of the returns of Investment A is slightly lower than that of Investment B.

Example 2.2 - Histograms

[Figure: histograms of Return on investment A and Return on investment B, sample size = 50 each; figure annotations 17, 34, 46 for A and 16, 26, 43 for B]

Interpretation: The spread of returns for Investment A is less than that for Investment B.

Example 2.2 - Histograms

[Figure: histograms of Return on investment A and Return on investment B]

Interpretation: Both histograms are slightly positively skewed. There is a possibility of large returns.

Providing information

Example 2.2: Conclusion
• It seems that investment A is better, because:
    - Its expected return is only slightly below that of investment B.
    - The risk from investing in A is smaller.
    - The possibility of having a high rate of return exists for both investments.

Interpreting histograms

Example 2.3: Comparing students’ performance
• Students’ performance in two statistics classes was compared.
• The two classes differed in their teaching emphasis:
    - Class A: mathematical analysis and development of theory.
    - Class B: applications and computer-based analysis.
• The final mark for each student in each course was recorded.
• Draw histograms and interpret the results.

Interpreting histograms

The mathematical emphasis creates two groups, and a larger spread.

Stem and Leaf Display

This is a graphical technique most often used in a preliminary analysis. Stem and leaf displays use the actual values of the original observations (whereas the histogram does not).

Stem and Leaf Display

Split each observation into two parts. There are several ways of doing that. For the observation 42.19:
• Stem 42, leaf 19
• Stem 4, leaf 2

The stem and leaf display for Example 2.1 uses the second method.

Stem and Leaf Display

A stem and leaf display for Example 2.1:

  Stem  Leaf
   0    00000111112222223333345555556666666778888999999
   1    000001111233333334455555667889999
   2    0000111112344666778999
   3    001335589
   4    124445589
   5    33566
   6    3458
   7    022224556789
   8    334457889999
   9    00112222233344555999
  10    001344446699
  11    124557889

The length of each line represents the frequency of the class defined by the stem.
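A display of this kind can be built with a short Python function. The sketch below assumes non-negative observations and follows the "stem 4, leaf 2" split for 42.19: the decimals are dropped, the units digit becomes the leaf, and the remaining digits form the stem. The function name and the sample bills are hypothetical, not the Example 2.1 data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative observations by stem (tens) with leaves (units)."""
    rows = defaultdict(list)
    for v in values:
        whole = int(v)                  # drop the decimal part: 42.19 -> 42
        stem, leaf = divmod(whole, 10)  # 42 -> stem 4, leaf 2
        rows[stem].append(leaf)
    # Sort the stems, and the leaves within each stem.
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}


# Hypothetical bills, for illustration only.
sample_bills = [42.19, 38.45, 29.23, 89.35, 118.04, 110.46,
                0.0, 72.88, 83.05, 95.73, 4.5, 45.77]

display = stem_and_leaf(sample_bills)
for stem, leaves in display.items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

Stems with no observations are simply omitted in this sketch; a fuller version might print them with an empty leaf list so that gaps in the data remain visible.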

Ogives

Ogives are cumulative relative frequency distributions.

Example 2.1 - continued

  Upper class limit   Cumulative relative frequency
  15                  .355
  30                  .540
  45                  .605
  60                  .650
  75                  .700
  90                  .790
  105                 .930
  120                 1.000
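The ogive values follow from the class frequencies of the Example 2.1 histogram (71, 37, 13, 9, 10, 18, 28, 14) by running a cumulative sum and dividing by the total. A minimal sketch, with illustrative variable names:

```python
from itertools import accumulate

# Class frequencies from the Example 2.1 histogram, in class order.
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]

total = sum(frequencies)

# Cumulative frequency up to each upper class limit, as a share of the total.
cumulative_relative = [c / total for c in accumulate(frequencies)]

for limit, value in zip(upper_limits, cumulative_relative):
    print(f"bills up to {limit:>3}: {value:.3f}")
```

Plotting `cumulative_relative` against `upper_limits` and joining the points with straight lines gives the ogive.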

2.4 Graphical Techniques for Nominal Data

The only allowable calculation on nominal data is to count the frequency of each value of a variable. When the raw data can be naturally categorized in a meaningful manner, we can display frequencies by
• Bar charts, which emphasize the frequency of occurrence of the different categories.
• Pie charts, which emphasize the proportion of occurrences of each category.

The Pie Chart

The pie chart is a circle, subdivided into a number of slices that represent the various categories. The size of each slice is proportional to the percentage corresponding to the category it represents.

The Pie Chart

Example 2.4
• The student placement office at a university wanted to determine the general areas of employment of last year's graduates.
• Data were collected, and the count of the occurrences was recorded for each area.
• These counts were converted to proportions, and the results were presented as a pie chart and a bar chart.

The Pie Chart

  Area of employment    Share
  Accounting            28.9%
  Marketing             25.3%
  Finance               20.6%
  General management    14.2%
  Other                 11.1%

Angle of the Accounting slice: (28.9/100)(360°) = 104°
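The slice-angle calculation shown for Accounting applies to every category in the same way. The sketch below uses the percentages from the slide; the dictionary and variable names are illustrative.

```python
# Percentages from the pie chart of Example 2.4.
shares = {
    "Accounting": 28.9,
    "Marketing": 25.3,
    "Finance": 20.6,
    "General management": 14.2,
    "Other": 11.1,
}

# Angle of each slice in degrees: (percentage / 100) * 360.
# The published percentages are rounded and add up to 100.1, so the
# angles computed this way add up to slightly more than 360 degrees.
angles = {area: pct / 100 * 360 for area, pct in shares.items()}

for area, angle in angles.items():
    print(f"{area:<20} {shares[area]:>5.1f}%  {angle:6.1f} degrees")
```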

The Bar Chart

Rectangles represent each category. The height of each rectangle represents the frequency. The base of the rectangle is arbitrary.

[Figure: bar chart with bars of height 73, 64, 52, 36, and 28]

The Bar Chart

Use bar charts also when the order in which nominal data are presented is meaningful.

[Figure: bar chart of the total number of new products introduced in North America in the years 1989, …, 1994; vertical axis from 0 to 20,000, horizontal axis from '89 to '94]

2.5 Describing the Relationship Between Two Variables

We are interested in the relationship between two interval variables.

Example 2.7
• A real estate agent wants to study the relationship between house price and house size.
• Twelve houses recently sold are sampled, and their size and price recorded.
• Use a graphical technique to describe the relationship between size and price.

  Size   Price
  23     315
  18     229
  26     335
  …      …

2.5 Describing the Relationship Between Two Variables

Solution
• The size (independent variable, X) affects the price (dependent variable, Y).
• We use Excel to create a scatter diagram.

[Figure: scatter diagram of price (Y) against size (X)]

The greater the house size, the greater the price.

Typical Patterns of Scatter Diagrams

[Figure panels: Positive linear relationship; No relationship; Negative linear relationship; Negative nonlinear relationship; Nonlinear (concave) relationship]

This is a weak linear relationship. A nonlinear relationship seems to fit the data better.

Graphing the Relationship Between Two Nominal Variables

We create a contingency table. This table lists the frequency for each combination of values of the two variables. We can create a bar chart that represents the frequency of occurrence of each combination of values.
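Building a contingency table amounts to counting each combination of the two variables. The sketch below uses a handful of hypothetical (occupation, newspaper) responses, invented purely for illustration and not the data of any example in this chapter.

```python
from collections import Counter

# Hypothetical survey responses as (occupation, newspaper) pairs.
responses = [
    ("Blue collar", "Star"), ("Blue collar", "Sun"), ("Blue collar", "Star"),
    ("White collar", "Post"), ("White collar", "Globe and Mail"),
    ("Professional", "Globe and Mail"), ("Professional", "Post"),
    ("Professional", "Globe and Mail"),
]

# Frequency of each combination of values; missing combinations count as 0.
table = Counter(responses)

occupations = sorted({occ for occ, _ in responses})
newspapers = sorted({paper for _, paper in responses})

# Print one row per occupation and one column per newspaper.
print(f"{'':<14}" + "".join(f"{paper:>16}" for paper in newspapers))
for occ in occupations:
    cells = "".join(f"{table[(occ, paper)]:>16}" for paper in newspapers)
    print(f"{occ:<14}{cells}")
```

Each row of the printed table can then be drawn as its own bar chart, which makes differences between the rows easy to see.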

Contingency table

Example 2.8
• To conduct an efficient advertising campaign, the relationship between occupation and newspaper readership is studied. The following table was created (data file Xm02-08a).

Contingency table

Solution

If there is no relationship between occupation and newspaper read, the bar charts describing the frequency of readership of newspapers should look similar across occupations.

Bar charts for a contingency table

Blue-collar workers prefer the “Star” and the “Sun”. White-collar workers and professionals mostly read the “Post” and the “Globe and Mail”.

2.6 Describing Time-Series Data

Data can be classified according to the time at which they are collected.
• Cross-sectional data are all collected at the same time.
• Time-series data are collected at successive points in time.

Time-series data are often depicted on a line chart (a plot of the variable over time).

Line Chart

Example 2.9
• The total amounts of income tax paid by individuals in 1987 through 1999 are listed below.
• Draw a graph of these data and describe the information produced.

Line Chart

For the first five years, total tax was relatively flat. From 1993 there was a rapid increase in tax revenues.

Line charts can be used to describe nominal data time series.