Chapter Two: Graphical and Tabular Descriptive Techniques
2007 Department of Accounting Information, Statistics (I) lecture slides
Introduction & Re-cap
Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information is produced to enable meaningful interpretation and to support decision making.
Data → Statistics → Information
Its methods make use of graphical techniques and numerical descriptive measures (such as averages) to summarize and present the data.
Populations & Samples
[Diagram: a sample is a subset of the population.]
The graphical and tabular methods presented here apply both to entire populations and to samples drawn from populations.
Definitions
A variable (變數) is some characteristic of a population or sample that is of interest to us, e.g. student grades. Variables are typically denoted with a capital letter: X, Y, Z, …
The values of a variable are the possible observations of that variable, e.g. student marks (0..100).
Data are the observed values (觀測值) of a variable, e.g. student marks: {67, 74, 71, 83, 93, 55, 48}.
2.1 Types of Data & Information
Data (at least for purposes of statistics) fall into three main groups:
• Interval data
• Nominal data
• Ordinal data
[Illustration: sample tables of interval variables (age, income, weight gain) and nominal variables (marital status, computer brand).]
Knowing the type of data is necessary to properly select the technique to be used when analyzing data.
Interval Data (區間尺度資料)
• Interval data are real numbers, e.g. heights, weights, prices.
• Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, so it is meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data (名目尺度資料)
• The values of nominal data are categories, e.g. responses to questions about marital status, coded as: Single = 1, Married = 2, Divorced = 3, Widowed = 4.
• Because the numbers are arbitrary, arithmetic operations on them make no sense (e.g. does Widowed ÷ 2 = Married?!).
• Nominal data are also called qualitative or categorical.
Ordinal Data (順序尺度資料)
Ordinal data appear to be categorical in nature, but their values have an order, a ranking, e.g. a college course rating system: poor = 1, fair = 2, good = 3, very good = 4, excellent = 5.
While it is still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like:
• excellent > poor
• fair < very good
That is, any numeric codes may be assigned to the categories as long as they preserve this order.
Types of Data & Information
Decision tree:
• Categorical? No → Interval Data
• Categorical? Yes → Categorical Data, then Ordered? Yes → Ordinal Data; No → Nominal Data
E.g. Representing Student Grades
• Categorical? No → Interval Data, e.g. {0..100}
• Categorical? Yes, Ordered? Yes → Ordinal Data, e.g. {F, D, C, B, A} (rank order to data)
• Categorical? Yes, Ordered? No → Nominal Data, e.g. {Pass | Fail} (NO rank order to data)
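The decision tree above can be written as a small function. This is only a minimal sketch: the function name `data_type` and its two boolean parameters are hypothetical, introduced here for illustration.

```python
def data_type(is_categorical: bool, is_ordered: bool = False) -> str:
    """Follow the two questions of the decision tree: Categorical? then Ordered?"""
    if not is_categorical:
        return "interval"
    return "ordinal" if is_ordered else "nominal"


# The three ways of representing student grades from the slide.
print(data_type(False))        # marks in {0..100}
print(data_type(True, True))   # letter grades {F, D, C, B, A}
print(data_type(True, False))  # {Pass | Fail}
```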
Calculations for Types of Data
As mentioned above:
• All calculations are permitted on interval data.
• Only calculations involving a ranking process are allowed for ordinal data.
• No calculations are allowed for nominal data, save counting the number of observations in each category.
This lends itself to the following “hierarchy of data”.
Hierarchy of Data
Interval
• Values are real numbers.
• All calculations are valid.
• Data may be treated as ordinal or nominal.
Ordinal
• Values must represent the ranked order of the data.
• Calculations based on an ordering process are valid.
• Data may be treated as nominal but not as interval.
Nominal
• Values are the arbitrary numbers that represent categories.
• Only calculations based on the frequencies of occurrence are valid.
• Data may not be treated as ordinal or interval.
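As a rough illustration of the hierarchy, the sketch below applies one permitted calculation to each data type. The marks are the ones from the Definitions slide; the rating and marital-status codes are hypothetical, and the choice of the mean, the median, and a count as representative calculations is an assumption made for this example.

```python
from collections import Counter
from statistics import mean, median

marks = [67, 74, 71, 83, 93, 55, 48]  # interval: marks from the Definitions slide
ratings = [1, 2, 3, 3, 4, 5, 5]       # ordinal: hypothetical ratings (1 = poor .. 5 = excellent)
marital = [1, 2, 2, 3, 1, 4, 2]       # nominal: hypothetical marital-status codes

average_mark = mean(marks)        # arithmetic is valid for interval data
middle_rating = median(ratings)   # depends only on the ranking, so valid for ordinal data
status_counts = Counter(marital)  # counting is the only valid calculation for nominal data

print(average_mark, middle_rating, dict(status_counts))
```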
2.2 Graphical & Tabular Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of the variable. We can summarize the data in a table, called a frequency distribution (次數分配), that presents the categories and their counts. A relative frequency distribution (相對次數分配) lists the categories and the proportion with which each occurs. Refer to Example 2.1.
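A frequency distribution and a relative frequency distribution can be built by counting. The sketch below uses hypothetical category labels, not the data of Example 2.1.

```python
from collections import Counter

# Hypothetical nominal responses (category labels are illustrative only).
responses = ["A", "B", "A", "C", "B", "A", "D", "A"]

frequency = Counter(responses)   # category -> count
total = sum(frequency.values())  # number of observations
relative_frequency = {cat: count / total for cat, count in frequency.items()}

# Print the two distributions side by side, one row per category.
for category in sorted(frequency):
    print(f"{category:<10}{frequency[category]:>5}{relative_frequency[category]:>8.3f}")
```

The relative frequencies always sum to 1, which is what makes them suitable for a pie chart.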
Nominal Data (Tabular Summary): Table 2.1 Frequency and Relative Frequency Distributions for Example 2.1
Figure 2.1 Bar Chart for Example 2.1
Figure 2.2 Pie Chart for Example 2.1
Nominal Data: Bar Chart (長條圖)
• Rectangles represent each category.
• The height of the rectangle represents the frequency.
• The width of the rectangle's base is arbitrary.
Nominal Data: Pie Chart (圓餅圖)
The pie chart is a circle, subdivided into a number of slices that represent the various categories. The size of each slice is proportional to the percentage corresponding to the category it represents. Pie charts show relative frequencies.
Table 2.2 Proportion in Each Category in Example 2.1
Nominal Data
It is all the same information (based on the same data), just presented differently.
Table 2.3 Daily Oil Production
Figure 2.3 Bar Chart for Example 2.2
Table 2.4 Percent Share of Television Viewers
Figure 2.4 Pie Chart for Example 2.3
2.3 Graphical Techniques for Interval Data
There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical). The most important of these graphical methods is the histogram (直方圖). The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
Building a Histogram
1) Collect the data.
Example 2.4: Provide information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company. (There are 200 data points.)
Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data. How?
a) Determine the number of classes (組數) to use. How? Refer to Table 2.6: with 200 observations, we should have between 7 and 10 classes. Alternatively, we could use Sturges’ formula: Number of class intervals = 1 + 3.3 log(n), where log is the base-10 logarithm and n is the number of observations.
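Sturges’ formula can be checked numerically for the 200 telephone bills. This is a minimal sketch; the function name `sturges_classes` is hypothetical, and the base-10 logarithm is assumed, as is conventional for this form of the formula.

```python
import math


def sturges_classes(n: int) -> float:
    """Suggested number of class intervals: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)


suggested = sturges_classes(200)
print(f"Suggested number of classes for n = 200: {suggested:.2f}")
```

For n = 200 the formula gives a value between 8 and 9, which agrees with the 7 to 10 classes suggested by Table 2.6.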
Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data. How?
a) Determine the number of classes to use. [8]
b) Determine how large to make each class. How? Look at the range (全距) of the data, that is,
Range = Largest Observation - Smallest Observation = $119.63 - $0 = $119.63
Then each class width (組寬) becomes:
Range ÷ (# classes) = 119.63 ÷ 8 ≈ 15
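The class-width arithmetic can be sketched as follows. Rounding the raw width up to the next whole dollar is an assumption made here so that the 8 classes are guaranteed to cover the largest observation; the slide only states that the width is approximately 15.

```python
import math

largest, smallest = 119.63, 0.0  # extremes of the telephone bills in Example 2.4
num_classes = 8                  # chosen in step (a)

data_range = largest - smallest         # 119.63
raw_width = data_range / num_classes    # a little under 15
class_width = math.ceil(raw_width)      # assumption: round up to a convenient width

# The classes start at the smallest observation and must reach the largest one.
covers_all_data = smallest + num_classes * class_width > largest
print(raw_width, class_width, covers_all_data)
```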
Class width
It is generally best to use equal class widths, but sometimes unequal class widths are called for. Unequal class widths are used when the frequency associated with some classes is too low. Several classes are then combined to form a wider and “more populated” class. It is also possible to form an open-ended class at the higher or lower end of the histogram.
Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data. How?
a) Determine the number of classes to use. [8]
b) Determine how large to make each class. [15]
c) Place the data into each class:
• each item can only belong to one class;
• classes contain observations greater than or equal to their lower limits and less than their upper limits, i.e. lower limit ≤ x < upper limit.
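Step (c) can be sketched in code. The function name `frequency_distribution` and the sample bills below are hypothetical; only the 8 classes of width 15 and the rule lower limit ≤ x < upper limit come from the slides.

```python
def frequency_distribution(values, num_classes=8, width=15.0, lower=0.0):
    """Count observations per class, where class k holds lower + k*width <= x < lower + (k+1)*width."""
    counts = [0] * num_classes
    for x in values:
        index = int((x - lower) // width)  # floor division puts a boundary value in the upper class
        counts[index] += 1                 # raises IndexError if x lies at or above the top limit
    return counts


# Hypothetical bills chosen to exercise the class boundaries.
bills = [0.0, 14.99, 15.0, 29.5, 30.0, 75.0, 119.63]
counts = frequency_distribution(bills)
print(counts)
```

Note that a bill of exactly $15.00 lands in the second class, not the first, so every observation belongs to exactly one class.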
Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data.
3) Draw the histogram.
Interpret
• About half (71 + 37 = 108) of the bills are “small”, i.e. less than $30.
• (18 + 28 + 14 = 60) ÷ 200 = 30%, i.e. nearly a third of the phone bills are $75 or more (the last three of the eight classes of width $15).
• There are only a few telephone bills in the middle range.
Relative frequency (相對次數)
It is often preferable to show the relative frequency (proportion) of observations falling into each class, rather than the frequency itself.
Class relative frequency = Class frequency ÷ Total number of observations
Relative frequencies should be used when
– the population relative frequencies are studied;
– comparing two or more histograms;
– the numbers of observations in the samples studied are different.
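The sketch below shows why relative frequencies help when sample sizes differ. The two sets of class frequencies are hypothetical; the function name `relative_frequencies` is introduced only for this example.

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]


# Hypothetical class frequencies from two samples of different size.
sample_a = [10, 20, 10]    # 40 observations
sample_b = [50, 100, 50]   # 200 observations

rel_a = relative_frequencies(sample_a)
rel_b = relative_frequencies(sample_b)
print(rel_a, rel_b)
```

The raw frequencies differ by a factor of five, yet the relative frequencies are identical, so the two histograms have the same shape once drawn on a relative scale.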
Shapes of Histograms
Symmetry (對稱性): A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size.
[Figure: symmetric histograms; horizontal axis: Variable, vertical axis: Frequency]
Shapes of Histograms
Skewness (偏態): A skewed histogram is one with a long tail extending to either the right or the left.
[Figure: a positively skewed (正、右偏) histogram and a negatively skewed (負、左偏) histogram; horizontal axis: Variable, vertical axis: Frequency]
Shapes of Histograms
Modality (眾數組個數): A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks. A modal class is the class with the largest number of observations.
[Figure: a unimodal (單峰) histogram and a bimodal (雙峰) histogram; horizontal axis: Variable, vertical axis: Frequency]
Shapes of Histograms
Bell Shape (鐘形分配): A special type of symmetric unimodal histogram is one that is bell shaped. Many statistical techniques require that the population be bell shaped. Drawing the histogram helps verify the shape of the population in question.
[Figure: a bell-shaped histogram; horizontal axis: Variable, vertical axis: Frequency]
Histogram Comparison
Example 2.6 & Example 2.7: Comparing students’ performance. The two classes differed in their teaching emphasis:
• Class A: mathematical analysis and development of theory.
• Class B: applications and computer-based analysis.
The final mark for each student in each course was recorded. Draw histograms and interpret the results.
Histogram Comparison
Compare and contrast the histograms based on data from Example 2.6 and Example 2.7. The two courses have very different histograms:
• unimodal vs. bimodal
• spread of the marks (narrower vs. wider)
Stem & Leaf Display (枝(莖)葉圖)
Retains information about individual observations that would normally be lost in the creation of a histogram. Split each observation into two parts, a stem and a leaf, e.g. for the observation value 42.19 there are several ways to split it up:
• We could split it at the decimal point: Stem 42, Leaf 19.
• Or split it at the “tens” position (while rounding to the nearest integer in the “ones” position): Stem 4, Leaf 2.
Stem & Leaf Display
Continue this process for all the observations. Then, use the “stems” for the classes and each leaf becomes part of the histogram (based on Example 2.4 data) as follows:
Stem | Leaf
   0 | 00000111112222223333345555556666666778888999999
   1 | 000001111233333334455555667889999
   2 | 0000111112344666778999
   3 | 001335589
   4 | 124445589
   5 | 33566
   6 | 3458
   7 | 022224556789
   8 | 334457889999
   9 | 00112222233344555999
  10 | 001344446699
  11 | 124557889
Thus, we still have access to our original data point’s value!
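The tens-position split can be sketched as follows. The function name `stem_and_leaf` is hypothetical, and of the sample values only 42.19 and 119.63 appear in the slides; the rest are made up. Rounding uses "half up" via `int(x + 0.5)`, which assumes non-negative observations.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Round each value to the nearest integer, then split it into a tens stem and a ones leaf."""
    display = defaultdict(list)
    for x in values:
        rounded = int(x + 0.5)            # round half up; assumes x >= 0
        stem, leaf = divmod(rounded, 10)  # e.g. 42 -> stem 4, leaf 2
        display[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


bills = [42.19, 38.45, 45.77, 8.30, 119.63]
result = stem_and_leaf(bills)
for stem, leaves in result.items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```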
Histogram and Stem & Leaf
Ogive (肩形圖)
An ogive (pronounced “Oh-jive”) is a graph of a cumulative frequency distribution (累積次數分配). We create an ogive in three steps.
First, from the frequency distribution created earlier, calculate relative frequencies (相對次數):
Relative Frequency = (# of observations in a class) ÷ (Total # of observations)
Relative Frequencies
For example, we had 71 observations in our first class (telephone bills from $0.00 to $15.00). Thus, the relative frequency for this class is 71 ÷ 200 (the total # of phone bills) = 0.355 (or 35.5%).
Ogive
An ogive is a graph of a cumulative frequency distribution. We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies (累積相對次數) by adding the current class’s relative frequency to the previous class’s cumulative relative frequency. (For the first class, the cumulative relative frequency is just its relative frequency.)
Cumulative Relative Frequencies
first class: .355
next class: .355 + .185 = .540
⋮
last class: .930 + .070 = 1.00
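The running totals behind an ogive can be sketched as below. The class frequencies 71, 37, 18, 28, and 14 are quoted in the slides; the three middle values (13, 9, 10) are an assumption made for this example, chosen only so that they add up to the remaining 32 of the 200 bills. Counts are accumulated before dividing, which avoids summing rounded proportions.

```python
from itertools import accumulate

# Class frequencies for the 8 classes of width $15.
# ASSUMPTION: the middle values 13, 9, 10 are illustrative; only their sum (32) is implied by the slides.
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]
upper_limits = [15 * (k + 1) for k in range(len(frequencies))]  # 15, 30, ..., 120

total = sum(frequencies)
cumulative_counts = list(accumulate(frequencies))
cumulative_relative = [count / total for count in cumulative_counts]

# Points to plot for the ogive: (upper class limit, cumulative relative frequency).
ogive_points = list(zip(upper_limits, cumulative_relative))
for limit, value in ogive_points:
    print(f"{limit:>4}  {value:.3f}")
```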
Ogive
An ogive is a graph of a cumulative frequency distribution.
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies.
3) Graph the cumulative relative frequencies.
Ogive
The ogive can be used to answer questions like: What telephone bill value is at the 50th percentile? Just under $30, since the cumulative relative frequency already reaches .540 at $30.
2.4 Two Nominal Variables
So far we’ve looked at tabular and graphical techniques for one variable (either nominal or interval data). A contingency table (列聯表), also called a cross-classification table or cross-tabulation table (交叉分類表), is used to describe the relationship between two nominal variables. A contingency table lists the frequency of each combination of the values of the two variables.
Contingency Table
In Example 2.8, a sample of newspaper readers was asked to report which newspaper they read: Globe and Mail (1), Post (2), Star (3), or Sun (4), and to indicate whether they were a blue-collar worker (1), white-collar worker (2), or professional (3). Each reader’s response is counted in the cell of the contingency table for that combination of newspaper and occupation.
Contingency Table
Interpretation: The relative frequencies in columns 2 and 3 are similar, but there are large differences between columns 1 and 2 and between columns 1 and 3. This tells us that blue-collar workers tend to read different newspapers from both white-collar workers and professionals, and that white-collar workers and professionals are quite similar in their newspaper choice.
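A contingency table and its column relative frequencies can be sketched as follows. The coding (newspaper 1 to 4, occupation 1 to 3) follows Example 2.8, but the individual responses are hypothetical, and the helper name `column_relative_frequency` is introduced only for this example.

```python
from collections import Counter

# Hypothetical (newspaper, occupation) responses using the codes of Example 2.8.
responses = [(1, 3), (2, 1), (3, 2), (1, 3), (4, 1), (2, 2), (1, 2), (4, 1)]

cell_counts = Counter(responses)                               # frequency of each combination
column_totals = Counter(occupation for _, occupation in responses)


def column_relative_frequency(newspaper, occupation):
    """Share of one occupation's readers who read the given newspaper."""
    return cell_counts[(newspaper, occupation)] / column_totals[occupation]


# Rows are newspapers, columns are occupations, as in the slide's interpretation.
for newspaper in range(1, 5):
    row = [cell_counts[(newspaper, occupation)] for occupation in range(1, 4)]
    print(newspaper, row)
```

Comparing the columns of relative frequencies, rather than the raw counts, is what allows occupations with different numbers of respondents to be compared.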
Graphing the Relationship Between Two Nominal Variables
Use the data from the contingency table to create bar charts. Professionals tend to read the Globe & Mail more than twice as often as the Star or Sun.
Graphing the Relationship Between Two Interval Variables
Moving from nominal data to interval data, we are frequently interested in how two interval variables are related. To explore this relationship, we employ a scatter diagram (散佈圖), which plots two variables against one another. The independent variable (自變數) is labeled X and is usually placed on the horizontal axis, while the other, dependent variable (因變數), Y, is mapped to the vertical axis.
Scatter Diagram
Example 2.9: A real estate agent wanted to know to what extent the selling price of a home is related to its size.
1) Collect the data.
2) Determine the independent variable (X: house size) and the dependent variable (Y: selling price).
3) Use Excel to create a “scatter diagram”.
Scatter Diagram
It appears that there is in fact a relationship: the greater the house size, the greater the selling price.
Patterns of Scatter Diagrams
Linearity (線性相關) and direction are the two concepts we are interested in.
[Figure: three scatter diagrams showing a positive linear relationship, a negative linear relationship, and a weak or non-linear relationship]
2.5 Time Series Data
Observations measured at the same point in time are called cross-sectional data (橫斷面資料):
• Marketing survey (observe preferences by gender, age)
• Test scores in a statistics course
• Starting salaries of an MBA program’s graduates
Observations measured at successive points in time are called time-series data (時間序列資料):
• Weekly closing price of gold
• Amount of crude oil imported monthly
Time-series data are often graphed on a line chart (線圖), which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.
Line Chart
From Example 2.10, plot the total amounts of U.S. income tax for the years 1987 to 2002.
Line Chart
From ’87 to ’92, the tax was fairly flat. Starting in ’93, there was a rapid increase in taxes until 2001. Finally, there was a downturn in 2002.
Summary I
Factors That Identify When to Use Frequency and Relative Frequency Tables, Bar and Pie Charts
1. Objective: Describe a single set of data.
2. Data type: Nominal
Factors That Identify When to Use a Histogram, Ogive, or Stem-and-Leaf Display
1. Objective: Describe a single set of data.
2. Data type: Interval
Factors That Identify When to Use a Contingency Table
1. Objective: Describe the relationship between two variables.
2. Data type: Nominal
Factors That Identify When to Use a Scatter Diagram
1. Objective: Describe the relationship between two variables.
2. Data type: Interval
Summary II
Objective | Interval Data | Nominal Data
Single Set of Data | Histogram, Ogive, Stem-and-Leaf Display | Frequency and Relative Frequency Tables, Bar and Pie Charts
Relationship Between Two Variables | Scatter Diagram | Contingency Table, Bar Charts