Statistics and Data Analysis
Part 1 – Data Presentation
- Slides: 28

Agenda
- Data and Data Types
- Representing Data: pie chart, bar chart
- Summarizing Data: box plot, histogram
  - Central tendency
  - Spread
  - Distribution (shape)

Data = A Set of Facts
A picture of some aspect of the world
[Figure: Pizza Sales by Type]
- What do the data tell you?
- How can you use the information?
- What additional information would make these data more informative?

A More Complicated Set of Facts: What story do the data tell?

Data Types and Measurement
- Univariate vs. Multivariate
- Quantitative
  - Discrete = count: number of shootings by city by time
  - Continuous = measurement: housing prices
- Qualitative
  - Categorical: shopping mall, car brand, trip mode
  - Ordinal: survey data on attitudes, e.g., “How do you feel about…?” (Strongly disagree, Disagree, Neutral, Agree, Strongly agree); Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on
- Frameworks
  - Cross section
  - Time series
  - Longitudinal
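
A minimal Python sketch of why ordinal data need care, assuming a hypothetical `Likert` enum for the five-point scale above. The integer codes record order only, so comparisons are meaningful, and so is a median that is itself one of the categories (here via the standard library's `statistics.median_low`). Averaging the codes, by contrast, would quietly treat the categories as equally spaced. The responses are invented for illustration.

```python
from enum import IntEnum
from statistics import median_low


class Likert(IntEnum):
    """Five-point agreement scale; the integers encode order, not distance."""
    STRONGLY_DISAGREE = 1
    DISAGREE = 2
    NEUTRAL = 3
    AGREE = 4
    STRONGLY_AGREE = 5


def ordinal_median(responses: list[Likert]) -> Likert:
    """Median that is always an observed category.

    median_low takes the lower of the two middle values when the sample size is
    even, instead of averaging two categories into a value with no label.
    """
    return median_low(responses)


# Hypothetical responses, for illustration only.
answers = [Likert.AGREE, Likert.NEUTRAL, Likert.STRONGLY_AGREE, Likert.DISAGREE]
print(Likert.AGREE > Likert.NEUTRAL)   # True: order is meaningful
print(ordinal_median(answers).name)    # NEUTRAL
```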

Univariate vs. Multivariate: Numerous Variables
Univariate: the count of pizzas is the single variable.

Discrete Data: US Crime Statistics; Counts of Occurrences

Continuous Data: Housing Prices and Incomes

Unordered Qualitative Data: Travel Mode by 210 Travelers*
*Note: Not computed with Minitab

Ordered Qualitative Data: German Health Satisfaction Survey, 5,831 Women*
On a scale from 0 to 10, how do you feel about your health?
HEALTH SATISFACTION (N = 5831)
Response  Frequency
       0         97
       1         52
       2        147
       3        287
       4        346
       5        935
       6        631
       7        924
       8       1329
       9        626
      10        457
   Total       5831
*Note: Not computed with Minitab
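
Because the responses are ordered, the table supports a median as well as a mode. The sketch below assumes the frequencies above are complete; they do sum to N = 5831. Both helper names are hypothetical.

```python
# Frequencies from the German health satisfaction table (responses 0-10).
HEALTH_FREQ = {0: 97, 1: 52, 2: 147, 3: 287, 4: 346, 5: 935,
               6: 631, 7: 924, 8: 1329, 9: 626, 10: 457}


def median_from_frequencies(freq: dict[int, int]) -> int:
    """Median response (the lower median if n is even) from a frequency table."""
    n = sum(freq.values())
    target = (n + 1) // 2          # position of the (lower) middle observation
    running = 0
    for value in sorted(freq):
        running += freq[value]     # cumulative count up to and including value
        if running >= target:
            return value
    raise ValueError("empty frequency table")


def mode_from_frequencies(freq: dict[int, int]) -> int:
    """Most frequent response (first one found if there is a tie)."""
    return max(freq, key=freq.get)


print(sum(HEALTH_FREQ.values()))             # 5831
print(median_from_frequencies(HEALTH_FREQ))  # 7
print(mode_from_frequencies(HEALTH_FREQ))    # 8
```

With 5831 responses, the median is the 2916th ordered value. The cumulative count is 2495 through category 6 and 3419 through category 7, so the median falls in category 7. The mode is 8.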

Problems with Ordered Survey Response Data
61 Stern Students’ Ranking of Subway Safety (1994)*
Rating                Count  Percent  Cum Pct
1 Very Unsatisfactory    17    27.87    27.87
2 Unsatisfactory         15    24.59    52.46
3 OK                     17    27.87    80.33
4 Satisfactory           10    16.39    96.72
5 Very Satisfactory       2     3.28   100.00
*Source: Jeff Simonoff, Data Presentation and Summary, pp. 3-4
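
The Percent and Cum Pct columns follow directly from the counts, and the sketch below reproduces them. It also includes a hypothetical `coded_mean` helper that averages the 1 to 5 codes. That average (148/61, about 2.43) is only meaningful if adjacent categories are equally far apart, which is one commonly cited problem with treating ordered survey responses as numbers.

```python
# Counts for safety ratings 1 (Very Unsatisfactory) through 5 (Very Satisfactory).
SUBWAY_COUNTS = [17, 15, 17, 10, 2]


def percent_table(counts: list[int]) -> list[tuple[float, float]]:
    """Return (percent, cumulative percent) for each category, rounded to 2 decimals."""
    total = sum(counts)
    rows = []
    running = 0
    for count in counts:
        running += count
        rows.append((round(100 * count / total, 2), round(100 * running / total, 2)))
    return rows


def coded_mean(counts: list[int]) -> float:
    """Hypothetical helper: average of the codes 1..k weighted by their counts.

    This implicitly treats the gap between every pair of adjacent categories as equal.
    """
    total = sum(counts)
    return sum(code * count for code, count in enumerate(counts, start=1)) / total


for code, (pct, cum) in enumerate(percent_table(SUBWAY_COUNTS), start=1):
    print(code, pct, cum)
print(round(coded_mean(SUBWAY_COUNTS), 2))   # 2.43
```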

Quantitative vs. Qualitative Data
Quantitative data:
- Units of measurement make sense.
- Arithmetic computations make sense.
Qualitative data:
- No units of measurement.
- Arithmetic manipulation is usually meaningless: the average of Air and Bus is not Train.
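
To make the Air/Bus/Train point concrete, the sketch below assumes hypothetical numeric codes: Air = 1, Train = 2, Bus = 3, Car = 4. These codes are labels, not measurements. Python will happily average them, and the result decodes to Train, which says nothing about any traveler. The mode, by contrast, is a meaningful summary for unordered categories.

```python
from collections import Counter
from statistics import mean

# Hypothetical codes: the numbers are labels, not measurements.
MODE_CODES = {"Air": 1, "Train": 2, "Bus": 3, "Car": 4}
CODE_TO_MODE = {code: mode for mode, code in MODE_CODES.items()}

# The arithmetic runs without error, but the "average" of Air and Bus decodes to Train.
avg_code = mean([MODE_CODES["Air"], MODE_CODES["Bus"]])
print(avg_code, CODE_TO_MODE[avg_code])   # 2 Train


def modal_category(observations: list[str]) -> str:
    """Most frequent category: a meaningful summary for unordered data."""
    return Counter(observations).most_common(1)[0][0]
```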

Cross Section Data: Housing Prices and Incomes

Time Series Data: Car Thefts

Longitudinal Data: 3-Year Survey, Satisfaction on a Scale from 0 to 5

Representing Data
- In raw form
- Transformed to a visual form
- Summarized graphically
- Summarized statistically

Housing Prices and Incomes

Housing Price Data: Visual Representation
Source: www.trulia.com/home_prices/

Pie Chart: Pizza Pies Sold, by Type
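
In a pie chart, each slice gets an angle proportional to its category's share of the total. The sketch below uses a hypothetical `pie_angles` helper. The labels in the usage line are placeholders, not the pizza data.

```python
def pie_angles(counts: dict[str, int]) -> dict[str, float]:
    """Map each category to its slice angle in degrees (share of the total times 360)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {label: 360 * count / total for label, count in counts.items()}


# Placeholder categories, for illustration only.
print(pie_angles({"A": 1, "B": 1, "C": 2}))   # {'A': 90.0, 'B': 90.0, 'C': 180.0}
```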

Data Representation: Bar Chart and Pie Chart
[Figure: bar chart and pie chart of the same data]
Same data. Which is easier to understand?
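
A bar chart encodes each value as a length on a common baseline, which is generally easier to compare than the angles in a pie chart. The sketch below is a minimal text-only version. The `text_bar_chart` helper is hypothetical, and it is applied here to the subway safety counts from the ordered survey example.

```python
def text_bar_chart(counts: dict[str, int], width: int = 40) -> str:
    """One row per category; bar length is proportional to the count (largest = width)."""
    largest = max(counts.values())
    if largest <= 0:
        raise ValueError("need at least one positive count")
    label_width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        bar = "#" * round(width * count / largest)
        lines.append(f"{label:<{label_width}} | {bar} {count}")
    return "\n".join(lines)


SUBWAY_RATINGS = {"Very Unsatisfactory": 17, "Unsatisfactory": 15, "OK": 17,
                  "Satisfactory": 10, "Very Satisfactory": 2}
print(text_bar_chart(SUBWAY_RATINGS))
```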

A Box Plot Describes the Distribution of Values in a Set of Data
[Figure: Box and Whisker Plot for House Price Listings, with a point labeled Hawaii]
- What is an outlier?
- Why do we believe a particular point is an outlier?
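
A box plot is drawn from five numbers: the minimum, first quartile, median, third quartile and maximum. The sketch below computes them with `statistics.quantiles` from Python's standard library. The `method="inclusive"` choice is an assumption: statistical packages differ in their quartile conventions and can give slightly different values, which is one reason quartiles are often reported as approximate.

```python
from statistics import quantiles


def five_number_summary(data: list[float]) -> tuple[float, float, float, float, float]:
    """Return (minimum, Q1, median, Q3, maximum), the values a box plot is built from."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")   # sorts internally
    return min(data), q1, q2, q3, max(data)


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # (1, 3.0, 5.0, 7.0, 9)
```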

Making a Box Plot for Per Capita Income
- Maximum = 31136
- 3rd quartile = 24933 (approx.)
- Median = 22610
- 1st quartile = 21677 (approx.)
- Minimum = 17043
- Interquartile range: IQR = 24933 - 21677 = 3256
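
One common answer to "what is an outlier?" is Tukey's rule, which flags points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below applies this rule to the approximate quartiles above. The 1.5 multiplier is a convention and is not stated on the slide.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


Q1, Q3 = 21677, 24933             # approximate quartiles from the slide
MINIMUM, MAXIMUM = 17043, 31136
lower, upper = tukey_fences(Q1, Q3)
flagged = [x for x in (MINIMUM, MAXIMUM) if x < lower or x > upper]
print(Q3 - Q1)                    # 3256
print(lower, upper)               # 16793.0 29817.0
print(flagged)                    # [31136]
```

Under this rule, the maximum (31136) lies above the upper fence (29817) and would be flagged, while the minimum (17043) lies inside the lower fence (16793). Because the quartiles are approximate, treat this as illustrative.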

A Frequency Distribution
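
A frequency distribution counts how many observations fall into each interval (bin). The sketch below uses equal-width bins. The helper name and the bin rule are assumptions: intervals are half-open, and the maximum is placed in the last bin.

```python
def frequency_distribution(data: list[float], n_bins: int) -> list[tuple[float, float, int]]:
    """Return (lower edge, upper edge, count) for n_bins equal-width bins.

    Bins are [lo, hi) except the last, which also includes the maximum.
    """
    if not data or n_bins < 1:
        raise ValueError("need data and at least one bin")
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins or 1.0       # avoid zero width when all values are equal
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - lo) / width), n_bins - 1)   # clamp the maximum into the last bin
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(n_bins)]


print(frequency_distribution([1, 2, 2, 3, 4, 5], 2))   # [(1.0, 3.0, 3), (3.0, 5.0, 3)]
```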

Histogram for House Price Listings
A histogram describes the sample data and suggests the nature of the underlying data-generating process. Note the “skewness” of the distribution of listings. (HOG, pp. 16-18)
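
Skewness can also be summarized by a single number. The sketch below computes the moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. Positive values indicate a long right tail. Software packages often apply a small-sample adjustment, so their values may differ slightly from this version.

```python
from statistics import fmean


def sample_skewness(data: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


print(sample_skewness([1, 2, 3]))       # 0.0 (symmetric)
print(sample_skewness([1, 1, 1, 10]))   # positive: long right tail
```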

Distribution of House Price Listings
The asymmetry (skewness) in the histogram of listing prices also shows up in the box and whisker plot: note the long whisker at the top of the figure.

More than One Group in a Histogram*
NF = 14243, NM = 13083
*Note: Not computed with Minitab
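
With NF = 14243 and NM = 13083, the two groups differ in size, so overlaid raw counts would partly reflect group size rather than the shape of each distribution. One option is to convert each group's bin counts to proportions before plotting. This is an assumption, not necessarily how the figure was built. The bin counts in the usage lines are hypothetical.

```python
def relative_frequencies(counts: list[int]) -> list[float]:
    """Convert bin counts to proportions so groups of different sizes share one scale."""
    total = sum(counts)
    if total == 0:
        raise ValueError("group has no observations")
    return [count / total for count in counts]


# Hypothetical bin counts for two groups of different sizes but the same shape.
group_a = [10, 30, 60]
group_b = [5, 15, 30]
print(relative_frequencies(group_a))   # [0.1, 0.3, 0.6]
print(relative_frequencies(group_b))   # [0.1, 0.3, 0.6]
```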

Summary
- What story does the data presentation tell?
  - Data in raw form tell no story.
  - A visual representation of data tells something about the data.
- Data reduction and summary representation: what do we learn?
  - Location
  - Spread
  - Shape of the distribution
- What tool is most informative?
  - Reduction to a small number of features
  - Visual displays of data:
    - Pie chart
    - Box and whisker plots
    - Histograms
    - Time series plots
“There are lies, damned lies, and statistics.” (attributed to Benjamin Disraeli)