Part 1 – Data Presentation: Statistics and Data Analysis

  • Slides: 28

1/29 Agenda
• Data and Data Types
• Representing Data: pie chart, bar chart
• Summarizing Data: box plot, histogram
  - Central tendency
  - Spread
  - Distribution (shape)

2/29 Data = A Set of Facts
A picture of some aspect of the world.
[Figure: Pizza Sales by Type]
• What do the data tell you?
• How can you use the information?
• What additional information would make these data more informative?

3/29 A More Complicated Set of Facts: What story do the data tell?

4/29 Data Types and Measurement
• Univariate vs. multivariate
• Quantitative
  - Discrete = count: number of shootings by city by time
  - Continuous = measurement: housing prices
• Qualitative
  - Categorical: shopping mall, car brand, trip mode
  - Ordinal: survey data on attitudes, e.g., "How do you feel about…?" (Strongly disagree, Disagree, Neutral, Agree, Strongly agree); Moody's bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on.
• Frameworks
  - Cross section
  - Time series
  - Longitudinal
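
One way to make these distinctions concrete is to look at how each kind of variable might be stored. The Python sketch below uses made-up values (none come from the slides), and the `Agreement` class is a hypothetical encoding of the five-point attitude scale. An `IntEnum` keeps the order of the responses, so ranking-based summaries such as the minimum or a median are well defined, while the numeric codes themselves remain only labels.

```python
from enum import IntEnum
from statistics import median_low

# Hypothetical values for illustration only; none come from the slides.
shootings_per_city = [3, 0, 7, 2]                # discrete (counts): whole numbers
housing_prices = [245000.0, 312500.5, 198750.0]  # continuous (measurements)
trip_modes = ["Air", "Train", "Bus", "Car"]      # categorical: labels with no order


class Agreement(IntEnum):
    """Hypothetical ordinal scale: the order is meaningful, the gaps are not measured."""
    STRONGLY_DISAGREE = 1
    DISAGREE = 2
    NEUTRAL = 3
    AGREE = 4
    STRONGLY_AGREE = 5


responses = [Agreement.AGREE, Agreement.NEUTRAL, Agreement.STRONGLY_AGREE,
             Agreement.DISAGREE, Agreement.AGREE]

# Ranking-based summaries respect the ordinal nature of the data.
print(sorted(responses)[0].name)   # lowest response: DISAGREE
print(median_low(responses).name)  # middle response: AGREE
```

With an odd number of responses, `median_low` returns the middle response itself, so the summary is always a level that actually exists on the scale.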

5/29 Univariate vs. Multivariate: Numerous Variables
Univariate: the count of pizzas is the single variable.

6/29 Discrete Data: US Crime Statistics (Counts of Occurrences)

7/29 Continuous Data: Housing Prices and Incomes

8/29 Unordered Qualitative Data: Travel Mode by 210 Travelers*
* Note: Not computed with Minitab

9/29 Ordered Qualitative Data: German Health Satisfaction Survey, 5,831 Women*
On a scale from 0 to 10, how do you feel about your health?

HEALTH SATISFACTION (N = 5831)
Response   Frequency
========   =========
0          97
1          52
2          147
3          287
4          346
5          935
6          631
7          924
8          1329
9          626
10         457

* Note: Not computed with Minitab
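
Because the responses are ordered, the median and the mode are meaningful summaries of this table even though the 0 to 10 codes are not measurements. A minimal Python sketch that computes both directly from the frequencies shown above (the variable names are illustrative):

```python
# Frequencies copied from the German health satisfaction table above (N = 5831).
counts = {0: 97, 1: 52, 2: 147, 3: 287, 4: 346, 5: 935,
          6: 631, 7: 924, 8: 1329, 9: 626, 10: 457}

n = sum(counts.values())

# Mode: the most frequent response.
mode = max(counts, key=counts.get)

# Median for grouped ordinal data: the first response whose cumulative count
# reaches the middle observation (the lower median when n is even).
middle = (n + 1) // 2
running = 0
for response in sorted(counts):
    running += counts[response]
    if running >= middle:
        median = response
        break

print(n, mode, median)  # → 5831 8 7
```

The counts add up to the reported N = 5831, and the 2,916th response falls in category 7, which is therefore the median.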

10/29 Problems with Ordered Survey Response Data
61 Stern Students' Ranking of Subway Safety (1994)*

Safety   Label                 Count   Percent   Cum Pct
1        Very Unsatisfactory   17      27.87     27.87
2        Unsatisfactory        15      24.59     52.46
3        OK                    17      27.87     80.33
4        Satisfactory          10      16.39     96.72
5        Very Satisfactory     2       3.28      100.00

* Jeff Simonoff: Data Presentation and Summary, pp. 3-4
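
The Percent and Cum Pct columns follow directly from the counts. The sketch below recomputes them in Python and then illustrates one commonly cited problem with treating the 1 to 5 codes as numbers: the average code (about 2.43) assumes the step from "Very Unsatisfactory" to "Unsatisfactory" is the same size as every other step, which the survey never establishes.

```python
from itertools import accumulate

# Counts from the subway safety table above (61 Stern students, 1994).
labels = ["Very Unsatisfactory", "Unsatisfactory", "OK",
          "Satisfactory", "Very Satisfactory"]
counts = [17, 15, 17, 10, 2]
n = sum(counts)

percents = [round(100 * c / n, 2) for c in counts]
cum_percents = [round(100 * c / n, 2) for c in accumulate(counts)]

for code, (label, c, p, cp) in enumerate(
        zip(labels, counts, percents, cum_percents), start=1):
    print(f"{code}  {label:<20} {c:>3} {p:>6.2f} {cp:>7.2f}")

# Averaging the codes treats adjacent categories as equally far apart,
# an assumption the ordinal scale does not support.
mean_code = sum(code * c for code, c in enumerate(counts, start=1)) / n
print(round(mean_code, 2))  # → 2.43
```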

11/29 Quantitative vs. Qualitative Data
• Quantitative data: units of measurement make sense, and arithmetic computations make sense.
• Qualitative data: there are no units of measurement, and arithmetic manipulation is usually meaningless. The average of Air and Bus is not Train.
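
The point about Air, Bus, and Train can be demonstrated directly. In this sketch the trip modes and fares are hypothetical, as is the numeric coding of the modes; any coding would be equally arbitrary.

```python
from collections import Counter
from statistics import mean

# Hypothetical trip-mode choices and fares, for illustration only.
modes = ["Air", "Bus", "Air", "Car", "Train", "Air"]
fares = [210.0, 35.0, 185.0, 60.0, 90.0, 240.0]  # quantitative: dollars

# Quantitative data: the mean has units (dollars) and a clear meaning.
print(mean(fares))

# Qualitative data: arbitrary numeric codes make arithmetic possible but meaningless.
codes = {"Air": 1, "Train": 2, "Bus": 3, "Car": 4}
avg_code = (codes["Air"] + codes["Bus"]) / 2
print(avg_code)  # 2.0, the code for "Train", which is nonsense

# A meaningful summary of categorical data is a count or the mode.
print(Counter(modes).most_common(1))  # → [('Air', 3)]
```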

13/29 Cross Section Data: Housing Prices and Incomes

14/29 Time Series Data: Car Thefts

15/29 Longitudinal Data: 3-Year Survey, Satisfaction on a Scale from 0 to 5
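
The three frameworks differ mainly in how observations are indexed. A small sketch with hypothetical values (the city prices, theft counts, and person identifiers are invented; satisfaction uses the slide's 0 to 5 scale):

```python
# Cross section: many units observed at one point in time (keyed by unit).
cross_section = {"Boston": 512_000, "Denver": 455_000, "Tampa": 301_000}

# Time series: one unit observed over many periods (keyed by period).
time_series = {2019: 1210, 2020: 1185, 2021: 1302}

# Longitudinal (panel): the same units observed repeatedly (keyed by unit and period).
panel = {
    ("person_1", 1): 3, ("person_1", 2): 4, ("person_1", 3): 4,
    ("person_2", 1): 2, ("person_2", 2): 2, ("person_2", 3): 5,
}

# A panel can be sliced into a cross section by fixing the period...
wave_2 = {unit: v for (unit, t), v in panel.items() if t == 2}
# ...or into one unit's time series by fixing the unit.
person_1 = {t: v for (unit, t), v in panel.items() if unit == "person_1"}

print(wave_2)    # → {'person_1': 4, 'person_2': 2}
print(person_1)  # → {1: 3, 2: 4, 3: 4}
```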

16/29 Representing Data
• In raw form
• Transformed to a visual form
• Summarized graphically
• Summarized statistically

17/29 Housing Prices and Incomes

18/29 Housing Price Data: Visual Representation
www.trulia.com/home_prices/

19/29 Pie Chart: Pizza Pies Sold, by Type

20/29 Data Representation: Bar Chart vs. Pie Chart
Same data. Which is easier to understand?
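
Both charts encode the same shares; they differ in the visual channel. A pie chart maps each category's share of the total to an angle, while a bar chart maps counts to lengths on a common baseline, which many readers find easier to compare. The pizza counts below are hypothetical.

```python
# Hypothetical pizza sales by type; the slides' actual counts are not reproduced here.
sales = {"Cheese": 40, "Pepperoni": 30, "Veggie": 20, "Other": 10}
total = sum(sales.values())

# Pie chart: each slice's angle is proportional to the category's share of the total.
angles = {kind: 360 * count / total for kind, count in sales.items()}

# Bar chart: each bar's length is proportional to the count itself (one '#' per 2 pies).
for kind, count in sales.items():
    print(f"{kind:<10} {angles[kind]:>6.1f} deg  {'#' * (count // 2)}")
```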

21/29 A Box Plot Describes the Distribution of Values in a Set of Data
[Figure: Box and Whisker Plot for House Price Listings, with a point labeled Hawaii]
• What is an outlier?
• Why do we believe a particular point is an outlier?
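
A box plot is drawn from the five-number summary, and a widely used convention (Tukey's rule) plots any point more than 1.5 times the IQR beyond the quartiles individually as a possible outlier. The slides do not state which rule their software used, so treat this as one reasonable sketch. It relies on Python's `statistics.quantiles`, whose default `exclusive` method is only one of several quartile definitions; the function names and listing prices are hypothetical.

```python
from statistics import median, quantiles


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max). Quartile conventions differ between
    packages; this uses statistics.quantiles' default 'exclusive' method."""
    q1, _, q3 = quantiles(data, n=4)
    return min(data), q1, median(data), q3, max(data)


def tukey_outliers(data, k=1.5):
    """Flag points more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical house price listings in $ thousands.
listings = [150, 180, 195, 210, 225, 240, 260, 280, 310, 1450]
print(five_number_summary(listings))
print(tukey_outliers(listings))  # → [1450]
```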

22/29 Making a Box Plot for Per Capita Income
Maximum = 31136
3rd Quartile = 24933 (approx.)
Median = 22610
1st Quartile = 21677 (approx.)
Minimum = 17043
Interquartile Range = IQR = 24933 - 21677 = 3256
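
Applying the common 1.5 times IQR convention (an assumption; the slide does not name a rule) to the summary above gives fences at 16793 and 29817. The maximum of 31136 lies above the upper fence, so under that rule it would be drawn as a separate point rather than at the end of the whisker. Because the quartiles are approximate, the fences are approximate too.

```python
# Five-number summary read off the per capita income slide
# (the quartiles are approximate, as the slide notes).
minimum, q1, med, q3, maximum = 17043, 21677, 22610, 24933, 31136

iqr = q3 - q1                 # 24933 - 21677 = 3256
lower_fence = q1 - 1.5 * iqr  # 21677 - 4884 = 16793
upper_fence = q3 + 1.5 * iqr  # 24933 + 4884 = 29817

print(iqr, lower_fence, upper_fence)
print("max beyond upper fence:", maximum > upper_fence)  # → True
print("min beyond lower fence:", minimum < lower_fence)  # → False
```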

24/29 A Frequency Distribution
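
A frequency distribution groups values into class intervals and counts how many fall in each. A minimal sketch, assuming equal-width, left-closed bins; the function name `frequency_distribution` and the listing prices are hypothetical. Printing one `*` per observation turns the table into a rough text histogram.

```python
from collections import Counter


def frequency_distribution(data, start, width, n_bins):
    """Count observations in equal-width bins [start + i*width, start + (i+1)*width).
    Values outside the bins are ignored in this sketch."""
    counts = Counter()
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_bins)]


# Hypothetical listing prices in $ thousands.
listings = [150, 180, 195, 210, 225, 240, 260, 280, 310, 445]
for low, high, count in frequency_distribution(listings, start=100, width=100, n_bins=4):
    print(f"[{low}, {high})  {count:>2}  {'*' * count}")
```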

25/29 Histogram for House Price Listings
A histogram describes the sample data and suggests the nature of the underlying data-generating process. Note the skewness of the distribution of listings. (HOG, pp. 16-18)
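
Skewness can be checked numerically as well as visually. The sketch below computes the Fisher-Pearson moment coefficient of skewness (one of several common definitions, here without a small-sample correction) for hypothetical listing prices with a long upper tail; a positive value, and a mean above the median, both indicate right skew.

```python
from statistics import mean, median


def moment_skewness(data):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5,
    using population moments (no small-sample correction)."""
    m = mean(data)
    n = len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed listing prices in $ thousands: a long upper tail.
listings = [150, 180, 195, 210, 225, 240, 260, 280, 310, 445]
print(round(moment_skewness(listings), 2))
print(mean(listings) > median(listings))  # → True: the tail pulls the mean up
```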

26/29 Distribution of House Price Listings
The asymmetry (skewness) in the histogram of listing prices also shows up in the box and whisker plot: note the long whisker at the top of the figure.

27/29 More than One Group in a Histogram*
NF = 14243, NM = 13083
* Note: Not computed with Minitab
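
When the groups differ in size, as here (NF = 14243, NM = 13083), raw counts partly reflect the group sizes rather than the shape of each distribution. One common remedy is to plot relative frequencies, so each group's bars sum to 1. A sketch with hypothetical counts for two groups whose shapes are identical but whose sizes differ by a factor of two (the function name is illustrative):

```python
def relative_frequencies(counts):
    """Convert bin counts to proportions so groups of different sizes are comparable."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical bin counts for two groups of unequal size (not the slide's data).
group_a = [20, 50, 30]   # n = 100
group_b = [40, 100, 60]  # n = 200
print(relative_frequencies(group_a))  # → [0.2, 0.5, 0.3]
print(relative_frequencies(group_b))  # → [0.2, 0.5, 0.3]
```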

29/29 Summary
• What story does the data presentation tell?
  - Data in raw form tell no story.
  - A visual representation of the data tells us something about the data.
• Data reduction and summary representation: what do we learn?
  - Location
  - Spread
  - Shape of the distribution
• Which tool is most informative?
  - Reduction to a small number of features
  - Visual displays of data: pie charts, box and whisker plots, histograms, time series plots
"There are lies, damned lies and statistics." (often attributed to Benjamin Disraeli)