Statistics and Data Analysis
Professor William Greene
Stern School of Business
IOMS Department of Economics
1. Data Presentation

Statistics and Data Analysis
Part 1 – Data Presentation
Telling the story statistically

Samples are surprisingly small
> 1010 Observations
> Telephone sample
> Sampling error

What Does it Mean?
Slightly more than one-third of Americans have a favorable opinion of the Democratic-led Congress, a poll said Wednesday. The Pew Research Center for the People & the Press said the 37% expressing a positive opinion represents a decline of 13 points since April. The favorable percentage is one of the lowest in more than two decades of Pew surveys – if not the lowest, the poll said. The previous low was 40% in January, but the result is not statistically significant because of the margin of error. (USA Today)
We will develop the idea of the “margin of error” and how it is computed.
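As a rough preview of the margin of error mentioned above, the sketch below uses the common normal-approximation formula for a sample proportion, 1.96 * sqrt(p(1 - p)/n), at roughly 95% confidence. The sample size of 1010 is borrowed from the earlier slide as an assumption (the article excerpt does not state the poll's sample size), and the function name is illustrative only.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation; z = 1.96 corresponds to about 95% confidence.
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# 37% favorable comes from the article; n = 1010 is an assumed sample size.
moe = margin_of_error(0.37, 1010)
print(f"margin of error: about {100 * moe:.1f} percentage points")
```

Under these assumptions the margin is roughly three percentage points, which is about the size of the gap between 37% and 40% that the article describes as not statistically significant.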

Really?
The following was taken from http://www.msnbc.msn.com/id/27339545/
An msnbc.com guide to presidential polls
Why results, samples and methodology vary from survey to survey
WASHINGTON - A poll is a small sample of some larger number, an estimate of something about that larger number. For instance, what percentage of people reports that they will cast their ballots for a particular candidate in an election? A sample reflects the larger number from which it is drawn. Let’s say you had a perfectly mixed barrel of 1,000 tennis balls, of which 700 are white and 300 orange. You do your sample by scooping up just 50 of those tennis balls. If your barrel was perfectly mixed, you wouldn’t need to count all 1,000 tennis balls - your sample would tell you that 30 percent of the balls were orange.
Your sample might tell you that approximately 30 percent of the balls were orange.
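The difference between "would tell you 30 percent" and "might tell you approximately 30 percent" can be seen in a small simulation. The sketch below scoops 50 balls without replacement from a barrel of 700 white and 300 orange balls, many times over. The number of repetitions, the seed, and the function name are illustrative assumptions.

```python
import random

def sample_orange_share(n_draws=50, n_white=700, n_orange=300, rng=None):
    """Scoop n_draws balls without replacement; return the share that are orange."""
    if rng is None:
        rng = random.Random()
    barrel = ["white"] * n_white + ["orange"] * n_orange
    scoop = rng.sample(barrel, n_draws)
    return scoop.count("orange") / n_draws

rng = random.Random(12345)  # fixed seed only so the run is repeatable
shares = [sample_orange_share(rng=rng) for _ in range(1000)]

print("smallest share:", min(shares))
print("largest share: ", max(shares))
print("average share: ", sum(shares) / len(shares))
```

Individual scoops land above and below 30 percent even though the barrel is perfectly mixed; only the average over many scoops settles near 30 percent.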

The Visual Data Do Tell the Story: Napoleon’s March to and from Moscow

Informative Data Table
Life Expectancy: Highest 15 Countries, 2010
Disability Adjusted Life Expectancy 40

A Dynamic Picture

Bar Charts vs. Data Tables

Probability of Survival to Age 50, Female at Birth
U.S. and 20 Other Wealthy Countries
It is possible to be misled by a presentation such as this one. Note the vertical axis. What does this graph tell you? What do the probabilities mean? Are the differences meaningful?

Does living longer make people happier? Or do people live longer because they are happier?

Does the Picture Tell the Story?
This is the only graphic in the article. The article compares default rates on VA vs. FHA mortgages. Is there anything wrong with this picture? The very technical-looking graph/table is unrelated to the article.
New York Times, Page RE1, July 24, 2014

Data Presentation Agenda
• Data Types: Cross Section and Time Series
• Summarizing Data Graphically
  - Pie chart, bar chart
  - Box plot, histogram
• Summarizing Data with Descriptive Statistics
  - Central tendency
  - Spread
  - Distribution (shape)

Data = A Set of Facts
A picture of some aspect of the world
Pizza Sales by Type
What do the data tell you? How can you use the information? What additional information would make these data (more) informative?

Data Types and Measurement
• Quantitative
  - Discrete = count: Number of car accidents by city by time
  - Continuous = quantitative measurement: Housing prices
• Qualitative
  - Categorical: Shopping mall, car brand, trip mode
  - Ordinal: Survey data on attitudes; “How do you feel about…?” Strongly disagree, Disagree, Neutral, Agree, Strongly agree. Moody’s bond ratings: Aaa, A, Bbb, B, and so on.
• Frameworks
  - Cross section
  - Time series

Discrete, Count Data, Time Series

Continuous Quantitative Data
Housing Prices and Incomes

Unordered Qualitative Data
Travel Mode Between Sydney and Melbourne by 210 Travelers

Ordered Qualitative Data
German Health Satisfaction Survey; 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?

Aggregated Data May Be Easier to Understand
(0-3) Bad, (4-6) Fair, (7-8) Good, (9-10) Excellent
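The aggregation on this slide can be sketched as a simple recoding of the 0 to 10 scores into four groups. The sketch assumes the slide's ranges and labels pair up as 0-3 Bad, 4-6 Fair, 7-8 Good, 9-10 Excellent. The responses are hypothetical, not the survey data, and the function name is illustrative.

```python
from collections import Counter

def satisfaction_group(score):
    """Map a 0-10 health satisfaction score to one of four ordered groups."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 3:
        return "Bad"
    if score <= 6:
        return "Fair"
    if score <= 8:
        return "Good"
    return "Excellent"

# Hypothetical responses, for illustration only.
responses = [0, 3, 4, 5, 6, 7, 7, 8, 9, 10, 8, 5]
counts = Counter(satisfaction_group(s) for s in responses)

for group in ["Bad", "Fair", "Good", "Excellent"]:
    print(f"{group:<10} {counts[group]}")
```

Four groups are easier to read than eleven, at the cost of hiding differences within each group.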

Ordered Qualitative Outcomes
Bond Ratings
Movie Ratings
Arithmetic mean may not be meaningful.
(a) Ordinal measure – rankings
(b) Look at that distribution!

A Problem with Ordered Survey Response Data
61 Stern Students’ Ranking of Subway Safety (1994)*
Safety  Count  Percent  Cum Pct  Response
1       17     27.87     27.87   Very Unsatisfactory
2       15     24.59     52.46   Unsatisfactory
3       17     27.87     80.33   OK
4       10     16.39     96.72   Satisfactory
5        2      3.28    100.00   Very Satisfactory
There is no objective meaning to “3” on some standard scale. Does everyone’s “1” or “2” or “3” … mean the same thing?
* Jeff Simonoff: Data Presentation and Summary, pp. 3-4
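The Percent and Cum Pct columns follow directly from the counts. The sketch below rebuilds them from the five counts on the slide and also reports the median category, a summary that uses only the ordering of the responses. The function names are illustrative.

```python
# Counts from the slide: rating -> number of students (61 in total).
counts = {1: 17, 2: 15, 3: 17, 4: 10, 5: 2}

def frequency_table(counts):
    """Return rows of (rating, count, percent, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for rating in sorted(counts):
        running += counts[rating]
        rows.append((rating, counts[rating],
                     100 * counts[rating] / total,
                     100 * running / total))
    return rows

def median_category(counts):
    """First rating at which the cumulative share reaches 50%."""
    total = sum(counts.values())
    running = 0
    for rating in sorted(counts):
        running += counts[rating]
        if 2 * running >= total:
            return rating

for rating, count, pct, cum in frequency_table(counts):
    print(f"{rating}  {count:>3}  {pct:6.2f}  {cum:6.2f}")
print("median category:", median_category(counts))
```

The median category depends only on the order of the ratings, so it stays meaningful even if the numeric labels 1 to 5 have no agreed scale.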

Cross Section Data
Housing Prices and Incomes

Time Series Data: Oil Price
A graph is much more useful and informative than a table for time series data.

Representing Data
• In raw form
• Transformed to a visual form
• Summarized graphically
• Summarized statistically

Pie Chart vs. Frequency Table
Pizza Pies Sold, by Type
Same information. Which is more useful for your audience?

Data Representation: Bar Chart vs. Pie Chart
Same data. Which is easier to understand?

Table vs. Bar Chart (or both)
2013 data. Source: Bloomberg

2013 Valuation of U.S. Sports Teams
These figures reveal a league strategy.
Panels: Football, Baseball

A Box Plot Describes the Distribution of Values in a Set of Data
Hawaii
Box and Whisker Plot for House Price Listings

Raw Data on Housing Prices and Incomes

Making a Box Plot for Per Capita Income
Maximum = 31136
3rd Quartile = 24933
Median = 22610
1st Quartile = 21677
Minimum = 17043
Interquartile Range = IQR = 24933 - 21677 = 3256
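The five numbers behind a box plot can be computed in a few lines. The sketch below uses Python's statistics.quantiles with the "inclusive" method. Software packages use slightly different quartile conventions, so this is not necessarily the rule that produced the values on the slide. The income list is hypothetical (in thousands), not the state data, and the function name is illustrative.

```python
import statistics

def five_number_summary(values):
    """Minimum, quartiles, maximum, and IQR for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }

# Hypothetical per capita incomes in thousands, for illustration only.
incomes = [17, 20, 21, 22, 23, 24, 25, 27, 31]
summary = five_number_summary(incomes)
for name, value in summary.items():
    print(f"{name:<6} {value}")

# The IQR arithmetic from the slide itself.
slide_iqr = 24933 - 21677
print("slide IQR:", slide_iqr)
```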

Box and Whisker Plot
What is an outlier? Why do we believe a particular point is an outlier?
Outliers = extreme observations
Smaller of (Maximum, Median + 1.5 IQR)
75th Percentile
Median
25th Percentile
Interquartile range = IQR
Larger of (Minimum, Median - 1.5 IQR)
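The whisker limits can be worked out from the per capita income figures given on the earlier slide (minimum 17043, first quartile 21677, median 22610, third quartile 24933, maximum 31136). The sketch below applies the rule as printed here, measured from the median. For comparison it also applies the widely used Tukey convention, which measures 1.5 IQR from the quartiles; that convention is an addition for comparison and does not come from the slide. The function name is illustrative.

```python
def whisker_limits(minimum, maximum, low_anchor, high_anchor, iqr, k=1.5):
    """Return (lower, upper) limits: k * IQR beyond the anchors, capped at min/max."""
    lower = max(minimum, low_anchor - k * iqr)
    upper = min(maximum, high_anchor + k * iqr)
    return lower, upper

# Per capita income values from the earlier slide.
minimum, q1, median, q3, maximum = 17043, 21677, 22610, 24933, 31136
iqr = q3 - q1

# Rule as printed on the slide: 1.5 IQR measured from the median.
slide_rule = whisker_limits(minimum, maximum, median, median, iqr)

# Common Tukey convention (not from the slide): 1.5 IQR measured from the quartiles.
tukey_rule = whisker_limits(minimum, maximum, q1, q3, iqr)

print("slide rule limits:", slide_rule)
print("Tukey rule limits:", tukey_rule)
print("maximum beyond Tukey upper limit:", maximum > tukey_rule[1])
```

The two rules can disagree about which points count as extreme, so it is worth checking which convention a given software package uses.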

Histogram for House Price Listings
A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.
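A histogram is a count of observations falling into equal-width bins. The sketch below builds one as text and compares the mean with the median, a quick check for the kind of right skew noted above. The listing prices are hypothetical (in thousands of dollars), not the data behind the slide, and the function name is illustrative.

```python
import statistics

def histogram_counts(values, bin_width):
    """Count values in bins [k*w, (k+1)*w); empty bins are omitted."""
    counts = {}
    for v in values:
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical listing prices in $1000s, for illustration only.
listings = [120, 135, 140, 150, 155, 160, 170, 180, 210, 260, 340, 520]

for start, count in histogram_counts(listings, 100).items():
    print(f"{start:>4}-{start + 99:<4} {'#' * count}")

mean_price = statistics.mean(listings)
median_price = statistics.median(listings)
print("mean:", round(mean_price, 1), "median:", median_price)
```

A few very high listings pull the mean above the median, which is the numerical counterpart of the long right tail in the picture.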

Distribution of House Price Listings
Asymmetry (skewness) in the histogram of listing prices shows up in the box and whisker plot. Note the long whisker at the top of the figure.

House Price Listings and Per Capita Incomes: States
Regression and Correlation
Are these two variables correlated? r = .48
• How to describe/summarize them.
• How to explain the variation across states.
• How to determine if there is any correlation between the two variables.
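The correlation r reported above is the Pearson correlation coefficient: the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below computes it from scratch. The two short lists are hypothetical, so the result will not reproduce the slide's r = .48, and the function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical state-level values, for illustration only.
incomes = [18.0, 20.5, 21.7, 22.6, 24.9, 27.0, 31.1]   # per capita income, $1000s
listings = [150, 140, 210, 180, 170, 320, 260]         # house price listings, $1000s

r = pearson_r(incomes, listings)
print(f"r = {r:.2f}")
```

A value of r between 0 and 1 indicates that higher incomes tend to go with higher listings, without implying that one causes the other.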

Big Data: Netflix Cinematch Rating/Recommendation System

Summary
• What story does the data presentation tell?
  - Data in raw form tell no story.
  - Visual representation of data tells something about the data.
  - The representation of the data may reveal something about the underlying process that the data measure.
• What tool is most informative?
  - Reduction to a small number of features
  - Visual displays of data
    • Data Table – Organizing the data is often a good start.
    • Pie chart
    • Box and whisker plots
    • Bar charts
    • Histograms
    • Time series plots
“There are lies, damned lies and statistics.” (Benjamin Disraeli)