MIS 2502 Data Analytics Descriptive and Inferential Statistics















MIS 2502: Data Analytics Descriptive and Inferential Statistics Jeremy Shafer Jeremy.Shafer@temple.edu http://community.mis.temple.edu/jshafer

The Information Architecture of an Organization Getting ready to go here… Data entry, Data extraction, Data analysis. Transactional Database: stores real-time transactional data. Analytical Data Store: stores historical transactional and summary data.

Types of statistics Descriptive • Describes the features of a full set of data Inferential • Seeks to draw an inference from a data sample • Used when the full population cannot be obtained

Descriptive Statistics Definition: Descriptive statistics is the discipline of quantitatively describing the main features of a collection of information. Descriptive statistics aim to summarize a sample. http://en.wikipedia.org/wiki/Descriptive_statistics

“Need to know” information • Descriptive Concepts – Mean, Median, Mode – Skew – Outliers • Inferential Concepts – What’s a hypothesis? – How to interpret a p-value

Mean Definition: The mean (specifically the arithmetic mean) is also called the average and is the sum of a discrete number of values divided by the number of values.
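
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `arithmetic_mean` and the sample values are assumptions chosen for the example.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(values) / len(values)


scores = [3, 3, 5, 9, 11]  # illustrative data, not from the course
print(arithmetic_mean(scores))  # 31 / 5 -> 6.2
```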

Median Definition: The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one (e.g., the median of {3, 3, 5, 9, 11} is 5). If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values. http://en.wikipedia.org/wiki/Median
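
As a rough sketch of the procedure just described (sort, then take the middle value or the mean of the two middle values), one could write the following. The function name `median` is an assumption for illustration; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Sort the observations and return the middle one.

    With an even number of observations, return the mean of the
    two middle values, as in the definition above.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty collection is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 3, 5, 9, 11]))  # odd count  -> 5
print(median([3, 5, 9, 11]))     # even count -> 7.0
```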

Mode Definition: The mode is the value that appears most often in a set of data. http://en.wikipedia.org/wiki/Median
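
One way to find the mode is to count how often each value occurs. The sketch below is illustrative only; the function name `modes` is an assumption, and it returns a list because a data set can have several values tied for "most often".

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 3, 5, 9, 11]))  # 3 appears twice, everything else once
print(modes([1, 1, 2, 2, 3]))   # a tie: both 1 and 2 appear twice
```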

Skew When we compare the mean, median, and mode of a collection of data, we get a sense of how the data is skewed. http://rchsbowman.wordpress.com/2010/08/30/statistics-notes-the-shape-of-distributions/
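
The comparison described above can be sketched as a simple rule of thumb: a mean above the median hints at a right (positive) skew, and a mean below the median hints at a left (negative) skew. This is a heuristic under stated assumptions, not a formal skewness measure, and it can mislead on unusual distributions. The function name `skew_hint` and the income figures are made up for illustration.

```python
import statistics


def skew_hint(values):
    """Compare mean and median to get a rough sense of skew (heuristic only)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed (mean above median)"
    if mean < median:
        return "left-skewed (mean below median)"
    return "roughly symmetric (mean equals median)"


incomes = [30, 32, 35, 36, 40, 45, 200]  # hypothetical values, in $1000s
print(skew_hint(incomes))
```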

Outliers In statistics, an outlier is an observation point that is distant from other observations. One or more outliers can have a large impact on the mean, while the median and mode are usually affected much less, if at all. A simple illustration of this can be found here: http://www.mathsisfun.com/data/outliers.html
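
To see the effect of an outlier, the sketch below computes the mean, median, and mode of a small data set before and after adding one extreme value. The data and the helper name `summarize` are assumptions for illustration; the calculations use Python's standard `statistics` module.

```python
import statistics


def summarize(values):
    """Return the mean, median, and mode of the values as a dictionary."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


data = [3, 3, 5, 9, 11]      # illustrative data
with_outlier = data + [500]  # same data plus one extreme observation

print(summarize(data))
print(summarize(with_outlier))
```

With these numbers the mean jumps from 6.2 to 88.5, while the median only moves from 5 to 7 and the mode stays at 3.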

Population Vs Sample Population Sample

Learning from a Sample ? Population Sample

Inferential Statistics – Hypothesis testing Statistical tests are set up to look for evidence against the null hypothesis and in favor of the alternate hypothesis. The null hypothesis (H0) is the assertion that two populations of data are the same. The alternate hypothesis (HA) is the assertion that the two populations are different in some specific way. The null and alternate hypothesis statements can’t both be true, so we run tests to decide whether the data give us enough evidence to reject the null hypothesis. There are different kinds of tests that can be performed, but generally the output of the test is the p-value.
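
As one concrete example of a test that outputs a p-value, the sketch below uses a permutation test on the difference between two group means. This is only one of many possible tests and is not named in the slides; the function name `permutation_test`, its parameters, and the sample data are all assumptions for illustration. The idea: if H0 is true and the two groups are really the same, shuffling the group labels should often produce a difference as large as the one observed.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Repeatedly shuffles the pooled data, splits it into two groups of the
    original sizes, and counts how often the shuffled difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical order totals from two store locations.
store_a = [1, 2, 3, 4, 5, 6]
store_b = [101, 102, 103, 104, 105, 106]
print(permutation_test(store_a, store_b))
```

Because the two hypothetical groups barely overlap, very few shuffles reproduce the observed difference, so the estimated p-value comes out small.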

p-values A p-value is basically a probability, so it will be some number between 0 and 1. It indicates how easily the evidence we are seeing in the data for the alternate hypothesis could be attributed to chance alone. Specifically, the p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, when the null hypothesis is, in fact, true. So… the lower the p-value, the stronger the evidence against the null hypothesis and in favor of the alternate hypothesis.
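
The "equal to or more extreme" definition can be worked through with a coin-flip example. Suppose H0 says a coin is fair and we observe 9 heads in 10 flips. The sketch below adds up the probability of 9 or 10 heads under H0, which is a one-sided p-value. The scenario and the function name `binomial_p_value` are assumptions for illustration.

```python
from math import comb


def binomial_p_value(heads, flips, p_heads=0.5):
    """Probability of seeing at least `heads` heads in `flips` flips,
    assuming the null hypothesis (probability of heads = p_heads) is true."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# 9 or 10 heads out of 10: (10 + 1) / 1024
print(binomial_p_value(9, 10))  # 0.0107421875
```

A fair coin would produce a result this extreme only about 1% of the time, which is why such an outcome counts as evidence against H0.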

p-values Typically, a p-value of 0.05 or less is considered sufficient to challenge the null hypothesis. A mnemonic device: if the p-value is low, then H0 must go. A good summary of this can be found here: https://www.youtube.com/watch?v=eyknGvncKLw
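
The rule of thumb above amounts to a simple comparison against a threshold. In the sketch below, the function name `decide` and the parameter name `alpha` are assumptions for illustration; the default of 0.05 is the conventional cutoff mentioned above, not a fixed law.

```python
def decide(p_value, alpha=0.05):
    """Apply the 'if the p-value is low, then H0 must go' rule."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.01))  # low p-value, so H0 must go
print(decide(0.30))  # not enough evidence against H0
```

Note that the second outcome is "fail to reject H0" rather than "accept H0": a high p-value means the data do not contradict the null hypothesis, not that it has been shown to be true.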