Chapter 4: Correlation 2/2015
Correlation
− Statistical relationship between two data values
− Strong correlation: if A changes, B is likely to change
− Weak correlation: if A changes, B is unlikely to change much
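One common way to put a number on "strong" versus "weak" is the Pearson correlation coefficient r, which ranges from -1 to 1. The slides do not name a specific measure, so this is a minimal sketch under that assumption. The function name `pearson` and the series `a`, `b`, `c` are hypothetical illustration data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: B follows A closely, C barely reacts to A.
a = [1, 2, 3, 4, 5, 6]
b = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
c = [5, 3, 6, 4, 5, 4]

strong = pearson(a, b)  # close to 1: strong correlation
weak = pearson(a, c)    # close to 0: weak correlation
print(f"A vs B: {strong:.3f}, A vs C: {weak:.3f}")
```

Values of r near 1 or -1 indicate a strong linear relationship, and values near 0 a weak one. Note that r only captures linear relationships.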
Case: Walmart
− Tracked every product and analyzed every transaction
− Before hurricanes, sales of flashlights and Pop-Tarts increased
  => When a storm was approaching, Walmart stocked up on these products
Correlation analysis: How to find suitable proxies?
− In the small-data world
  − Hypothesis-driven approach
  − Proxies are chosen first and then tested
  − Slow and repetitive
  − Subject to false intuition
  − Hard to find non-linear relationships because of the small sample size
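The hypothesis-driven workflow (choose a proxy first, then test it) could be sketched as a permutation test on a single pre-selected proxy. The slides do not prescribe a particular test, so this is only one possible illustration. The function `permutation_p_value` and the series `proxy` and `target` are hypothetical.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate how often shuffled data correlates at least as strongly."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical small sample: one hand-picked proxy and the target it should track.
proxy = [3, 5, 2, 8, 7, 1, 9, 4, 6, 10]
target = [31, 52, 18, 79, 73, 12, 88, 41, 60, 101]

p_value = permutation_p_value(proxy, target)
print(f"r = {pearson(proxy, target):.3f}, p = {p_value:.4f}")
```

Each candidate proxy needs its own hypothesis and its own test run, which is part of why this approach is slow and repetitive.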
Correlation analysis: How to find suitable proxies?
− In the big-data world
  − Data-driven approach (n = all)
  − Optimal proxies can be found by computer-driven analysis (e.g. Google Flu Trends and search terms)
  − Hypothesis not needed
  − Possible to find complex non-linear relationships
− Predictive analysis: use data to predict events before they happen
− Case: Health care: unusually constant vital signs were observed before an infection, not the instability one might expect
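The data-driven search for proxies can be sketched as scoring every candidate series against the target and ranking the results, with no hypothesis about which one should win. This sketch uses Spearman rank correlation, an assumption on our part, because it also picks up monotonic non-linear relationships. All names and numbers (`rank_proxies`, `term_a`, and so on) are hypothetical and not taken from Google Flu Trends.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def rank_proxies(target, candidates):
    """Score every candidate against the target, strongest first."""
    scored = [(name, spearman(series, target)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data: the target grows exponentially, so its link to term_a is non-linear.
target = [1, 2, 4, 8, 16, 32, 64, 128]
candidates = {
    "term_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "term_b": [5, 3, 6, 2, 7, 1, 8, 4],
    "term_c": [3, 1, 4, 1, 5, 9, 2, 6],
}

ranking = rank_proxies(target, candidates)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

With real data the candidate set would contain far more series, and a high score remains a correlation only: it says which series move together, not why.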
What, not why
− Data-driven analysis aims to find non-causal links (the "what")
− Correlations can be established with mathematical and statistical methods (this is hardly possible for causality)
− The links found can then be investigated further to see whether they are causal
The end of theory?
− We can just look at the data and not be limited by theories and hypotheses