Data Mining, Data Science, Big Data
Data Science
§ Data Science aims to extract insights from large data
§ Less emphasis on algorithms
§ More emphasis on ‘outreach’
§ The term Data Science is about 10 years old and very popular nowadays
§ Many people reinvent themselves as Data Scientists
  § data miners, statisticians, BI people, analysts, database developers
Data Mining & Data Science
Data Mining vs. Statistics
§ Computational methods
§ Dealing with large data
§ Visualisation
§ Involving domain knowledge
§ Interpretable and interpreted results
Big Data
§ Because you can…
§ Cheap storage
§ Administrative/financial reasons
§ Internet and social computing
§ Internet of Things, ubiquitous computing
[Figure: cost per gigabyte in dollars, 1980 to 2010; axis ticks at $10,000, $1,000, $1 and $0.01]
Cheap Storage
§ 1956, IBM 350: 5 MB
§ 90 TB
Big Data
Many facets; people often focus on only one
§ Very, very large data
  § CERN, Google, Facebook, Twitter, …
§ Analytics
§ Internet-generated
§ Social data
§ Heterogeneous, unstructured data
§ Large-scale technologies
  § MapReduce, Hadoop
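The slide names MapReduce only as a technology. As a rough illustration of the programming model, here is a minimal single-machine sketch in plain Python. It is a toy word count, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count`, and the sample documents, are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, invented for this sketch.
documents = ["big data big storage", "data mining and data science"]
counts = word_count(documents)
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data over the network; the sketch only shows how the three steps fit together.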
Size-complexity trade-off
§ Technological restrictions produce a trade-off
§ Many Big Data projects are algorithmically not so complex
  § Embarrassingly parallel
[Figure: size versus complexity, with CERN marked]
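"Embarrassingly parallel" means the work splits into pieces that need no communication with each other until a final, cheap combine step. A minimal sketch, assuming a sum of squares as the stand-in workload; the helper names and chunk size are hypothetical. It uses a thread pool from the standard library to keep the example simple, so it shows the independence of the chunks, not a real speed-up; in CPython a process pool or a cluster would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_of_squares(chunk):
    """Work on one chunk only; no other chunk is read or written."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, chunk_size=1000, workers=4):
    """Process every chunk independently, then combine the partial results."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


# Hypothetical workload, chosen only to keep the sketch small.
values = list(range(10_000))
total = parallel_sum_of_squares(values)
```

Because no chunk depends on another, adding workers or machines scales the job without changing the algorithm, which is why size rather than algorithmic complexity is the main difficulty in such projects.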