COMS 4407 A Week 11 End of Science

COMS 4407 A Week 11: End of Science? Critical DATA Studies
Class Schedule: Thursdays, 14:35 to 17:25
Location: River Building 3224
Instructor: Dr. Tracey P. Lauriault
E-mail: Tracey.Lauriault@Carleton.ca (include COMS 4407 A in the subject line)
Office: 4110b River Building
Office Hours: Mondays 2:30 to 5:30, Thursdays 9:30 to 11:30
Bookmarks: http://del.icio.us/tlauriau
Dr. Tracey P. Lauriault, COMS 4407 A 2016, Carleton University
https://doi.org/10.22215/tplauriault.courses.2016.coms4407

Week 11: Agenda
• Review
• Data Science, Analytics & Predictive Policing
• Science

Readings
• Chapter 8. The Data Revolution. (21 pages)
• Anderson, Chris (2008) The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired Magazine, June 23rd. (2 pages) Available at: http://www.wired.com/2008/06/pb-theory/
• Davenport, Thomas H. and Patil, D. J. (2012) Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review. (4 pages) https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
• Batty, M. (2016) Theoretical filters: Reducing explanations in cities to their very essence. Environment and Planning B: Planning and Design, Vol. 43(5), 797–799. DOI: 10.1177/0265813516663994 (3 pages)
Dr. Tracey P. Lauriault, School of Journalism and Communication, Carleton University

End of Science
“There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?”
“With enough data the numbers speak for themselves”!
Is the cycle of hypothesize, model, test becoming obsolete? Does correlation supersede causation? Do we no longer need explanation?
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
http://blogs.gartner.com/doug-laney/files/2012/01/ad9493D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf (2001)

Volume
http://canada.emc.com/collateral/analyst-reports/ar-the-economist-dataeverywhere.pdf
“at the petabyte scale... it forces us to view data mathematically first and establish context later” (Anderson, 2008)
Correlation is enough. We need not look for models; we just need to read the data! Do the algorithms find the patterns and generate theories, with hypotheses following the data?

Big Data & Scientific Revolution
“revolutions in science are often preceded by revolutions in measurement”
Is big data, with its tools, infrastructures, and techniques, a revolution in measurement? Will it fundamentally alter science?

Paradigm shift?
• Paradigm (Kuhn 1962)
• A normalized and accepted way of interrogating the world and synthesizing knowledge
• Researchers share a common philosophy and use common methods
• They ask questions and build knowledge incrementally
• They favour similar ontologies, epistemologies, theories, methods, and ethical knowledge frames
• The hegemonic way
• Ex. Darwin
• Ex. Hypothesis testing

Reframing Science (Chapter 8, The Data Revolution)
Paradigm | Nature | Form | When
1st | Experimental Science | Empiricism, describing natural phenomena | Pre-Renaissance
2nd | Theoretical Science | Modelling and generalization (e.g. Newton) | Pre-Computers
3rd | Computational Science | Simulation of complex phenomena | Pre-Big Data
4th | Exploratory Science | Data-intensive, statistical exploration and data mining | Now

The re-emergence of empiricism?
• Data-ism with data-ist claims
• Filter out emotionalism
• Fortune tellers
• Empiricist framing
• Anderson, 2008
• Only need to mine large swaths of data
• Discover associations rather than test hypotheses
• Recommendation systems
• Market basket analysis
• Predict rather than understand
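Market basket analysis is a compact example of "discover associations, not hypothesis testing". The sketch below is a minimal illustration, assuming a hypothetical five-basket dataset and the common support and confidence measures; the function name `association_rules` is made up for this example. It finds single-item rules X → Y purely by counting co-occurrences, and nothing in it asks why the items go together.

```python
from collections import Counter
from itertools import combinations

def association_rules(baskets, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for one-item rules.

    support(X, Y)       = share of baskets containing both X and Y
    confidence(X -> Y)  = baskets with X and Y / baskets with X
    """
    n = len(baskets)
    # How many baskets contain each item (duplicates within a basket ignored).
    item_counts = Counter(item for basket in baskets for item in set(basket))
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

# Hypothetical transactions, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

for rule in association_rules(baskets):
    print(rule)  # prints ('beer', 'diapers', 0.6, 1.0)
```

The output is a correlation ("every basket with beer also has diapers"), not an explanation; the thresholds are arbitrary choices, and lowering `min_confidence` to 0.6 surfaces eight rules instead of one.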

Source of the thinking
• Examples normally come from business and marketing
• Ex. Amazon’s recommendation system or Prizm5
• Less focus on why something happens and more on the fact that it happens
• Prediction trumps explanation
“At Ayasdi, we believe that deriving insight from big data will become essential and transformative for every enterprise. We believe that machine intelligence has the ability to revolutionize industries, allowing businesses to scale with algorithms and raw computing power rather than people, drastically improving human productivity, increasing operational effectiveness, and advancing scientific discovery”.

Fallacies of empiricism
1. Big data can capture the whole of a domain and provide full resolution
• Data are but a representation
• What you see is framed by what you are able to see
2. No need for a priori theory, models or hypotheses
• What is Amazon’s reasoning for its recommendation system?
• Domain-specific knowledge and pattern recognition science
3. Data can speak for themselves, free of human bias or framing
• Data determinism
• Are data neutral? Unbiased? Just facts? Is correlation causation?
4. Meaning transcends context or domain-specific knowledge
• Is there really no need for domain knowledge?
• Can you model a city in the absence of history? Social science? Policy? Governance? Law?

Reasoning
• Abductive
  • From observation to theory
  • Big data analytics
  • Best explanation, but not the final conclusion
  • Direction is given based on a priori knowledge
• Deductive
  • Validity and soundness
  • If the premises are true, then the conclusion is true
  • Certainty
• Inductive
  • Insight emerging from data that are contextually framed
  • Premises have strong evidence that they are true
  • Probability based on the evidence given

Data driven science
• Holds to the tenets of the scientific method, but is more open to a hybrid combination of abductive, inductive and deductive approaches to advance the understanding of a phenomenon
• Differs from traditional experimental deductive design: hypotheses are born from the data rather than from theory
• Guided knowledge discovery
• Inductive reasoning
• A new way to build theory
• Better at exploring, extracting value from, and making sense of massive data sets

Possibilities
• Molecular biology
• Modelling
• Ecosystems
• Digital Humanities
• Computational Social Science
• Etc.

• Social determinism
• Reductionism
• Surficial reading
• Big data analytics:
  • Struggles with the social
  • Struggles with the context
  • More spurious correlations
  • Difficult to address bigger problems
  • Favours memes over masterpieces
  • Obscures values
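Why "more spurious correlations"? If enough unrelated variables are tested against one target, some will correlate strongly by chance alone. The sketch below is a minimal illustration, assuming purely random Gaussian series as hypothetical data; the helper names `pearson` and `strongest_chance_correlation` are made up for this example, and the Pearson coefficient is written out by hand to keep it dependency-free.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def strongest_chance_correlation(n_obs=20, n_vars=1000, seed=42):
    """Largest |r| between one random target and n_vars unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_vars):
        # Each candidate is independent noise: any correlation is pure chance.
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, noise)))
    return best

print(strongest_chance_correlation(n_obs=20, n_vars=1000))
```

With 20 observations and 1,000 candidate variables, the strongest correlation found will usually exceed 0.5 even though every series is independent noise. Mining more variables raises the chance of such a "finding", which is why domain knowledge is needed to tell signal from coincidence.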

Final Note
“understanding human behavior and societies cannot and should not be reduced to rote, methodical and mechanistic analysis... [it should include] sustained thinking about what kinds of techniques should be applied to what kinds of data, in what circumstances, to answer specific questions, rather than data being run through a statistical sausage factory that produces low-grade, ground up meat rather than choice cuts” (p. 145)

Mike Batty
• https://vimeo.com/119354430
• “to extract specific meanings from any phenomenon in the quest to reduce our explanations to their bare essence”
• “theories abstract from an agreed reality throwing away that which appears irrelevant”
• “Developing, testing and then ultimately using good theory in any context thus depends on working with ideas that are not obstructed by anything that is not central to the purpose in hand”
• “in developing good theory, much of the reality we perceive must always be neglected in our search for the essential logic for which theory is being developed”.
• “Our theories, are never perfect representations of an actual reality when used to make predictions, take the simulated reality and assume that the future works the way theory predicts”
• “view. But theory is more than this in that in its construction, there must be a conscious filtering of the data, removing that which is likely to interfere with the behaviours and processes theory is intent on explaining”

Data Science: Sexiest Job Ever!
[Figure: LinkedIn triangle closing, “People you may know” (Person 1, Person 2, Person 3)]
http://www.forbes.com/pictures/efej45gkjj/2-jeff-hammerbacher-chief-scientist-cloudera-and-dj-patil-entrepreneur-in-residence-greylockventures/#607ced085f53
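Triangle closing, the idea behind “People you may know”, rests on a simple inference: if Person 1 knows Person 2 and Person 2 knows Person 3, then Person 1 may know Person 3. The sketch below ranks friends-of-friends by their number of mutual connections. It illustrates the general idea only and is not LinkedIn’s actual algorithm; the graph, including the extra member “Dana”, and the function name `people_you_may_know` are hypothetical.

```python
from collections import Counter

def people_you_may_know(graph, person, top_n=3):
    """Suggest friends-of-friends, ranked by number of mutual connections."""
    direct = graph.get(person, set())
    mutual_counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the person themself and anyone already connected.
            if candidate != person and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual connections first; ties broken alphabetically for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Hypothetical undirected network stored as adjacency sets.
graph = {
    "Person 1": {"Person 2", "Dana"},
    "Person 2": {"Person 1", "Person 3", "Dana"},
    "Person 3": {"Person 2", "Dana"},
    "Dana": {"Person 1", "Person 2", "Person 3"},
}

print(people_you_may_know(graph, "Person 1"))  # [('Person 3', 2)]
```

The suggestion closes the open triangle Person 1, Person 2, Person 3. The recommendation follows from network structure alone, with no account of why the two people might know each other.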

Enablers
http://hadoop.apache.org/
• Software/Framework
• “The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures”.
• Open source statistical packages
• Cloud computing
• Skill set
• Mind set
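The “simple programming models” in the Hadoop quotation refer chiefly to MapReduce: a map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step aggregates each group. The sketch below imitates those three phases inside a single Python process to count words. It is not Hadoop code, does not use Hadoop’s API, and leaves out the distribution across machines and the failure handling the quotation describes; the input splits are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key, as the framework does."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse one key's values into a single result."""
    return key, sum(values)

def word_count(documents):
    # In a real cluster each split would be mapped on a different machine.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

splits = ["big data big claims", "data speak", "Big theory"]
print(word_count(splits))
# {'big': 3, 'data': 2, 'claims': 1, 'speak': 1, 'theory': 1}
```

The appeal of the model is that the programmer writes only the map and reduce functions; scaling from one machine to thousands is left to the framework.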

What do data scientists do?
• Make discoveries while swimming in data
• Bring structure to large quantities of formless data to make analysis possible
• Identify rich data sources and join them with others
• Shift from ad hoc analysis to an ongoing conversation with data
• Do not get bogged down by technical limitations
• Display information creatively
• Hybrid data hacker, analyst, communicator and trusted advisor
• Storytelling with data
• Turn unstructured data into structured data
• Training in computer science, science, statistics, probability, and math
• They want to be on the ‘bridge’, engaged and involved

How to Find the Data Scientists You Need?
1. Focus recruiting at the “usual suspect” universities.
2. Scan the membership rolls of user groups devoted to data science tools, such as the R User Groups (for an open-source statistical tool favored by data scientists) and Python Interest Groups (for PIGgies).
3. Search on LinkedIn: they’re almost all on there, and you can see if they have the skills you want.
4. Hang out with data scientists at the Strata, Structure:Data, and Hadoop World conferences and similar gatherings, or at informal data scientist “meetups”.
5. Make friends with a local venture capitalist, who is likely to have gotten a variety of big data proposals over the past year.
6. Host a competition on Kaggle or TopCoder, the analytics and coding competition sites, and follow up.
7. Coding: skills don’t have to be at a world-class level but should be good enough to get by. Can candidates learn rapidly about new technologies and methods?
8. Can they find a story in a data set and provide a coherent narrative about a key data insight? Can he or she communicate with numbers, visually and verbally?
9. They need to be connected to the business world.
10. Ask about their favourite analysis or insight and how they are keeping their skills sharp. Have they taken Stanford’s online Machine Learning course? Contributed to open-source projects, or built an online repository of code to share (for example, on GitHub)?

Data-driven geography
Miller & Goodchild, 2014: four paradigms in science
1. Empirical science
• describing natural phenomena
2. Theoretical science
• models and generalization
3. Computational science
• simulating complex systems
4. Data-driven science
• interrogating the world via large-scale complex instruments and databases
Tensions
1. Theory-driven vs data-driven
2. Prediction vs discovery
3. Law seeking vs description seeking
4. Evolution vs revolution
5. From question to sample vs from sample to question
Issues
1. Populations, not samples
2. Messy, not clean
3. Correlation, not causation
Capabilities of abductive reasoning
1. Ability to posit fragments of theory
2. A massive set of knowledge, from common sense to domain expertise
3. A means of searching for connections, patterns and potential explanations
4. Complex problem solving: analogy, approximation and guessing
5. Background knowledge and interestingness measures; formalized knowledge

Data Driven Geography
• Big questions
  • Are theory and explanation archaic?
  • Does data velocity matter?
  • Can a lack of quality control (QC) and rigorous sampling be overcome?
  • Can we make valid generalizations from serendipitous data collection?
  • Can big data-driven methods lead to significant discoveries?
  • Or will we continue to rely on scarce data (small data)?
1. Theory in data-driven geography
• Correlation supersedes causation; explanation but not laws; mid-range theories; general propositions; long-term big space vs short-term small space; nomothetic vs idiographic
2. Approaches to data-driven geography
• Knowledge discovery, data exploration and hypothesis generation; abductive, deductive and inductive reasoning
• Data-driven modelling: general to specific vs specific to general; predictive performance
• Theory may not be possible; data drive the form of the model; complexity; de-skilling
3. Caution with data-driven approaches
• Formalizing geographic knowledge, spuriousness, truth and understanding, black-boxed algorithms, privacy, pre-crime, pre-punishment, data-driven dictatorship
Benefits
• Spatial-temporal dynamics vs snapshots at multiple scales
• Mundane and unplanned phenomena captured
• Probable and inconsequential
• Improbable but consequential

Conclusion
• The most fundamental changes are in the variety and velocity of data
• Old issues in new clothes: volume, n, messy data, idiographic vs nomothetic knowledge
• Big data can inform both geographic knowledge discovery and spatial modelling, but geographic knowledge must be formalized to clean data, ignore spurious patterns, and build true and understandable models
• The black box of closed systems
• Caution on social implications: predictive governance; avoid data dictatorships; humans need to be part of the decision-making process

Wikipedia

Definitions
• Data Science: https://en.wikipedia.org/wiki/Data_science
• Data Analysis: https://en.wikipedia.org/wiki/Data_analysis
• Statistics: https://en.wikipedia.org/wiki/Statistics
• COMS 4407: https://dashboard.wikiedu.org/courses/Journalism_and_Communication,_Carleton_University/COMS4407_Critical_Data_Studies_(Fall_2016)