COMM 3409 A Fall 2015 BIG DATA AND SOCIETY
Week 2 (Sept. 10) – Big Data
Dr Tracey P. Lauriault, Communication Studies, School of Journalism and Communication
Tracey.Lauriault@carleton.ca
@TraceyLauriault
Class Schedule: Thursdays, 18:00-21:00
Location: Azrieli Pavilion (AP) 132
Instructor: Dr. Tracey P. Lauriault
E-mail: Tracey.Lauriault@Carleton.ca (Please use cuLearn)
Office: 4110 River Building
Office Hours: Tuesday and Thursday 3:30-5:30, or by appointment
Dr. Tracey P. Lauriault, COMM 3409 A 2015, Carleton University https://doi.org/10.22215/tplauriault.courses.2015.comm3409

WEEK 2 BIG DATA
§ Questions about assignment, course outline?
§ Review of Week 1 - Counting
§ Discussion about Week 1 papers - Desrosières, Porter, Shelton, Zook and Graham and any of the resources
Kitchin, The Data Revolution
§ Enablers of Big Data
§ Types of Big Data
§ Conceptualizing Data
§ Crawford & Bell Papers: The Anxieties of Big Data & The Secret Life of Big Data
Big Resources:
§ Massachusetts
§ Ireland
§ Corporate

QUESTIONS
§ What is a dataset?
Course Outline
§ Assignments? Proposal? Paper?
Assignment 1
Due Monday 14 September at Noon
Look for a dataset that interests you and attempt to download the data. In no more than one page, explain the download process (the steps you took and whether you had difficulty downloading the data), where you found these data (e.g. a portal, news blog, data library, etc.), and describe the dataset in such a way that 10 years from now you or a stranger could decipher what the data are and what they could be used for. Explain your interest in this dataset and what you might use the data for, and explain what led you to trust these data. Be sure to provide a full citation of the dataset in the data citation format of your choice. Here are some useful guides (http://ukdataservice.ac.uk/usedata/citing-data and https://www.datacite.org/services/format-yourcitation.html).

WEEK 1 - COUNTING
Bureaucratic action; Making things visible
Counting; Quantifying; Normalizing; Association; Measures; Correlating
Classifying: Homosexuality, Air Q, Obesity
Examples: Marriage Referendum Ireland; Census form Qs; Smoking Ban; Toronto Neighbourhood Study and Diabetes; Normal Curve – Quetelet; Resistance; Facebook dropdown menu; Air Q Sensors & Asthma; Seventeen Article; Plus Size Models; BMI Calculator

WEEK 1 – PAPERS & RESOURCES
Readings
Desrosières, Alain (1998) Introduction: Arguing from Social Facts in The Politics of Large Numbers: A History of Statistical Reasoning. Cambridge: Harvard University Press. pp. 1-16.
Porter, Theodore M. (1995) Cultures of Objectivity Introduction in Trust in Numbers: The Pursuit of Objectivity in Science and Public Life, Princeton University Press. pp. 3-11.
Resources
Bureau of Investigative Journalism (BIJ) Get the data: Drone wars - Get the data: What the drones strike
Global Terrorism Database (GTD): http://www.start.umd.edu/gtd/about/
Global Database of Events, Language, and Tone (GDELT)
A Floating Sheep collaborative Paper: The Technology of Religion: Mapping Religious Cyberscapes
Quilliam Foundation Report: Jihad Trending: A Comprehensive Analysis of

WEEK 1 - DESROSIÈRES, ALAIN (1998) INTRODUCTION: ARGUING FROM SOCIAL FACTS
Measurement vs Object measured
How is the measurement made?
How reliable / valid is that measure?
What kind of model is used?
What is the definition?
Are things measured real?
Or is it the outcome of work? (debates, issues etc.)
How are facts produced?
How do measures become stabilized/encoded?
How do these measures work in the world?
What are the attributes of the objects measured?
What are the parameters of how things are measured?
What is the difference between a state statistician and statistics?
What are administrative statistics?
How do these measurements become normal and believed?

WEEK 1 - TERMINOLOGY DESROSIÈRES
P. 3 nominalist, skeptical, relativist, instrumentalist, constructivist
P. 4 description & decision/prescriptive, objective & subjective, frequentism & epistemism, realism & nominalism, errors in measurement & natural dispersion, internalist & externalist
P. 6 equivalences
P. 8 a priori & a posteriori probability

WEEK 1 - PORTER, THEODORE M. (1995) CULTURES OF OBJECTIVITY
Social basis of authority
Who is telling the truth?
Who are the purveyors of truth?
Who is paid to tell the truth?
What is the social basis of authority?
Who are the experts?
What are the rules of measuring?
Ideology of quantitative expertise or an objective way to tell the truth?
Are democracy and objectivity related?
Absolute objectivity
Mechanical objectivity
Disciplinary objectivity
Instrumentists
Evidence based decision making
Domain knowledge vs expert knowledge vs experiential knowledge
Scientists vs professionals vs public officials

WEEK 1 - TERMINOLOGY
Absolute objectivity: Definitive impartial truths derived from science
Mechanical objectivity: Trust in measuring instruments, rules and procedures. Often used to justify actions and absolve oneself from moral responsibility
Disciplinary objectivity: Consensus among those in a discipline
Evidence-based decision making: The use of data, information and scientific knowledge to make empirically supported decisions
Domain knowledge vs expert knowledge vs experiential knowledge: VGI, Citizen Science, Crowdsourcing
Scientists vs professionals vs public officials: Trust in numbers. Trained judgement, professionals

WEEK 2 – PAPERS & RESOURCES
Readings
Kitchin, Rob. (2014), Chapter 1, Conceptualizing Data, pp. 1-26 and Chapter 4, Big Data, pp. 67-79, The Data Revolution. London: Sage.
Crawford, K., 2014, The Anxieties of Big Data, The New Inquiry, April.
Bell, Genevieve (2015) The Secret Life of Big Data, in Boellstorff, Tom and Maurer, Bill, (Eds.) Data, Now Bigger and Better! Chicago: Prickly Paradigm Press, pp. 7-26.
Kitchin, Rob. (2014), Chapter 5, Enablers and Sources of Big Data, pp. 80-89. The Data Revolution. London: Sage.
Resources
The MASSTech Big Data Report
The MASS TLC Big Data Report
Government of Ireland: Assessing the Demand for Big Data and Analytics Skills, 2013 – 2020 Report
McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity

WEEK 2 – BIG DATA - ENABLERS AND SOURCES OF BIG DATA
“big data has arisen due to the simultaneous development of a number of enabling technologies, infrastructures, techniques and processes, and their rapid embedding in everyday business and social practices and spaces”
“the places in which we live are now augmented, monitored and regulated by dense assemblages of data-enabled infrastructures and technologies”

5 ENABLERS OF BIG DATA
1. Computational Power
2. Networking
3. Pervasive and ubiquitous computing
4. Indexical and machine-readable identification
5. Data Storage

1. COMPUTATIONAL POWER
ENIAC Mainframe, 1946
Image Source: http://andrewchen.co/photos-of-thewomen-who-programmed-the-eniac-wrote-the-codefor-apollo-11-and-designed-the-mac/
Tianhe-2 supercomputer, China’s National University of Defense Technology, 33.86 petaflop/s (quadrillions of calculations per second) on the Linpack benchmark. (TOP500 List)
Image Source: http://www.theplatform.net/2015/07/13/the-top-500supercomputer-list-is-in-chinas-reign-continues/

2. NETWORKING
ARPANET 1969, ARPANET 1977
Images Source: http://classes.design.ucla.edu/Spring06/161A/projects/camile/arpanet/
1st page of the WWW by Tim Berners Lee, 1992
Image Source: http://www.theatlantic.com/technology/archive/2013/05/world-we-have-lost-the-firstwebpage-professor-oh-i-have-a-copy-of-it-right-here/276387/
Wireless mesh network
Image Source: https://commons.wikimedia.org/wiki/File:Wireless_mesh_network_diagra

3. PERVASIVE & UBIQUITOUS COMPUTING
One person or thing interacts seamlessly with many computers and devices, and technologies are made smart (technologically enhanced) so that they can interact with other devices and technologies to share information. Pervasive computing is computation in everything; ubiquitous computing is computation in every place.
Image Source: http://theconversation.com/intelligentinfrastructure-when-roads-and-vehicles-talk-to-each-other-7865

4. INDEXICAL AND MACHINE READABLE IDENTIFICATION
The labelling of things and people with unique identifiers (UIDs) in order to identify, link and interconnect things and human attributes or characteristics. These UIDs provide the means to do fine-grained analysis. Devices have unique ID codes for tracking, and the data these devices produce can be attributed to them.
Characteristics:
a) Universal in coverage
b) Uniqueness
c) Permanence
d) Indispensable
Examples:
People: SIN #, passport #, health cards, biometrics, DNA, usernames, passwords, chip & pin, swipe cards
Places: Lat & Long, grid reference, postal code
Things: MAC addresses, bar codes, RFID chips
Fingerprint Image Source: http://www.zdnet.com/article/synaptics-acquires-validityfor-255m-dives-into-biometrics/

5. DATA STORAGE
Analog to utility and data cloud computing!
Data Cloud Image Source: http://poweritpro.com/news-amp-views/ibm-adds-new-storage-systems-and-smartcloud-storage

3 SOURCES OF BIG DATA
1. Directed Data
2. Automated Data
3. Volunteered Data

1. DIRECTED DATA
Organized and structured surveillance of others, either through a person or a technological device, complemented with UIDs, fingerprints and other biometrics, augmented with algorithms and aided by software
Extends governance and surveillance regimes
Ex: CCTV, tax forms, police operation room, border service
Other types: LIDAR, drones, air photos
Image Sources: http://imagemakersmag.ca/20YearMarkforCOMPASSHighwaySystems.aspx and http://www.ottawalife.com/2014/06/drones-non-military-uses/

2. AUTOMATED DATA
Data are generated by a variety of digital technologies. These can be automatically and autonomously processed and analyzed by algorithms
a) Automated Surveillance: smart metering, automatic number plate recognition, intelligent transportation systems, smart cards (Presto), RFID chips on waste bins
b) Digital Devices: devices that produce digital data as a primary function or as exhaust. Ex. cameras, medical equipment, GPS units, mobile phones, satellite or cable receivers, smartphones and logjects
c) Sensed Data: sensors embedded on or placed within structures that measure output, light, temp, wind, waves and produce continuous streams of data (IoT, UBIComp)
d) Scan Data: machine readable ID codes, barcodes, magnetic strips, chips
e) Interaction Data: ICT, ISP tracking, cookies and clickstreams, phone calls and email headers

3. VOLUNTEERED DATA
Data generated by individuals and/or input by individuals in exchange for a service, or data contributed as part of a collective project
a) Transactions: online purchasing, online government forms, loyalty cards, opinion and review services
b) Social Media: FB, Twitter, Flickr, Instagram, YouTube, blogging, mashups, Foursquare
c) Sousveillance: self-monitoring and management of one’s life information via intimate digital technologies, lifelogging/quantified self, data about consumption, physical state, emotional state, performance
d) Crowdsourcing: collective generation of media, ideas and data provided voluntarily to resolve a particular task. Collective production of information. 3 types: produce a solution, evaluate, single solution from the crowd
e) Citizen Science: citizens work with scientists to provide observations; some forms are distributed computing, digitization, measuring, data collection, analysis and research design

WEEK 2 – BIG DATA
http://www.nytimes.com/2012/02/12/sundayreview/big-datas-impact-inthe-world.html
http://archive.wired.com/science/discoveries/magazine/1607/pb_theory/
http://www.ft.com/indepth/big-data
http://www.economist.com/node/15579717
http://www.sciencemag.org/site/special/data/

GARTNER HYPE CYCLE 2014
Image Source: http://www.forbes.com/sites/gilpress/2014/08/18/its-official-the-internet-of-things-takes-over-big-data-as-the-most-hyped-technology/

7 ATTRIBUTES OF BIG DATA
1. Volume
2. Exhaustivity
3. Resolution & identification
4. Relationality
5. Velocity
6. Variety
7. Flexible & Scalable
(Kitchin, 2014)

1. VOLUME
http://canada.emc.com/collateral/analyst-reports/ar-theeconomist-data-everywhere.pdf

SOME BIG DATA VOLUME STATS
§ In 2013 +/- 114 billion emails, 24 billion text messages and 12 billion phone calls globally
§ In 2012 +/- 3 billion Google Search queries, +/- 24 petabytes per day
§ In 2012 +/- 400 million tweets per day with associated metadata
§ In 2012 +/- Walmart had 1 million customer transactions per hour, 2.5 petabytes of data
§ In 2011 +/- Facebook had 2.5 billion pieces of content, 9.3 billion hours per month on the site, 2.7 billion like actions and 300 million photo uploads per day
§ A human genome sequence is +/- 100 gigabytes of data
§ Large Hadron Collider: 40 terabytes a second

2. EXHAUSTIVITY
Sampling is a technique used to collect a representative set of data from a total population of all potential data. Big data instead strive to be exhaustive and capture the entire population: n=all

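The contrast between a sample and n=all can be sketched in a few lines of Python. This is an illustration only: the population is synthetic, and the sample size of 100 and the age range are assumptions chosen for the example, not figures from Kitchin.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 people (synthetic, seeded so the run repeats).
random.seed(42)
population = [random.randint(18, 90) for _ in range(10_000)]

# Small-data approach: estimate the mean age from a random sample of 100 records.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

# Big-data approach (n=all): compute the mean over every record in the population.
population_mean = statistics.mean(population)

print(f"sample mean (n=100): {sample_mean:.2f}")
print(f"population mean (n=all): {population_mean:.2f}")
```

The sample gives an estimate that carries sampling error; the n=all figure has none, although it can still be biased by who or what was counted in the first place.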
3. RESOLUTION & INDEXICALITY
Big data are becoming more fine grained: individual data, point data, fine resolution pixelated data.
Google Earth: 2.5 metre resolution
Landsat Satellite Image: 30 m resolution

RM-HALTON GEOGRAPHY
Source: CSDS Consortium Member - Regional Municipality of Halton Framework Data

RM-HALTON GEOGRAPHY
Source: CSDS Consortium Member - Community Development Halton Framework Data

INDEXICALITY
Fingerprint Image Source: http://www.zdnet.com/article/synaptics-acquires-validityfor-255m-dives-into-biometrics/

4. RELATIONALITY
The ability to conjoin datasets. Censuses are structured datasets which are relational across time, geography and questions
http://swampland.time.com/2012/11/07/inside-the-secretworld-ofquants-anddata-crunchers-who-helpedobama-win/
http://http-server.carleton.ca/~fbrouard/T3010group

5. VELOCITY
Big data are dynamic. A census is time stamped, whereas real-time data are continuously streaming; Facebook is 24/7. Observations are made continuously over time and systems are always on.
Image Source: http://cerasis.com/2014/04/30/ecommercelogistics/
Image Source: http://modmedsys.com/medical-equipmentrepair/
Image Source: http://www.vespermarine.com/xb8000-aistransponder.html/

6. VARIETY
Partial List of Database formats: https://en.wikipedia.org/wiki/List_of_file_formats
123456 ABCDE

7. FLEXIBILITY
Most research, once conducted, is quite inflexible: forms do not change often, and research designs remain the same for comparability reasons.
Big data are:
Extensible: can add new fields
Scalable: can expand readily (Tweet example during the Super Bowl)
Flexible: Obama campaign example
Hardware, distributed computing, storage and algorithms allow for continuous searching and matching.

BIG DATA CHARACTERISTICS
Attributes | Small Data | Big Data
Volume | Limited to large | Very large
Exhaustivity | Samples | n=all
Resolution & Indexicality | Coarse & weak to tight & strong | Tight & strong
Relationality | Weak to strong | Strong
Velocity | Slow, freeze-framed/bundled | Fast & continuous
Variety | Limited to wide | Wide
Flexible & Scalable | Low to middling | High
Table Source: http://eprints.maynoothuniversity.ie/5684/1/KitchinLauriault_SmallandBigData_ProgrammableCity-WorkingPaper1_SSRN-id2376148.pdf

WEEK 2 – CONCEPTUALIZING DATA
“data are commonly understood to be the raw material produced by abstracting the world into categories, measures and other representational forms – numbers, characters, symbols, images, sounds, electromagnetic waves, bits – that constitute the building blocks from which information and knowledge are created” (Kitchin, 2014: 1)
“data do not exist independently of the ideas, instruments, practices, contexts and knowledges used to generate, process and analyze them” (Kitchin, 2014: 2)

DATA QUALITIES
Discrete: Each datum is individual, separate and separable, clearly defined
Aggregative: Can be combined and built into sets (i.e. nominal, derived)
Discoverable: Described and cataloged with metadata
Linked: Joined and/or linked to other datasets

WHAT ARE DATA?
Capta: data are taken or extracted from observation, computation, experiments and record keeping (Dodge and Kitchin, 2011)
Data are partial, selective and representative

KINDS OF DATA
1. Quantitative
2. Qualitative
3. Structured
4. Semi-Structured
5. Unstructured
6. Captured
7. Exhaust
8. Transient
9. Derived
10. Primary
11. Secondary
12. Tertiary
13. Indexical
14. Attribute
15. Metadata

KINDS OF DATA
1. Quantitative
Numeric records
Relate to physical properties or phenomena (height, weight, distance)
Representative and relate to characteristics (social class)
Nominal: categories - male, female
Ordinal: rank order - low, med, high
Interval: measured along a scale - temp
Ratio: like interval but on a true scale that starts with 0 - exam marks
2. Qualitative
Non numeric data
Text, pictures, art, video
Discourse analysis, humanities research on original texts etc.
Jihad Trending Twitter Data

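The four levels of measurement differ in which summaries make sense. The sketch below uses made-up values (the categories, temperatures and marks are assumptions for illustration) to show one appropriate summary per level.

```python
import statistics

# Hypothetical values, one list per level of measurement.
gender = ["female", "male", "female", "female"]   # nominal: categories only
risk = ["low", "high", "med", "med", "low"]       # ordinal: categories with a rank order
temp_c = [10.0, 20.0, 30.0]                        # interval: equal steps, no true zero
marks = [40, 60, 80]                               # ratio: equal steps and a true zero

# Nominal: only counting is meaningful, so report the most common category.
nominal_summary = statistics.mode(gender)

# Ordinal: order is meaningful, so report the middle category after ranking.
rank = {"low": 0, "med": 1, "high": 2}
ordinal_summary = sorted(risk, key=rank.get)[len(risk) // 2]

# Interval: differences are meaningful, so a mean is acceptable,
# but 20 C is not "twice as hot" as 10 C.
interval_mean = statistics.mean(temp_c)

# Ratio: a true zero makes ratios meaningful (80 marks is twice 40 marks).
ratio = marks[2] / marks[0]

print(nominal_summary, ordinal_summary, interval_mean, ratio)
```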
DATA STRUCTURE
3. Structured Data
Can be organized and structured, stored and transferred, and organized into a data model in a relational database. They can be processed, searched, queried, transformed into graphs, and processed through algorithms - name, address, age
4. Semi-Structured Data
Loosely structured data with no pre-defined data model; cannot be held in a relational database
Can be hierarchically nested, with some fields - XML-tagged web pages
5. Unstructured Data
Non relational data model
Not all the data in a dataset share the same structure
Difficult to combine
Can be classified and often are qualitative
Facebook posts, Twitter feeds
NoSQL databases can help sort these for querying, classifying

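To make the three structures concrete, here is a minimal Python sketch using only the standard library. The records, tag names and post text are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same fields (name, address, age), so it can be queried directly.
structured = "name,address,age\nAda,1 Main St,36\nLin,9 Elm Ave,41\n"
rows = list(csv.DictReader(io.StringIO(structured)))
ages = [int(row["age"]) for row in rows]

# Semi-structured: tagged and hierarchically nested, but with no fixed table layout.
semi_structured = "<post><user>ada</user><tags><tag>census</tag><tag>maps</tag></tags></post>"
root = ET.fromstring(semi_structured)
tags = [tag.text for tag in root.iter("tag")]

# Unstructured: free text with no fields at all; structure has to be imposed afterwards.
unstructured = "Great turnout at the open data meetup tonight!"
word_count = len(unstructured.split())

print(ages, tags, word_count)
```

The structured rows can be filtered or averaged immediately, the XML needs its hierarchy walked first, and the free text needs classification or coding before it can be counted in any meaningful way.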
KINDS OF DATA
6. Captured Data (raw)
Captured through some sort of measurement - filled out forms, field experiments, cameras, sensors, etc.
Usually deliberate intent to measure
Software enabled systems
7. Exhaust Data (raw)
Inherently produced by a device or a system, but these data are a byproduct of the process - checkout till, UBER taxi ride metadata
8. Transient Data
Data that are generated so fast that they are neither stored nor analyzed - health care data such as surgery videos
Can be exhaust data
9. Derived
Data that have been processed or analyzed, normally from raw captured and exhaust data - number of cars per hour crossing an intersection
Captured data can be their input
Combined - plate, vehicle

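The cars-per-hour example of derived data can be sketched as follows. The timestamps are hypothetical captured data standing in for sensor readings at an intersection.

```python
from collections import Counter
from datetime import datetime

# Hypothetical captured data: one timestamp per car detected by an intersection sensor.
captured = [
    "2015-09-10 08:05:12",
    "2015-09-10 08:17:40",
    "2015-09-10 08:59:03",
    "2015-09-10 09:02:55",
    "2015-09-10 09:45:10",
]

# Derived data: the number of cars per hour, computed from the raw captured records.
cars_per_hour = Counter(
    datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour for stamp in captured
)

for hour, count in sorted(cars_per_hour.items()):
    print(f"{hour:02d}:00 - {count} cars")
```

The hourly counts are smaller and easier to share than the raw detections, but the individual observations cannot be recovered from them.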
KINDS OF DATA
10. Primary Data
Generated by a researcher and their instruments according to a research model
11. Secondary Data
Primary data shared with others for re-use
12. Tertiary Data
Derived data, counts, categories, nominal data
Often research by public administrations about programs
Can be indicators

KINDS OF DATA
13. Indexical Data
Those that allow unique identification and enable linking or joining - SIN
Allow for the joining with non-indexical data via shared identifiers
14. Attribute Data
Represent aspects of a phenomenon but are not indexical - for cities, the population, density, areal extent and number of schools would be the attributes of the city
15. Metadata
Data about the data
Like a library catalog entry except for data
Descriptive metadata
Structural metadata
Administrative metadata

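A short Python sketch of how an indexical identifier links attribute data across datasets, with a minimal metadata record alongside. All identifiers, names and field names here are hypothetical and chosen only for illustration.

```python
# Hypothetical registry keyed by a unique identifier (the indexical data).
people = {
    "ID-001": {"name": "A. Example", "postal_code": "K1S 5B6"},
    "ID-002": {"name": "B. Sample", "postal_code": "K2P 0A1"},
}

# A second hypothetical dataset that carries the same identifier plus an attribute.
clinic_visits = [
    {"id": "ID-001", "visits": 3},
    {"id": "ID-999", "visits": 1},  # no match in the registry, so it cannot be linked
]

# Join the two datasets on the shared identifier.
linked = [
    {**people[visit["id"]], **visit}
    for visit in clinic_visits
    if visit["id"] in people
]

# Metadata: data about the dataset, much like a library catalogue entry.
metadata = {
    "title": "Linked registry and clinic visits (illustrative)",
    "fields": sorted(linked[0].keys()),
    "record_count": len(linked),
}

print(linked)
print(metadata)
```

Without the shared identifier the two datasets could only be compared in aggregate; with it, attributes from both can be attached to the same person, which is what makes fine-grained analysis possible.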
KNOWLEDGE PYRAMID
Knowledge pyramid (adapted from Adler 1986 and McCandless 2010) in Kitchin 2014: 10

PROPERTIES OF DATA
Properties of data
Non-Rivalrous: Many can possess the same data/information
Non-Excludable: Data are easily shared, difficult to limit sharing - IP, Copyright
Zero marginal cost: Cost of reproduction is negligible, although infrastructure, soft and hard, is never free
Value
Occurrence: Time and ingenuity to discover, design, author
Transmission: Network, distribution, access, retrieve, transmit
Processing & management: Collect, validate, modify, organize, index, classify, filter, update, sort, store
Usage: Monitor, model, analyze, explain, plan, forecast, decision-making, instructing, educating and learning

CRITICAL THINKING
Framing
1. Technically
2. Ethically
3. Politically & economically
4. Philosophically
Databases & Infrastructure
Data assemblage
Social shaping qualities

WEEK 2 – ANXIETIES OF BIG DATA
GCHQ, Human Science Operations Cell & NSA are trying to make sense of the open pit mines of data they collect
Surveillant Anxiety: of the surveilled and the surveillers
There will just never be enough data
More data = more truth?
Political and cultural turn of big data
K-Hole: normcore, dorkwear, dressing like a tourist, disappearing/blending in is the new cool
Civilian workshops
“fitting in is the ultimate camouflage”
Anonymity as a form of privilege - electronic benefit transfer
Normcore vs encryption
(Crawford, 2014)

WEEK 2 – THE SECRET LIFE OF BIG DATA
Socio-technological imagination
1. Data have always been big: Domesday Book, 1085, William the Conqueror; BBC Domesday 1986
2. Will everything produce data?
3. Will everyone produce data?
What do data want?
4. Data keep it real?
5. Data love good relationships
6. Data don’t always have the best network
7. Data have a country
8. Data are feral
9. Data have responsibilities
10. Data keep it messy but like to look good
11. Data don’t last forever
12. How to read algorithms?
13. Studying the new priests & alchemists
14. Critiquing the new empiricism & countering it with our own

WEEK 2 RESOURCES

WEEK 2 – ASSIGNMENT 2
Referring to the characteristics of big data provided by Kitchin, conduct an inventory of the big data you produce and generate a list. Select 3 to 5 from this list and describe their characteristics, explain how you generate these, what devices you use to do so, who owns them, how you access them, how these get used and what these might say about you.
Due Sept 17.