Managing and Sharing Qualitative Data Delft University January
Managing and Sharing Qualitative Data Delft University, January 28, 2019 Sebastian Karcher (Qualitative Data Repository)
Agenda 1. Data Management and Data Management Planning basics 2. Sharing Qualitative Data 3. Transparency and Qualitative Data
What Is QDR? • Online since 2014: qdr.syr.edu, NSF funded. QDR curates, stores, preserves, publishes, and enables the download of digital data generated through qualitative and multi-method research in the social sciences. • HQ at Syracuse; other team members at Georgetown and UW Seattle • Originated in political science: International and interdisciplinary • Currently 50 data projects published
Managing Qualitative Data
Definitions Research data management is caring for, facilitating access to, preserving and adding value to research data throughout its lifecycle. Source: University of Edinburgh Information Services A data management plan (DMP) helps researchers consider, during the research design and planning stage, how the data will be managed during the research process itself and potentially shared afterwards with the wider research community.
Why manage research data well? • Your data creation is likely to be expensive • Your data underpin your published findings • Good quality data = good quality research • Protect your data from loss, destruction • Compliance with ethical codes, data protection laws, journal requirements, funder policies • To benefit your future self
Topics of a DMP • Kinds of data that are being created • Any applicable data sharing policies • File formats • Data descriptions, standards & metadata • Data storage • Access and use, incl. appropriate restrictions • Intellectual property ownership / copyright • Human participant constraints • Roles and responsibilities in a team • Budget for data activities DMP Checklist will be handed out
There’s an App for That https://dmptool.org
Start with the basics… https://xkcd.com/1459/ http://phdcomics.com/comics.php?f=1531
Why document your data and processes? • Enables you to understand data when you return to them • To make data and research understandable to others, i.e., reusable and verifiable • Helps avoid incorrect use/misinterpretation • Data documentation is critical for sharing the data via a repository in order to: • Supplement a data collection with documents such as user guide(s) and data listing • Ensure accurate processing and archiving • Create a catalog record for a published data collection Guiding question: If using your data for the first time, what would a new user need to know to make sense of it?
What should be captured? • Project-level documentation: inventory of files; relationships between those files; units, records, cases… • File-level documentation: date; location; details (interviewee, citation, etc.) • Variable-level documentation: labels, codes, classifications; missing values; derivations and aggregations • Data confidentiality, access and use conditions: de-identification carried out (de-identification protocol); participant consent and copyright conditions/forms/procedures; access or use conditions of data
Consider documentation early on • Good documentation and metadata depends on what you can provide • What you can provide depends on what you can remember • Start gathering meaningful information from as early on in the research process as possible
Project-level documentation Example: data list • Data listing provides an at-a-glance summary of data files
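A data listing of this kind can be started automatically and then completed by hand. The following is a minimal sketch, not a prescribed QDR workflow: the column names, the `description` column left blank for manual entry, and the function names are all assumptions made for illustration.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout for the data list.
FIELDS = ["file", "size_bytes", "modified_utc", "sha256", "description"]


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_data_list(project_dir: Path) -> list[dict]:
    """Collect one row per file under project_dir, sorted by path."""
    rows = []
    for path in sorted(p for p in project_dir.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(project_dir).as_posix(),
            "size_bytes": stat.st_size,
            "modified_utc": modified.strftime("%Y-%m-%d"),
            "sha256": sha256_of(path),
            "description": "",  # to be filled in by the researcher
        })
    return rows


def write_data_list(rows: list[dict], out_file: Path) -> None:
    """Write the rows as a CSV file that opens in any spreadsheet program."""
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Usage would look like `write_data_list(build_data_list(Path("my_project")), Path("data_list.csv"))`, where `my_project` is a placeholder folder name. Writing the list outside the project folder keeps it from listing itself on the next run.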
File-level documentation suggestions Embed documentation in your data files • Interview transcript speech demarcation (speaker tags) • Document header with brief details of interview date, place, interviewer name (unless trying to keep de-identified), interviewee details, context • Stata/R/SPSS: variable attributes documented in Variable View (label, code, data type, missing values) • Excel: document properties, worksheet labels (where multiple) Don’t use Excel for quantitative data: https://doi.org/10.1186/s13059-016-1044-7
In practice: Documentation in transcript Source: https://doi.org/10.5064/F6MS3QNV
Metadata – data about data • Highly structured documentation • Data collection metadata examples: • Components of a bibliographic reference • Core information that a search engine indexes to make the data findable • International standards/schemes • Data Documentation Initiative (DDI) • Dublin Core
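To make "highly structured documentation" concrete, here is a small sketch of a Dublin Core record written as XML. The element names (`title`, `creator`, `date`) and the namespace URI belong to the Dublin Core element set; the `<metadata>` wrapper element and the function name are assumptions for illustration, and the example values are simply this presentation's own title, presenter, and date. It does not reproduce QDR's actual catalog schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize {element name: [values]} as a simple Dublin Core XML record."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, values in fields.items():
        for value in values:  # Dublin Core elements may repeat
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": ["Managing and Sharing Qualitative Data"],
    "creator": ["Karcher, Sebastian"],
    "date": ["2019-01-28"],
})
print(record)
```

Because every value sits in a named element, a search engine or repository can index the title, creator, and date separately, which is what makes the data findable.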
Excerpt from QDR catalog record metadata
Stuff happens: Research nightmares You can lose your data in various ways (one at least gives you a good story…) Source: http://graphics-unleashed.com/2015/04/smart-warningpredicts-imminent-hard-disk-failure/ Source: lilysussman.wordpress.com
Backing-up data • It’s not a case of if you will lose data, but when you will lose data! • Digital media are particularly fallible • Keep additional backup copies • Rule #1: 3 versions in 2 locations • Rule #2: Regular, automatic, incremental • Check that backups work; copy data files to new media every 2-5 years • Protect against: software failure, hardware failure, malicious attacks, natural disasters, yourself!
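"Check that backups work" can be partly automated by comparing checksums between the working copy and the backup. This is a minimal sketch under the assumption that the backup is a plain folder mirroring the original; the function names are hypothetical, and most dedicated backup tools ship their own verification commands, which are preferable where available.

```python
import hashlib
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            # read_bytes() loads the whole file; fine for a sketch,
            # chunked reading would be better for very large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def verify_backup(original: Path, backup: Path) -> dict[str, list[str]]:
    """Report files that are missing from, or differ in, the backup."""
    src, dst = checksums(original), checksums(backup)
    return {
        "missing": sorted(set(src) - set(dst)),
        "changed": sorted(
            name for name in src.keys() & dst.keys() if src[name] != dst[name]
        ),
    }
```

An empty report (`{"missing": [], "changed": []}`) means every file in the original folder has an identical copy in the backup. It does not show that the backup medium itself is healthy, so periodic test restores are still worthwhile.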
Cloud storage services • Online or ‘cloud’ services increasingly popular • Dropbox, Box.com, OneDrive, Google Drive, etc. • Very convenient • Background syncing • Mobile apps available • Use, but use with care: • Consider if appropriate, as services can be hosted outside your country (personal data laws) • Encrypt anything sensitive (e.g., VeraCrypt) or • Look for services with end-to-end encryption, aka “zero knowledge” • Often paid; did you budget for that? • Your IT Department may have rules & services for this
Archiving Internet Sources • Online sources may go away: • E.g., more than half of the reproducibility links in articles from the American Political Science Review between 2000 and 2013 couldn’t be accessed in 2016 (Gertler and Bullock 2017, 167). • Saving local copies is good but insufficient for transparency. • Also: often cannot be shared through repository because of copyright • Use internet archiving services (especially the Internet Archive: https://archive.org/web/)
Exercise: Internet Archives The list of the broken links that Gertler and Bullock found in the American Political Science Review is available (in a data repository): http://bit.do/broken-links • Download the file, open it in Excel, and find the first five links marked as “Did not find resource”. • What sorts of websites are these? Can you still find the linked content using the Internet Archive’s Wayback Machine? (https://archive.org/web/)
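Whether a page already has an archived copy can also be checked programmatically. The sketch below assumes the Internet Archive's public Wayback availability endpoint (`https://archive.org/wayback/available`) and its JSON response layout with an `archived_snapshots.closest` entry; the function names are hypothetical, and the endpoint's behaviour may change, so treat this as an illustration rather than a stable interface.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability service.
API = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived URL from a parsed response, or None if absent."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Ask the service for the snapshot closest to timestamp (needs network)."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

Calling `lookup("example.com", "20160101")` would return the address of the snapshot nearest to that date, or `None` if the page was never captured. Citing the returned archive URL alongside the original link keeps a source checkable even if the live page disappears.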
Data destruction Beware of mandates to destroy the data but, if required, keep the following in mind: • When you delete a file from a hard drive, it’s still retrievable – even after emptying the recycle bin • Files need to be overwritten (ideally multiple times) with random data to ensure they are irretrievable • Free file- and folder-shredding software is available
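The idea of overwriting a file with random data before deleting it can be sketched as follows. This only illustrates the principle and is not a substitute for dedicated shredding software: the function names and the default of three passes are assumptions, and on SSDs, journaling or copy-on-write file systems, cloud-synced folders, and existing backups, old copies of the data can survive an in-place overwrite.

```python
import os
from pathlib import Path


def overwrite_in_place(path: Path, passes: int = 3) -> None:
    """Overwrite the file's contents with random bytes, `passes` times."""
    size = path.stat().st_size
    with path.open("r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1024 * 1024)  # write at most 1 MiB at a time
                handle.write(os.urandom(chunk))
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk


def shred(path: Path, passes: int = 3) -> None:
    """Overwrite the file and then remove it from the directory."""
    overwrite_in_place(path, passes)
    path.unlink()
```

Where destruction is formally required, it is worth asking the IT department which tool and procedure they accept as sufficient evidence.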
Sharing Qualitative Data
How to share data ethically/legally • Obtain informed consent, including explicitly for data sharing and long-term preservation / curation • Protect identities, e.g., not collecting personal details when unnecessary during data collection; de-identification after the fact • Regulate access where needed (all or part of data), e.g., by group, use, time period • Securely store personal or sensitive data (separately)
Planning is key! (again) • Collecting identifying information • Avoid collecting unless necessary • Where confidential: Keep directly identifying info separate and secure • Informed consent – an active process • Be careful with restrictions in consent script • Oral vs. written consent • Cultural context • Ask for permission for data sharing explicitly (& include in IRB application)
Exercise Informed Consent • See handout (15-20 mins)
Data Sharing and Copyright COPYRIGHT – an intellectual property right assigned automatically to the creators of “original works of authorship” (Title 17, U.S. Code), which prevents unauthorized copying and publishing of an original product • Who owns copyright? • Copyright and research materials • Interviews and copyright • Clearing copyright – before reproduction, sharing, SDA • Data repositories hold no copyright
Sharing Data in a Repository • Stable links (Digital Object Identifiers – DOIs) • Long-term digital preservation • Institutional Requirements • Can help you with sharing data well (curation) • Makes data more visible/easier for others to discover, access, cite • Interoperability across disciplines • Access Controls
Access Controls (QDR)
De-identifying qualitative data • Removing or replacing information in text can distort data, make them unusable, unreliable or misleading: A balance to preserve context • Remove direct identifiers, or replace with pseudonyms – often not essential research info • Avoid blanking out; use pseudonyms or replacements (Identify replacements) • Plan or apply editing at time of transcription • Consistency within research team /project • Keep de-identification log of all replacements or removals made; keep separate from anonymized data files
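Consistent replacements and a de-identification log can be supported with a small script. The sketch below is only an aid: the names in the usage note are invented, the function names and log columns are assumptions, and simple string matching will miss indirect identifiers (places, occupations, events), which still require a human reader with knowledge of the context.

```python
import csv
import re
from pathlib import Path


def deidentify(text: str, replacements: dict[str, str]) -> tuple[str, list[dict]]:
    """Replace each direct identifier with a bracketed pseudonym and log it."""
    log = []
    # Longest identifiers first, so a full name is handled before a bare first name.
    for original in sorted(replacements, key=len, reverse=True):
        marked = f"[{replacements[original]}]"
        pattern = re.compile(rf"\b{re.escape(original)}\b")
        text, count = pattern.subn(lambda _match: marked, text)
        log.append({"original": original, "replacement": marked, "occurrences": count})
    return text, log


def write_log(log: list[dict], log_file: Path) -> None:
    """Save the log as CSV; store it separately from the de-identified files."""
    with log_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["original", "replacement", "occurrences"])
        writer.writeheader()
        writer.writerows(log)
```

For example, `deidentify("Jane Doe met John Roe.", {"Jane Doe": "Maria Kelly", "John Roe": "Paul Stein"})` returns the text `"[Maria Kelly] met [Paul Stein]."` together with a two-row log. Square brackets mark every replacement, in line with the advice to identify replacements rather than blank text out. Because the log links pseudonyms back to real names, it is itself sensitive and needs the same protection as the raw data.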
De-Identification Requires Context Expertise
De-identification Exercise • How would you de-identify the text on the handout? • 15 mins, feel free to discuss with neighbor
De-identification: A Solution I was born in Philadelphia. My parents were both born and raised in Philadelphia. My father, [Michael Rosenzweig], was Jewish and my mother, [Maria Kelly], was Irish Catholic. They both lived in South Philadelphia [...], and there was no chance that they would meet each other. Back in those days, and even when I was growing up, Philadelphia was a city of great ethnic divides, where the Italian, the Jewish, the Irish, the Polish, the black community, and—to the extent there was a Hispanic community—the Hispanic community each lived in their own neighborhood(s) with very little interaction. They both went to the University of Pennsylvania, but didn’t meet there. They met later on. They were both working in public assistance as social workers when they got married. The biggest thing was that back in those days an Irish Catholic was not very welcome in a Jewish family, and a Jew was not very welcome in an Irish Catholic family, so it was interesting growing up with these two ethnic backgrounds. At Penn, my mother was president of her sorority and was a big person on campus. Interesting point, at that point the Daily Pennsylvanian, even though women had been there for a number of years, never had a woman’s name in the newspaper. Even though they were students there, they were never mentioned. My mother went to [a girls high school] in South Philadelphia. My father went to [a co-ed high school in South Philadelphia], and then went to Penn on a [sports] scholarship. [description of his role on the team] He thought he may have been one of the first Jews to play in the Ivy League. He played and started his first year, but he hurt his knee and lost his scholarship—which is what they did back then. His picture with his team [from the 1930s] is on the wall [at Penn]. He went back and earned a degree in fine arts at Penn. He then taught art in the city schools, and then returned to Penn and earned a master’s degree in social work.
He spent his career in social work and especially helping children. He finished his career as [working] for the City of Philadelphia. My mother worked in a number of social work jobs and later was a teacher in the Philadelphia City Schools. In [the 1940s] my parents had the first of my [...] wonderful sisters, [Martha], who we called [by a short version of that name].
Transparency and Qualitative Data
Making Research Transparent Making research transparent requires: • Data access [DRAW INFO / GENERATE DATA] o What data were used, where are they, are they available? o If you generate your own data, share them or say why you cannot. • Production transparency [DRAW INFO / GENERATE DATA] o If authors’ own data, how were they produced? o Requires providing documentation describing how the data were generated/collected • Analytic transparency [ANALYSIS] o How were data analyzed to arrive at the conclusion? o How are the evidence and claims connected?
Heterogeneity and Priorities • Universal but not homogeneous • Principles underlying transparency offer choices with regard to implementation • Scholars’ first obligation remains answering compelling and critical questions • Unintended consequences • Ideas are in motion and many questions remain
Rarity of Transparency in Qual. Research • Challenges associated w/sharing qual data • More work for qualitative scholars (?) • “Implicit” nature of qualitative methods • Incentive problems • No transparency “techniques” • QUESTION: What else might explain why transparency is rare in qualitative research?
Transparency techniques - examples • Bleich and Pekkanen: Interview Methods • Sharing CAQDAS (NVivo, ATLAS.ti) outputs • Active Citation / Annotation for Transparent Inquiry
Quantitative Research: Matrix Data Open Science Toby Bolsen, Thomas J. Leeper, and Matthew Shapiro. 2014. “Doing What Others Do: Norms, Science, and Collective Action on Global Warming.” American Politics Research 42(1): 65–89.
Qualitative Research: Granular Data
ATI in Practice http://bit.do/qdr-ati-omahoney
Exercise: Evaluating Annotations (1) Read through this short section: Musgrave, Paul, and Daniel H. Nexon. 2018. “Defending Hierarchy from the Moon to the Indian Ocean: Symbolic Capital and Political Dominance in Early Modern China and the Cold War.” International Organization. https://doi.org/10.1017/S0020818318000139 http://bit.do/musgrave • Read the first 2 paragraphs under the “Project Apollo” heading, from “At the dawn of the Cold War” to “scientific capital was exchanged into prestige” Given what you have learned about ATI, which passages would you expect to see annotated and with what content? (8 mins)
Exercise: Evaluating Annotations (2) Now look at the same passages as annotated by the authors: http://bit.do/qdr-ati-musgrave (Select “Cambridge Core ATI” at the top right) • How do the authors’ annotations differ from your expectations? • Why do you think they differ? • How do they affect your assessment of the underlying claims?
Ongoing Questions and Challenges • Responsibility • Incentivizing • Accommodating heterogeneity • Identifying exceptions • Timing • Enforcement What other challenges and questions do you see?
Questions? Comments? Please stay in touch: https://qdr.syr.edu @adam42smith (Sebastian) @qdrepository (QDR) Email: skarcher@syr.edu, qdr@syr.edu