RESEARCH DATA MANAGEMENT
VEERLE VAN DEN EYNDEN & LIBBY BISHOP
UK DATA ARCHIVE, UNIVERSITY OF ESSEX
University of Exeter, Methodology and Research Skills in Sociology, 29 November 2012

UK DATA ARCHIVE
• the UK Data Archive has over forty years' experience in selecting, ingesting, curating and providing access to social science data
• we have extensive experience of supporting researchers and data creators in the social sciences and related disciplines
• we support data sharing for the ESRC Data Policy (since 1995) and the Rural Economy and Land Use programme (2004-2012)
• our best practice approaches to making data shareable are based on:
  • challenges faced by researchers in sharing data
  • handling research data, both quantitative and qualitative
  • highly skilled staff comprising researchers, technical and information specialists
www.data-archive.ac.uk

OUR MANAGING AND SHARING DATA RESOURCES
Managing and sharing guidance
• sections
• references
• training programme
www.data-archive.ac.uk/create-manage
www.data-archive.ac.uk/media/2894/managingsharing.pdf
Training resources:
• presentations
• exercises and discussions / answers
www.data-archive.ac.uk/create-manage/training-resources

OPEN EXETER
Policy development
• PGR policy and researcher policy: research data should be made available on Open Access when legally, commercially and ethically appropriate
• Guidance on development of discipline-specific/research group level policies
Data repository
• Pilot stage: solution to the problem of big data
• Will be merged with ERIC (and renamed) so publications and data can be linked
Training and guidance:
• 7 courses on the Researcher Development Programme
• Data management planning sessions
• Guidance website
• Ad hoc guidance/discipline-specific training

OVERVIEW FOR TODAY
• Data management planning
• Documenting and contextualising your data
• Formatting and organising data
• Storing your data, including data security, data transfer, encryption and file sharing
• Data confidentiality, legal and ethical issues
• Anonymisation
• Data copyright
• Re-using data

BENEFITS OF MANAGING AND SHARING YOUR DATA
DATA CREATED FROM RESEARCH ARE VALUABLE RESOURCES THAT CAN BE USED AND RE-USED FOR FUTURE SCIENTIFIC AND EDUCATIONAL PURPOSES. SHARING DATA FACILITATES NEW SCIENTIFIC INQUIRY, AVOIDS DUPLICATE DATA COLLECTION AND PROVIDES RICH REAL-LIFE RESOURCES FOR EDUCATION AND TRAINING.

DATA LIFECYCLE & DATA MANAGEMENT PLANNING
A DATA MANAGEMENT AND SHARING PLAN HELPS RESEARCHERS CONSIDER, WHEN RESEARCH IS BEING DESIGNED AND PLANNED, HOW DATA WILL BE MANAGED DURING THE RESEARCH PROCESS AND SHARED AFTERWARDS WITH THE WIDER RESEARCH COMMUNITY.
AREAS OF COVERAGE
• Data management planning: why and how, and the research lifecycle
• Data management checklist
• Roles and responsibilities
• Costing data management

WHY DMP?
• Research funders require planning for data management and data sharing, e.g. UK Research Councils
  • which data
  • how to manage them
  • how to share, preserve and curate them
  • rights to access, use, ...
  • roles & responsibilities
  DCC: UK research funders' DMP expectations
• Research benefits
  • think about what to do with research data, how to collect them and how to look after them
  • keep track of research data (e.g. when staff leave)
  • identify support, resources and services needed
  • plan storage, short and long-term
  • plan security and ethical aspects
  • be prepared for data requests (FoI, funder)

HOW
• Funder template for DMP
  • ESRC DMP requirements in data policy and DMP guidance
  • MRC DMP guidance and template
  • AHRC technical appendix requirements
• DCC's DMPonline tool
• UK Data Archive data management checklist
• UoE support: Open Access and Data Curation team

DATA LIFE CYCLE
[Diagram: data management activities around the data life cycle]
• Agree data & metadata templates/organisation
• Sign off consent form
• Shared data sharing protocols
• Licensing, terms and conditions for sharing, formal documentation
• Data formats, data migration

EXAMPLE: HEALTH AND SOCIAL CONSEQUENCES OF THE FOOT AND MOUTH DISEASE EPIDEMIC IN NORTH CUMBRIA, 2001-2003 (SN 5407)
[Diagram: steps in the life cycle of this study's data]
• Research design
• Consent for participation & primary data use
• Participants keep diaries
• Interviews recorded
• Interviews transcribed
• Diaries transcribed (MS Word)
• Data archiving discussed with participants. Consent to archive transcripts and recordings obtained
• Transcripts and recordings archived at UKDA (RTF, MP3)
• Catalogue record created
• User guide created
• Transcripts and user guide available from UKDA
• Data re-used in study: 'Assessment of Knowledge Sources in Animal Disease Control'

ROLES & RESPONSIBILITIES
Assign roles and responsibilities for data management; do not presume them
Who?
• PI
• Research staff / students - collecting, creating, processing, analysing data
• External contractors - data collection, collation, processing, e.g. transcribers
• Support staff - managing, administering research
• Exeter IT or local CDO - data storage, security, back-up services
• External/institutional data centres / archives - data sharing
• Open Access and Data Curation team - advice and support

COSTING
• Cost data management and sharing into research
• Identify the resources needed to make research data shareable beyond the primary research team, over and above planned standard research procedures and practices
• Resources = people, equipment, infrastructure, tools to manage, document, organise, store and provide access to data
• Early planning can reduce costs
• See our data management costing tool

DOCUMENTING AND CONTEXTUALISING YOUR DATA
A CRUCIAL PART OF MAKING DATA USER-FRIENDLY, SHAREABLE AND USABLE IN THE LONG TERM IS TO ENSURE THEY CAN BE UNDERSTOOD AND INTERPRETED BY ANY USER. THIS REQUIRES CLEAR AND DETAILED DATA DESCRIPTION, ANNOTATION AND CONTEXTUAL INFORMATION.
AREAS OF COVERAGE
• Documenting data
• Study-level documentation and context
• Data-level documentation

WHY DOCUMENT YOUR DATA?
• Enables you to understand/interpret data
• Needed to make data independently understandable
• Ensures informed and correct use, reduces chance of incorrect use/misinterpretation
• If you were using these data for the first time, what would you need to know?
• The UK Data Archive and other repositories use data documentation to:
  • create/supplement catalogue record for dataset
  • create user guide(s) and data listing for dataset
  • ensure accurate processing and archiving

WHAT SHOULD BE CAPTURED?
Wider contextual information about project and data
• background, project history, aims, objectives, hypotheses
• publications based on dataset
Data collection methodology and processes
• sampling
• data collection process - fieldwork, interviewer instructions
• instruments used - questionnaires, showcards, interview schedules
• temporal/geographic coverage
• data validation - cleaning, error-checking
• derived variables - compilation
• weighting: factors and variables, weighting process
• secondary data sources used
Useful documents are:
• final report, published reports, user guide, working paper, publications, lab books

WHAT SHOULD BE CAPTURED? (CONTINUED)
Information on dataset structure
• data files
• relationships between files
• records, cases...
Variable-level documentation
• labels, codes, classifications
• missing values
• derivations and aggregations

WHAT SHOULD BE CAPTURED? (CONTINUED)
Data confidentiality, access and use conditions
• anonymisation carried out
• aggregation, banding, coding and top-coding, disclosure control
• editing of sensitive material in interview transcripts
• consent conditions/procedures
• access or use conditions of data

LABELLING
• All structured, tabular data should have cases or records and variables adequately documented with names, labels and descriptions for all variables, fields, records and their values
• Variable names
  • question number system related to questions in a survey/questionnaire, e.g. Q1a, Q1b, Q2, Q3a
  • numerical order system, e.g. V1, V2, V3
  • meaningful abbreviations or combinations of abbreviations referring to the meaning of the variable, e.g. oz%=percentage ozone, GOR=Government Office Region, moocc=mother occupation, faocc=father occupation
  • for interoperability across platforms: variable names of max 8 characters without spaces (absolute maximum is 32 characters)

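The naming rules on this slide are simple enough to check automatically. The following is a minimal Python sketch, not a UK Data Archive tool: the function name `check_variable_name` is hypothetical, and the only rules it encodes are the ones stated above (no spaces, 8 characters preferred for interoperability, 32 as the absolute maximum).

```python
def check_variable_name(name, preferred_max=8, absolute_max=32):
    """Return a list of warnings for one variable name (empty list = no issues)."""
    warnings = []
    if " " in name:
        warnings.append("contains spaces")
    if len(name) > absolute_max:
        warnings.append(f"longer than {absolute_max} characters")
    elif len(name) > preferred_max:
        # Still allowed, but may be truncated by some software.
        warnings.append(f"longer than {preferred_max} characters")
    return warnings


if __name__ == "__main__":
    for candidate in ["moocc", "Q1a", "mother occupation"]:
        print(candidate, check_variable_name(candidate))
```

Running such a check over all variable names before depositing or exchanging a data file gives an early warning of names that may not survive conversion between packages.
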
CODE LABELLING
• Code labels
  • brief, max. 80 characters
  • unit of measurement
  • reference the question number of a survey or questionnaire
  e.g. variable 'p1sex' = 'sex of respondent' with codes '1=female', '2=male', '8=don't know', '-9=not answered'
  e.g. variable 'q11hexw' with label 'Q11: hours spent taking physical exercise in a typical week' - the label gives the unit of measurement and a reference to the question number (Q11b)
• Codes of, and reasons for, missing data
  • avoid blanks, system-missing or '0' values
  e.g. '99=not recorded', '98=not provided (no answer)', '97=not applicable', '96=not known', '95=error'
• Coding or classification schemes used, with a bibliographic reference
  e.g. Standard Occupational Classification 2000 - a list of codes to classify respondents' jobs; ISO 3166 alpha-2 country codes - an international standard of 2-letter country codes

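Code labels and missing-value codes can be kept in a small machine-readable codebook alongside the data. The sketch below is illustrative only: it reuses the 'p1sex' example from the slide, while the structure of `CODEBOOK`, the helper names, and the choice to treat only -9 as missing are assumptions made for the example.

```python
# Hypothetical codebook entry built from the 'p1sex' example on the slide.
CODEBOOK = {
    "p1sex": {
        "label": "Sex of respondent",
        "values": {1: "female", 2: "male", 8: "don't know", -9: "not answered"},
        # Assumption: only -9 is treated as missing here; whether 8 counts as
        # missing is an analytical decision that should itself be documented.
        "missing": {-9},
    },
}


def label_for(variable, code, codebook=CODEBOOK):
    """Return the code label, or None when the code is not documented."""
    return codebook[variable]["values"].get(code)


def undocumented_codes(variable, observed, codebook=CODEBOOK):
    """Return the observed codes that have no label in the codebook, sorted."""
    documented = codebook[variable]["values"]
    return sorted(set(observed) - set(documented))


if __name__ == "__main__":
    # 0 and 99 appear in the data but are not explained anywhere.
    print(undocumented_codes("p1sex", [1, 2, 2, 0, -9, 99]))
```

A check like `undocumented_codes` catches blanks or stray '0' values that were never given a documented meaning.
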
DERIVED VARIABLES
Information about derived or constructed variables from original data:
• the logic of each derivation should be made clear
• for simple derivations, such as grouping age data into age intervals, variable and value labels can be used to explain them
• complex derivations can be described by providing the algorithms, logical statements or functions used to create derived variables, e.g. the SPSS or Stata command files

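The slide mentions SPSS or Stata command files; the same idea applies to any scripted derivation. As a hedged illustration in Python, the sketch below derives an age-band variable from age in years. The band boundaries, the label 'not recorded' and the function name are hypothetical, chosen only to show how a script can double as documentation of the derivation logic.

```python
# Hypothetical age bands: (lower bound, upper bound, value label).
AGE_BANDS = [
    (0, 15, "0-15"),
    (16, 24, "16-24"),
    (25, 44, "25-44"),
    (45, 64, "45-64"),
    (65, 120, "65 and over"),
]
MISSING = "not recorded"


def derive_age_band(age):
    """Derive a banded age variable from age in whole years."""
    if age is None:
        return MISSING
    for lower, upper, label in AGE_BANDS:
        if lower <= age <= upper:
            return label
    # Out-of-range values get an explicit code instead of being left blank.
    return MISSING


if __name__ == "__main__":
    for age in [3, 16, 70, None]:
        print(age, "->", derive_age_band(age))
```

Keeping the band definitions in one table near the top of the script makes the derivation easy to read and to copy into the variable and value labels.
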
DATA-LEVEL DOCUMENTATION
• Embed annotations in data files:
  • quantitative data: variable/value labels; worksheet information; table relationships and queries in relational database; GIS data layers/tables
• Examples (see visual screenshots):
  • SPSS: variable attributes documented in Variable View (label, code, data type, missing values)
  • MS Access: variable descriptions and attributes documented in Design View; relationships
  • ArcGIS: shapefiles (layers) and tables in geodatabase; metadata created in ArcCatalog
  • MS Excel: base worksheet data-related documentation

DATA-LEVEL DOCUMENTATION (CONTINUED)
• Qualitative data/text documents:
  • interview transcript speech demarcation (speaker tags)
  • document header with brief details of interview date, place, interviewer name, interviewee details, context
  • data listing of attributes for interviewees

EXAMPLES OF DOCUMENTATION FOR RE-USE
• Quantitative dataset
  • documentation - questionnaire, variable list, codebook etc.
• Qualitative dataset - depends on size and scale
  • user guide, data listing

QUANTITATIVE STUDY
• Smaller-scale study - user guide may just contain survey questionnaire and methodology information
• Bigger study - documents separated, e.g. Health Survey for England 2007

QUALITATIVE STUDY
User guide contains a variety of documents that provide context

QUALITATIVE STUDY (CONTINUED)
Data listing provides an at-a-glance summary of the data collection

FORMATTING YOUR DATA
USING STANDARD AND INTERCHANGEABLE OR OPEN LOSSLESS DATA FORMATS ENSURES LONG-TERM USABILITY OF DATA. HIGH QUALITY DATA ARE WELL ORGANISED, STRUCTURED, NAMED AND VERSIONED, AND THE AUTHENTICITY OF MASTER FILES IS IDENTIFIED.
AREAS OF COVERAGE
• File formats
• File conversions
• Organising files and folders
• File naming
• Version control and authenticity

CAN YOU UNDERSTAND/USE THESE DATA?
• SrvMthdDraft.doc
• SrvMthdFinal.doc
• SrvMthdLastOne.doc
• SrvMthdRealVersion.doc

FILE FORMATS
Choice of software format for digital data:
• planned data analyses
• software availability
• hardware used, e.g. audio
• discipline-specific standards and customs
Digital data = software dependent
Digital data are endangered by obsolescence of software/hardware
Best formats for long-term preservation: standard formats, interchangeable formats, open formats
e.g. tab-delimited, comma-delimited (CSV), ASCII, RTF, PDF/A, OpenDocument format, SPSS portable, XML
UK Data Archive optimal file formats for various data types

FILE FORMAT CONVERSIONS
Convert data for preservation or back-up: export, save as
Beware of conversion errors and losses:
• loss of internal metadata, e.g. converting MS Access to tab-delimited tables
• loss of editing, formatting, formulae, e.g. converting MS Word to RTF
• truncation or loss of data, e.g. string variables lost in SPSS to Stata conversion; MS Access memo fields truncated in conversion to CSV
Check for errors and changes after conversion

EXAMPLE: FORMAT CONVERSION
[Screenshots: MS Excel format converted to tab-delimited text format, showing loss of annotation]

ORGANISING DATA
Plan in advance how best to organise data
Examples
• hierarchical structure of files, grouped in folders, e.g. images
• survey data - spreadsheet, SPSS, relational database
• interview transcripts - individual well-named files

FILE NAMING
• file name = principal identifier of file
• logical naming - easy to identify, locate, retrieve, access
• naming provides organisation, context & consistency
• name elements: version number, date, content description, creator name
Best practice
• name independent of location
• brief & relevant
• no special characters, dots or spaces
• for separation use underscores _
• versioning via filename: ordinal and decimal version numbers
• use names to classify broad types of files
• avoid very long file names

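One way to apply these conventions consistently is to generate file names from their elements rather than typing them by hand. The sketch below is a hypothetical helper, not a prescribed scheme: the element order, the date format `YYYYMMDD` and the two-part version number are assumptions made for the example.

```python
import re
from datetime import date


def build_file_name(description, creator, when, major, minor, extension):
    """Combine name elements into one file name, separated by underscores."""

    def clean(text):
        # Drop spaces, dots and any other special characters.
        return re.sub(r"[^A-Za-z0-9]", "", text)

    stem = "_".join([
        clean(description),
        clean(creator),
        when.strftime("%Y%m%d"),        # sorts chronologically as text
        f"{major:02d}_{minor:02d}",     # ordinal and decimal version numbers
    ])
    return f"{stem}.{extension.lstrip('.')}"


if __name__ == "__main__":
    print(build_file_name("Food Interview 1", "LB", date(2012, 11, 29), 1, 0, "rtf"))
```

Because the name carries its own description, date and version, it stays meaningful when the file is moved out of its original folder.
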
VERSION CONTROL
Keep track of different copies or versions of data files
Which method:
• single site vs. across locations
• single vs. multiple users
• different versions to be stored vs. files to be synchronised
Best practice:
• unique identifiers for files (file names)
• record file status/versions
• record relationships between files, e.g. data file and documentation; similar data files
• keep track of file locations, e.g. laptop vs. PC

VERSION CONTROL: SINGLE USER
Single user of data files
• file naming - unique file names with date or version number (avoid spaces!)
  e.g. FoodInterview_1_draft; FoodInterview_1_final; HealthTest_06-04-2008; BGHSurveyProcedures_00_04
• version control table or file history within or alongside data file
• version control facility within software, e.g. MS Word

VERSION CONTROL: MULTIPLE USERS
Multiple users of data files
• control rights to file editing: read/write permissions, e.g. Windows Explorer
• versioning/file sharing software: check files out/in, e.g. SVN, VSS, Google Docs, Amazon S3
• manual merging of multiple entries/edits
Synchronise files, e.g. MS SyncToy software

STORING YOUR DATA
LOOKING AFTER RESEARCH DATA FOR THE LONGER TERM AND PROTECTING THEM FROM UNWANTED LOSS REQUIRES HAVING GOOD STRATEGIES IN PLACE FOR SECURELY STORING, BACKING-UP, TRANSMITTING, AND DISPOSING OF DATA. COLLABORATIVE RESEARCH BRINGS CHALLENGES FOR THE SHARED STORAGE OF, AND ACCESS TO, DATA.
AREAS OF COVERAGE
• Making back-ups
• Data storage
• Data security
• Data transmission and encryption
• File sharing and collaborative environments
• Data disposal
• Disseminating data

BACKING-UP DATA
• Why do back-ups? Risk of loss and change - would your data survive a disaster?
• Protect against: software failure, hardware failure, malicious attack, natural disasters
• Back-ups are additional copies that can be used to restore originals
• Data are not backed-up unless they are backed-up with a strategy

BACK-UP STRATEGY
Consider
• what is backed-up? - all, some, just the bits you change?
• where? - original copy, external local and remote copies
• what media? - CD, DVD, external hard drive, tape, etc.
• how often? - assess frequency and automate the process
• for how long is it kept?
• verify and recover - never assume, regularly test a restore
Backing-up need not be expensive
• 1 TB external drives cost around £50, with back-up software
Data backup at Exeter

DATA STORAGE
All digital media are fallible
File formats and physical storage media become obsolete
• optical (CD, DVD) and magnetic media (hard drive, tapes) degrade
• never assume the format will be around for ever
Best practice
• use data formats with long-term availability
• storage strategy - at least two forms of storage and locations
• maintain original copy, external local copy and external remote copy
• copy or migrate data files to new media two to five years after they were first created
• check data integrity of stored data files regularly (checksum)
• know your personal/institutional back-up strategy
• know the data retention policies that apply: funder, publisher, home institution
• what to protect? not only data, and not only digital
Storage at Exeter: http://as.exeter.ac.uk/it/files/udrive/
UoE Code of Practice: Good Practice in the Conduct of Research

NON-DIGITAL STORAGE
Printed materials, photographs
• degradation from sunlight and acid (sweat on skin, in paper)
• use high quality media for long-term storage/preservation, e.g. acid-free paper & boxes, non-rust paperclips (no staples)
Confidential items, e.g. signed consent forms, interview notes
• store securely, behind lock
• separate from data files
UoE Code of Practice: Good Practice in the Conduct of Research

ENCRYPTION
Always encrypt personal or sensitive data
Encrypt anything you would not send on a postcard
• for moving files, e.g. transcripts
• for storing files, e.g. shared areas, mobile devices
Basic principles
• use an algorithm to transform information (A=1)
• need a 'key' to decrypt
Free software that is easy to use
• Safehouse
• Truecrypt
• Axcrypt
These tools
• encrypt hard drives, partitions, files and folders
• encrypt portable storage devices such as USB flash drives
Truecrypt encryption at Exeter

DATA DESTRUCTION
When you delete data and documentation from a hard drive, it is probably not gone
• files need to be overwritten to ensure they are irretrievably deleted:
  • BCWipe - uses 'military-grade procedures to surgically remove all traces of any file'
  • Axcrypt
• if in doubt, physically destroy the drive using an approved secure destruction facility
• physically destroy portable media, as you would shred paper
Guidance from UoE Records Manager

DATA SECURITY
Protect data from unauthorised access, use, change, disclosure and destruction
Personal data need more protection - always keep separate
Control access to computers
• passwords
• anti-virus and firewall protection, power surge protection
• networked vs non-networked PCs
• all devices: desktops, laptops, memory sticks, mobile devices
• all locations: work, home, travel
• restrict access to sensitive materials, e.g. consent forms, patient records
Proper disposal of equipment (and data)
• even reformatting the hard drive is not sufficient
Control physical access to buildings, rooms, cabinets
Data security at Exeter

FILE SHARING & COLLABORATIVE ENVIRONMENTS
Sharing data between researchers and teams
• too often email attachments
• Yousendit, Dropbox - consider if appropriate, as services can be hosted outside the EU (DPA for personal data)
• Virtual Research Environments
  • MS SharePoint
  • Sakai
• file transfer protocol (ftp)
• physical media

DISSEMINATING DATA
• Usually a funder requirement
• Will be a University requirement
• Store in a repository:
  • Exeter's data repository
  • A subject repository
  • A national repository (e.g. UK Data Archive)
Advantages:
• Security
• Permanence
• Visibility
• Citability
• Opportunity
• Someone else looks after it for you
List of repositories at OpenDOAR

DEMO: DATA ENCRYPTION
Create an encrypted storage space using the free software SafeHouse
www.data-archive.ac.uk/media/312652/storingyourdata_encryptionexercise.pdf

DEMO: DATA INTEGRITY & BACK-UP
Calculate the MD5 checksum value of a file to check its integrity, e.g. after back-up
www.data-archive.ac.uk/media/361550/storingyourdata_checksumexercise.pdf

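The exercise linked above uses a dedicated checksum tool; the same check can be scripted. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_checksum` and `copies_match` are hypothetical, and MD5 is used here only because the demo names it: it detects accidental corruption, not deliberate tampering.

```python
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, backup):
    """True when the back-up has the same checksum as the original."""
    return md5_checksum(original) == md5_checksum(backup)
```

Recording the checksum of each master file when it is created makes it possible to verify any later copy or restored back-up against that value.
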
DEMO: VERSIONING
Keep track of different versions of documents in MS Word
www.data-archive.ac.uk/media/375814/formattingyourdata_versioningexercise.pdf

DEMO: SYNCHRONISING
Synchronise files between two folders using SyncToy software
www.data-archive.ac.uk/media/375817/formattingyourdata_synchronisingexercise.pdf

DEMO: NVIVO DATA MANAGEMENT
Data handling & management in NVivo 9
http://datalib.edina.ac.uk/mantra/nvivomodule.html
Import
• Word files, formatting (heading style for topics, speaker tags - auto-create nodes), no headers/footers
• create folder structure: internals, externals, memos, e.g. according to data type; documentation in memos
• create memos to document analysis
• event log recording all actions
• classifications: interviewees, interviews, coding, e.g. interviewee attributes, incl. consent (data list)
Export
• Data files, with related content (gets included in text files)
• Documentation:
  - textual: memos, log files, coding
  - spreadsheets: classifications, event log, objects structure
  - summary extract reports

DEMO: SPSS DATA MANAGEMENT
Data handling in SPSS 18
http://datalib.edina.ac.uk/mantra/spssmodule.html

ETHICAL AND LEGAL ISSUES IN DATA SHARING
A COMBINATION OF GAINING CONSENT FOR DATA SHARING, ANONYMISING AND REGULATING ACCESS TO DATA WILL INCREASE THE POTENTIAL FOR MAKING PEOPLE-RELATED RESEARCH DATA MORE READILY AND WIDELY AVAILABLE.
AREAS OF COVERAGE
• Legal and ethical aspects
• Informed consent for data sharing
• Anonymising data
• Controlling access to data

ETHICAL ARGUMENTS FOR ARCHIVING DATA
• store and protect data securely
• not burden over-researched, vulnerable groups
• make best use of hard-to-obtain data (e.g. elites, socially excluded, over-researched)
• extend voices of participants
• provide greater research transparency
• enable fullest ethical use of rich data
In each, ethical duties to participants, peers and public may be present

DUTY OF CONFIDENTIALITY AND DATA SHARING
• Duty of confidentiality exists in common law and may apply to research data
• If a participant consents to share data, then sharing does not breach confidentiality
• Public interest can override duty of confidentiality; best practice is to avoid vague or general promises in consent forms

DATA PROTECTION ACT, 1998
• Personal data:
  • relate to a living individual
  • individual can be identified from those data or from those data and other information
  • include any expression of opinion about the individual
• Requirements for handling personal data
  • processed fairly and lawfully
  • obtained and processed for a specified purpose
  • adequate, relevant and not excessive for the purpose
  • accurate
  • not kept longer than necessary
  • processed in accordance with the rights of data subjects, e.g. right to be informed about how data will be used, stored, processed, transferred, destroyed, ...; right to access info and data held
  • kept secure
  • not transferred abroad without adequate protection
• Only disclosed if consent has been given to do so (except legal duty)

DATA PROTECTION ACT & RESEARCH
• Exceptions for personal data collected as part of research:
  • can be retained indefinitely (if needed)
  • can be used for other purposes in some circumstances
  • people should still be informed
• If data are anonymised (personal identifiers removed) then DP laws will not apply, as these no longer constitute 'personal data'
DPA is not intended to, and does not, inhibit ethical research

SENSITIVE DATA
• Data regarding an individual's race or ethnic origin, political opinion, religious beliefs, trade union membership, physical or mental health, sex life, criminal proceedings or convictions (DPA 1998)
• Can only be processed for research purposes if:
  • explicit consent (ideally in writing) has been obtained; or
  • medical research by a health professional or equivalent with duty of confidentiality; or
  • analysis of racial/ethnic origins for purpose of equal opportunities monitoring; or
  • in substantial public interest and not causing substantial damage and distress

OPTIONS FOR SHARING CONFIDENTIAL DATA
Researchers to consider
• obtaining informed consent, also for data sharing and preservation / curation
• protecting identities, e.g. anonymisation, not collecting personal data
• restricting / regulating access where needed (all or part of data), e.g. by group, use, time period
• securely storing personal or sensitive data
Consider jointly and in dialogue with participants
Plan early in research

INFORMED CONSENT FOR ETHICAL PURPOSES
• What does it mean for consent to be "informed"?
  • purpose of the research
  • what is involved in participation
  • benefits and risks
  • mechanism of withdrawal
  • data uses - primary research, storing, processing, re-use, sharing, archiving, ...
  • strategies to ensure confidentiality of data where this is relevant - anonymisation, access restrictions...
• RCUK also expects data to be accessible for other uses (RCUK Policy on Access to Research Outputs)
• Now a requirement for ESRC awards:
  "Where research data are considered confidential or contain sensitive personal data, award holders must seek to secure consent for data sharing or alternatively anonymise the data in order to make sharing possible." ESRC Research Data Policy 2010 2.4(32)

DO PARTICIPANTS CONSENT TO SHARE DATA?
• Timescapes
  • data on personal relationships
  • 95%+ consent rate
• Foot and mouth disease in N. Cumbria
  • sensitive community information
  • UK Data Archive consultation; pilot with 4 participants
  • 40/54 interviews; 42/54 diaries; audio restricted
• Finnish research on consent
  • Re-contact project: life stories, gender, etc.
  • 165/169 (98%) agreed
• Bereaved relatives want others to benefit from their data

INFORMED CONSENT FOR UNKNOWN FUTURE USES
• In fact, a great deal of information can be provided
  • who can access the data - only bona fide researchers
  • purposes - research or teaching or both
  • confidentiality protections, undertakings of future users
  • general consent (similar to consent with emergent research topics)
• Medical research and biobank models - enduring, broad, open consent
  • no time limits; no recontact required
  • unspecified hypotheses and procedures
  • 99% consent rate (2500+ patients) - Wales Cancer Bank
"ESRC expects that others will also use it [data], so consent should be obtained on this basis and the original researcher must take into account the long-term use and preservation of data." (ESRC Framework for Research Ethics, 1.17.5.1)

CONSENT NEEDED ACROSS THE DATA LIFE CYCLE
• Engagement in the research process
  • decide who approves final versions of transcripts
• Dissemination in presentations, publications, the web
  • decide who approves research outputs
• Data sharing and archiving
  • consider future uses of data
Always dependent on the research context
UK Data Archive model consent form

A GOOD INFORMATION SHEET & CONSENT FORM
• Meets requirements of Data Protection laws
  • purpose of the research
  • what is involved in participation
  • benefits and risks
  • mechanism of withdrawal
  • usage of data - for primary research and sharing
  • strategies to ensure confidentiality of data (anonymisation, access, ...) where this is relevant
• Simple
• Avoids excessive warnings
• Complete for all purposes: use, publishing, sharing

WHEN TO ASK FOR CONSENT
One-off
• Pros
  • simple
  • least hassle for participant
• Cons
  • research outputs (even questions) not known in advance
  • participants will not know all content they will contribute
Process
• Pros
  • most complete for assuring active consent
• Cons
  • might not get consent needed before losing contact
  • repetitive, can annoy participant

ANONYMISATION PREVENTS IDENTITY DISCLOSURE
A person's identity can be disclosed through:
• direct identifiers, e.g. name, address, postcode, telephone number, voice, picture
  Often NOT essential research information (administrative)
• indirect identifiers - possible disclosure in combination with other information, e.g. occupation, geography, unique or exceptional values (outliers) or characteristics

KEY POINTS FOR ANONYMISING
• never disclose personal data - unless consent for disclosure
• reasonable/appropriate level of anonymity
• maintain maximum meaningful information
• where possible replace rather than remove
• identifying information may provide context, do not over-anonymise
• re-users of data have the same legal and ethical obligation to NOT disclose confidential information as primary users
ANONYMISING QUANTITATIVE DATA
• remove direct identifiers
  e.g. names, address, institution, photo
• reduce the precision/detail of a variable through aggregation
  e.g. birth year vs. date of birth, occupational categories, area rather than village
• generalise meaning of detailed text variable
  e.g. occupational expertise
• restrict upper or lower ranges of a variable to hide outliers
  e.g. income, age
• combine variables
  e.g. creating non-disclosive rural/urban variable from place variables
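The techniques above can be sketched for a single survey record. This is an illustration under stated assumptions, not a prescribed procedure: the column names, the income cap `INCOME_CAP` and the place list `RURAL_PLACES` are hypothetical and would need to be chosen per study.

```python
from datetime import date

# Hypothetical column names and thresholds, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "address"}
INCOME_CAP = 100_000            # top-coding threshold to hide high outliers
RURAL_PLACES = {"Littleton"}    # assumed lookup of rural place names

def anonymise_record(record):
    """Apply removal, aggregation, top-coding and variable combination."""
    # Remove direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Reduce precision: keep birth year instead of full date of birth.
    out["birth_year"] = out.pop("date_of_birth").year
    # Restrict the upper range of income to hide outliers.
    out["income"] = min(out["income"], INCOME_CAP)
    # Combine place information into a non-disclosive rural/urban variable.
    out["area_type"] = "rural" if out.pop("village") in RURAL_PLACES else "urban"
    return out

example = {
    "name": "A. Smith",
    "address": "1 High Street",
    "date_of_birth": date(1970, 5, 17),
    "income": 250_000,
    "village": "Littleton",
}
result = anonymise_record(example)
```

Each step loses some detail, so the thresholds and categories should be set as coarsely as disclosure risk requires and no coarser.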
GEO-REFERENCED DATA
Spatial references (point coordinates, small areas) may disclose position of individuals, organisations, businesses
Remove spatial references - prevents disclosure; also all geographical and related information lost
Better
• reduce precision - replace point coordinates with larger, non-disclosing geographical areas
  e.g. km² area, postcode district, ward, road
• reduce precision - replace point coordinate with meaningful variable typifying the geographical position; or summary statistics of location
  e.g. catchment area, poverty index, population density
• keep spatial references and impose access restrictions on data
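Replacing a point coordinate with a larger area can be as simple as snapping it to a grid cell. The sketch below assumes projected coordinates in metres (for example an easting and northing); the function name `to_grid_cell` and the 1 km default cell size are illustrative choices, not a recommended standard.

```python
def to_grid_cell(easting_m, northing_m, cell_size_m=1000):
    """Return the south-west corner of the grid cell containing the point.

    Assumes projected coordinates in metres; every point inside the same
    cell maps to the same corner, so exact positions are not recoverable.
    """
    return (
        easting_m // cell_size_m * cell_size_m,
        northing_m // cell_size_m * cell_size_m,
    )

# Hypothetical point, reduced to its 1 km cell and to a coarser 10 km cell.
cell_1km = to_grid_cell(345678, 512345)
cell_10km = to_grid_cell(345678, 512345, cell_size_m=10_000)
```

A larger cell size gives stronger protection in sparsely populated areas, at the cost of spatial detail.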
ANONYMISING QUALITATIVE DATA
• do not collect disclosive data unless necessary
• plan or apply editing at time of transcription
  except: longitudinal studies - anonymise when data collection complete (linkages)
• avoid blanking out; use pseudonyms or replacements
• avoid over-anonymising - removing/aggregating information in text can distort data, make them unusable, unreliable or misleading
• consistency within research team and throughout project
• identify replacements, e.g. with [brackets]
• keep anonymisation log of all replacements, aggregations or removals made – keep separate from anonymised data files
• xml mark-up can be used for anonymisation
  <seg type="anonymised">word to be anonymised</seg>
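Bracketed replacements and an anonymisation log can be produced together. The following is a rough Python sketch assuming a hand-built mapping of identifying terms to replacements; the function name `anonymise_text`, the log fields and the example transcript are hypothetical. It matches whole words only and will not catch misspellings or nicknames, so manual checking is still needed.

```python
import re

def anonymise_text(text, replacements):
    """Replace each identifying term with a [bracketed] replacement
    and return the edited text plus a log of the changes made."""
    log = []
    for original, replacement in replacements.items():
        # Match the term as a whole word, treating it as literal text.
        pattern = re.compile(r"\b" + re.escape(original) + r"\b")
        text, count = pattern.subn(lambda m: "[" + replacement + "]", text)
        if count:
            log.append(
                {"original": original, "replacement": replacement, "occurrences": count}
            )
    return text, log

# Hypothetical transcript fragment and replacement mapping.
transcript = "Mary moved to Carlisle after Mary sold the farm."
mapping = {"Mary": "Jane", "Carlisle": "town in Cumbria"}
anonymised, log = anonymise_text(transcript, mapping)
```

Because the log contains the original identifying terms, it should be stored separately from the anonymised files, as the slide advises.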
ACCESS CONTROLS ON DATA
• Essential when anonymisation ineffective or damaging to quality
  • visual or audio data
  • disclosive microdata
• UK Data Archive has gradation of access controls
  • small number of studies are open (no registration)
  • majority require registration
  • data users sign legally binding End User Licence – e.g. not to identify any potentially identifiable individuals
  • stricter regulations for certain types of data:
    • Special Licences
    • Approved researchers
    • require data access authorisation from data owner prior to data release
    • embargo for given time period
    • Secure Data Service (no direct data access)
• Multiple access controls can apply to different data types within one study
COPYRIGHT IS AN INTELLECTUAL PROPERTY RIGHT ASSIGNED AUTOMATICALLY TO THE CREATOR, THAT PREVENTS UNAUTHORISED COPYING AND PUBLISHING OF AN ORIGINAL WORK. COPYRIGHT APPLIES TO RESEARCH DATA AND PLAYS A ROLE WHEN CREATING, SHARING AND RE-USING DATA.
COPYRIGHT AND DATA SHARING
• Copyright permissions sought and granted prior to data sharing / archiving
• Clearing copyright – reach agreement with copyright holder
• Data archives publish data – they hold no copyright
• Copyright holders give permission to data archives to preserve data and make them accessible to users
• For secondary use, copyright clearance before data can be reproduced
• Exception - fair dealing - for non-commercial research, private study, teaching, quotations, criticism or review; then author and source must be cited
ISSUES IN RE-USING QUALITATIVE DATA
THE PRACTICE OF RE-USING QUALITATIVE DATA, OR SECONDARY ANALYSIS, IS RAPIDLY BECOMING MORE WIDELY ACCEPTED. IT REQUIRES ATTENTION TO SEVERAL AREAS:
• context
• sampling (no slides)
• risks
THE CONTEXT DEBATE: QUALITATIVE DATA
An argument against re-using data:
• context is essential for interpreting qualitative data
• anyone conducting secondary analysis (SA) lacks the context known by the primary researcher(s), “head-notes”
• thus all SA is inevitably flawed or limited because of this lack of contextual information (Mauthner, several publications)
THE CONTEXT DEBATE: RESPONSE
True, identical context is, by definition, not available as secondary researcher “was not there”, but…
• context is ALWAYS partial (transcription to audio to video)
• what counts as relevant context depends on the research question (CA)
Must distinguish: data v. evidence (Hammersley SRO 2010)
• “data as given” v. “data as constructed” = evidence
• access to context (“head-notes”) may give primary researcher more privileged relationship to some “data as given”, but
• does NOT imply privileged relationship to evidence, interpretation (Irwin and Winterton 2011, Timescapes WP 4)
Archival practice: provide context and acknowledge its limits
RISKS IN RE-USING DATA
• What if another researcher interprets “my” participants’ words differently?
  • reusers do have different interpretations of both data and predecessors’ analyses (Bornat ’03; Savage ’05; Hollway & Jefferson ’00)
  • such debates are an essential part of professional discourse
  • the non-feminist grandmother (Borland ’06)
• Respecting participants ≠ agreeing with every word they say
• Researcher reputations (senior and junior)
RESEARCH DATA POLICIES
UoE Research Data Management Policies
RCUK Common Principles on Data
Wellcome Trust Policy Statement
Overview of Funders’ Policies on Open Data
RESOURCES
• British Sociological Association [http://www.britsoc.co.uk/equality/Statement+Ethical+Practice.htm]
• British Sociological Association - Visual Sociology Group – Ethical guidelines [http://www.visualsociology.org.uk/about/ethical_statement.php]
• Methodological Issues in Qualitative Data Sharing and Archiving [http://www.cardiff.ac.uk/socsi/hyper/QUADS/index.html]
• National Centre for Research Methods - informed consent project [http://www.southampton.ac.uk/socsci/sociology/research/projects/informedcontent.html]
• Oral History Society guidelines [http://www.ohs.org.uk/ethics/]
• Social Research Association [http://www.the-sra.org.uk/ethical.htm]
• The Research Ethics Guidebook [http://www.ethicsguidebook.ac.uk/]
• Social Science Research Ethics [http://www.lancs.ac.uk/researchethics]
OTHER USEFUL LINKS
Open Access and Data Curation Team: openaccess@exeter.ac.uk
Open Exeter Project
Open Access website
RKT Contact Details
Digital Curation Centre
Appraise & Select Research Data – DCC
Exeter IT Governance and Compliance
UoE ethics policy
Download UoE research data management survival guide for PGRs
Read one PhD student’s experience of handling copyright issues
CASE STUDY
Health and Social Consequences of the Foot and Mouth Disease Epidemic in North Cumbria, 2001-2003 (SN 5407 at UK Data Archive)
Maggie Mort, Lancaster University
• funded by Department of Health
• recruited panel of 54 local people in affected area at time of FM crisis: farmers, agricultural professionals, small businesses, health professionals, vets, residents
• weekly diaries for 18 months describing how their life was affected by the crisis and process of recovery observed around them (handwritten)
• in-depth interviews and group discussions (audio recordings, transcripts)
• at end of research – feeling by researchers that data should be archived
• how would you approach data archiving in this case?
  • ethical aspects
  • legal aspects
  • how to engage with panel
  • practical aspects
• If this was your research project, which data management aspects would be essential to consider, and when?
  • use the DM checklist to plan DM activities within a research cycle
CASE STUDY
Researchers’ approach
• seek advice from copyright specialist re. terms of agreement for archiving
• meet with UK Data Archive, Qualidata - advice on data archiving
• develop separate consent forms for written and audio material, with opt in / opt out and embargo option
• pilot discussion on data archiving with 4 panel members to explore:
  • feelings re. data anonymisation, confidentiality, copyright, ownership
  • user options of archived data - scholarly / educational purposes
  • understanding of archiving by participants and information required
• discuss archiving individually with each panel member
• 7 panel members declined archiving their data
• 40 interview and diary transcripts are archived and available for re-use by registered users
• 3 interviews and 5 diaries are embargoed until 2015
• audio files archived and only available by permission from researchers
Detailed information: www.esds.ac.uk/doc/5407%5Cmrdoc%5Cpdf%5Cq5407userguide.pdf
CONTACT
UK DATA ARCHIVE
UNIVERSITY OF ESSEX
WIVENHOE PARK
COLCHESTER
ESSEX CO4 3SQ
T: +44 (0)1206 872001
E: datasharing@data-archive.ac.uk
W: www.data-archive.ac.uk