RESEARCH DATA MANAGEMENT
Open Access and Data Curation Team
With thanks to the UKDA for allowing us to reuse and adapt some of their training materials

TODAY’S SESSION
- Introductions
- Data Management Plans
- Data Storage
- Data Sharing
- Open Access
- Feedback

INTRODUCTIONS
- Who are we?
- Who are you?

WHY MANAGE DATA?
Short-term:
- Increase efficiency.
- Save time.
- Simplify your life.
- Meet funder and institutional requirements.
Long-term:
- Preserve your data.
- Easier sharing and collaboration.
- Allow others to build on your research.
- Raise your visibility and research profile.

WHAT IS A DATA MANAGEMENT PLAN?
“Plans typically state what data will be created and how, and outline the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied.”
Digital Curation Centre website

WHY WRITE A DMP?
- Many funders now require a DMP as part of the application process
- Helps the associated project with data management issues
- Makes the project members think about relevant issues

DATA MANAGEMENT PLANNING (DMP)
Bids to most major funders now require a DMP outlining:
- Roles and responsibilities
- What data will be created and how
- Data formats
- Documentation of data
- Storage and back-up
- Data sharing
- Long-term preservation and access
Get support from the Open Access & Data Curation Team

RCUK COMMON PRINCIPLES ON DATA POLICY
The Common Principles are available on the RCUK website.
- Open Data
- Accessible Data
- Discoverable Data
- Legal, ethical and commercial considerations should be taken into account
- Privileged use of the data
- Data use should be acknowledged
- Public funds can be used to support the management and sharing of publicly-funded research data

EXERCISE

Helpful links
- DMPonline
- DCC policies
- Funder information:
  - ESRC
  - MRC policy on research data-sharing
  - MRC guidance on data management plans
  - Wellcome Trust Policy on data management and sharing
  - Guidance for researchers: Developing a data management and sharing plan

STORING YOUR DATA

OUTLINE: STORING YOUR DATA
In this section, we will look at:
- Data storage
- Back-up
- File naming
- Versioning
- Data security, encryption and destruction
- Where to find further information
Questions throughout
A test at the end!
30 minutes

WHICH IS THE FINAL VERSION?
Image used with permission from the UKDA

DATA STORAGE 1
- Where will you be working: at home; in the office; both?
- Will you be working collaboratively?
- U Drive: 20 GB allowance.
- Cloud storage (but not for sensitive or confidential data).
- Computer hard drive.
- External hard drives & USB sticks.
- DVDs/CDs.
- Hard copy of documents.

DATA STORAGE 2
File formats and physical storage media become obsolete:
- All digital media are fallible: optical (CD, DVD) and magnetic media (hard drive, tapes) degrade.
- Never assume the format will be around for ever.
Storage strategy best practice:
- At least two storage formats.
- Prefer open or standard formats, e.g. OpenDocument Format (ODF), comma-separated values.
- Some proprietary data formats such as MS Excel are likely to be accessible for a reasonable, but not unlimited, time.
- Maintain original copy, external local copy and external remote copy.
- Copy data files to new media two to five years after first created.
- Check data integrity of stored data files regularly (checksum, e.g. FastSum).
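The regular integrity check mentioned above can be sketched with Python's standard library in place of a dedicated tool. This is a minimal illustration, not the method used by FastSum: the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Compare a file's current digest with the one recorded earlier."""
    return sha256_of_file(path) == recorded_digest
```

Record the digest when the file is first stored, keep it with the documentation, and call `is_unchanged` at each scheduled check; a `False` result means the stored copy no longer matches what was recorded.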

NON-DIGITAL STORAGE
Always follow the procedures stated in your ethical approval
Confidential items, e.g. signed consent forms, interview notes:
- Store securely, under lock and key.
- Separate from data files.
Printed materials, photographs:
- Degradation from sunlight and acid (sweat on skin, in paper).
- Use high quality media for long-term storage/preservation, e.g. acid-free paper & boxes, non-rust paperclips (no staples).

WHY BACK-UP?
- Back-ups are additional copies that can be used to restore originals.
- Protect against: software failure, hardware failure, malicious attack, natural disasters, e.g. University of Southampton fire
- It’s not backed-up unless it’s backed-up with a strategy
- Backing-up need not be expensive: 1 TB external drive = around £50

BACK-UP STRATEGY
Know your institutional and personal back-up strategy:
- What’s backed-up? All or some data?
- Where? Original copy, external local and remote copies
- What media? CD, DVD, external hard drive, tape, etc.
- How often? Assess frequency and automate the process
- For how long is it kept?
- Verify and recover: never assume, regularly test a restore
Make sure you know which version is the most up to date.
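The "automate the process" and "verify" points can be illustrated with a short script. This is a sketch under stated assumptions: `back_up_file` and its dated naming scheme are hypothetical, and a real strategy would also cover remote copies and retention periods.

```python
import filecmp
import shutil
from datetime import date
from pathlib import Path


def back_up_file(source, backup_dir, today=None):
    """Copy source into backup_dir under a dated name and verify the copy."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).strftime("%Y_%m_%d")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)
    # shallow=False compares file contents rather than trusting size and timestamps.
    if not filecmp.cmp(source, target, shallow=False):
        raise OSError(f"Back-up of {source} does not match the original")
    return target
```

Comparing the copy with the original straight after writing it catches a failed copy at once; it does not replace a periodic test restore from the back-up media.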

FILE NAMING
File name = principal identifier of file
Easy to: identify, locate, retrieve, access
Provides context, e.g.:
- version number, e.g. FoodInterview_1.1
- date, e.g. HealthTest_2011-04-06
- content description, e.g. BGHSurveyProcedures
- creator name, e.g. CommsPlanHLJ

FILE NAMING: BEST PRACTICE
- Brief and relevant
- No special characters, dots or spaces
- For separation use underscores _
- Name independent of location
- Date: YYYY_MM_DD
Have a System!
- Consistent and logical naming system
- Develop a system with colleagues for shared data
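One way to keep a naming system consistent is to generate names rather than type them. The sketch below applies the rules above (no special characters, dots or spaces; underscores as separators; dates as YYYY_MM_DD). The function name, the field order and the `v1_1` version style are assumptions for illustration; the only dot left is the one before the file extension.

```python
import re
from datetime import date


def build_file_name(description, created, version, extension):
    """Build a name such as BGHSurvey_2011_04_06_v1_1.csv."""
    # Keep letters and digits only, dropping spaces, dots and special characters.
    clean = re.sub(r"[^A-Za-z0-9]", "", description)
    if not clean:
        raise ValueError("description must contain letters or digits")
    major, minor = version
    stamp = created.strftime("%Y_%m_%d")
    return f"{clean}_{stamp}_v{major}_{minor}.{extension.lstrip('.')}"
```

For example, `build_file_name("BGH Survey", date(2011, 4, 6), (1, 1), "csv")` gives `BGHSurvey_2011_04_06_v1_1.csv`.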

VERSION CONTROL TOOLS/STRATEGIES:
- Record file status/versions
- Record relationships between files, e.g. data file and documentation; similar data files
- Keep track of file locations, e.g. laptop vs. PC

VERSION CONTROL: SINGLE USER
- File naming: unique file name with date or version number

  File name               Changes to file
  Interviewschedule_1.0   Original document
  Interviewschedule_1.1   Minor revisions made
  Interviewschedule_1.2   Further minor revisions
  Interviewschedule_2.0   Substantive changes

- Version control table or file history alongside data file
- Version control facility within software, e.g. Microsoft Word 2003
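The numbering convention in the table (minor revisions raise the second number, substantive changes raise the first and reset the second) can be expressed as a small helper. This is an illustrative sketch; `next_version` and its `substantive` flag are hypothetical names, not part of any tool mentioned in the session.

```python
import re

# Matches names that end in _<major>.<minor>, e.g. Interviewschedule_1.1
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_(?P<major>\d+)\.(?P<minor>\d+)$")


def next_version(name, substantive=False):
    """Return the next versioned name for a file stem."""
    match = VERSION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"No version number found in {name!r}")
    major = int(match["major"])
    minor = int(match["minor"])
    if substantive:
        # Substantive changes start a new major version.
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{match['stem']}_{major}.{minor}"
```

`next_version("Interviewschedule_1.1")` returns `Interviewschedule_1.2`, and `next_version("Interviewschedule_1.2", substantive=True)` returns `Interviewschedule_2.0`, matching the rows of the table.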

VERSION CONTROL: MULTIPLE USERS
- Control rights to file editing: read/write permissions, e.g. Microsoft Office
- Versioning/file sharing software, e.g. Google Drive, Amazon S3
- Merging of multiple entries/edits

VERSION CONTROL: MULTIPLE LOCATIONS
- Synchronise files, e.g. MS SyncToy software, Dropbox
- Use remote desktop

ENCRYPTION: PERSONAL OR SENSITIVE DATA
Encrypt anything you would not send on a postcard:
- for moving files, e.g. transcripts
- for storing files, e.g. shared areas, mobile devices
Free software that is easy to use:
- SafeHouse
- TrueCrypt
- AxCrypt
This software can:
- encrypt hard drives, partitions, files and folders
- encrypt portable storage devices, e.g. USB flash drives

DATA DESTRUCTION
When you delete data and documentation from a hard drive, it is probably not gone:
- Files need to be overwritten to ensure they are irretrievably deleted, e.g. with BCWipe, which uses ‘military-grade procedures to surgically remove all traces of any file’
- If in doubt, physically destroy the drive using an approved secure destruction facility
- Physically destroy portable media, as you would shred paper
Image used with permission from the UKDA

DATA SECURITY
Protect data from unauthorised access, use, change, disclosure and destruction
Personal data need more protection: always keep separate
Control access to computers:
- passwords
- anti-virus and firewall protection, power surge protection
- networked vs non-networked PCs
- all devices: desktops, laptops, memory sticks, mobile devices
- all locations: work, home, travel
- restrict access to sensitive materials, e.g. consent forms, patient records
Proper disposal of equipment (and data):
- even reformatting the hard drive is not sufficient
Control physical access to buildings, rooms, cabinets

SUMMARY
In this section, we have looked at:
- Data storage
- Back-up
- File naming
- Versioning
- Data security, encryption and destruction

GAME
- Get into two teams
- Choose a team name and a sound effect that your team will make when you have the correct answer
- You may not confer with your team mates after making your sound effect
- Do not answer the question until I ask for the answer
- I will record the number of correct answers on the flipchart
Any questions?

Presentation Exeter 27.11.doc

CONGRATULATIONS!!

FURTHER INFORMATION
UoE:
- Code of Good Practice in the Conduct of Research
- TrueCrypt encryption
- Data back-up
- Storage
- Ethical approval
- Data security and destruction guidance from Information Security
External:
- Back-up advice from UK Data Archive
- Advice on organising files from Cambridge University Library
- UKDA checksum exercise

DATA SHARING

TWO STAGES OF DATA SHARING
Two stages of your project when you may share data:
- “Live” sharing during the project
- Making your “completed” data available at the end of your project
Different issues and ways of sharing data during these stages

WHY SHOULD YOU SHARE YOUR DATA?
Benefits: “Live” data
- Increased collaboration opportunities with colleagues
- Increased exposure of your current work
- Increased efficiency across research group
Benefits: “Completed” data
- Increased citation counts
- Increased exposure for your work
- Increased chance of collaboration in the future
- Allows others to build on your research
Policy
- RCUK Common Principles on Data Policy
- University Policy

GROUP EXERCISE ONE (10 MINS)
Thinking of the data you have shared:
- What are the pros and cons of the different methods you have used?
- What issues did you face when sharing your data?
- Why haven’t you shared data?
Feedback to the group.

HOW TO SHARE YOUR DATA: DURING YOUR RESEARCH
With your supervisor; with project colleagues; with external interested parties
- Cloud storage (Dropbox, Google Drive, SkyDrive etc.): not recommended for sensitive or personal data.
- Email: issues with large data and/or sensitive data. Potential version control problems.
- USB sticks: easily lost. Can transfer viruses.
- External hard drives: less suitable if collaborator is at a different institution.
- Websites: lack of permanency. Need internet connection. May not have access rights to the site.
- FTP: not secure. Data can be intercepted.
- Hard copy documents: one of a kind.

HOW TO SHARE YOUR DATA: AT THE END OF YOUR RESEARCH
Archive repositories
- Discipline-specific archive, e.g. Archaeology Data Service
- (Inter)national, e.g. UKDS
- University archive repository
- Link your data with your thesis/research papers
Websites
- Link from your University personal webspace to data in a repository
- Link from academic network sites, e.g. Academia.edu, ResearchGate.net

ISSUES IN DATA SHARING
- Ethics and the Data Protection Act
- Copyright and legal issues
- File size
- Metadata
- Discoverability of the data
- Re-use of data
- Documentation of data
- File format: open or proprietary
- What to share
- Quality control and versioning

ETHICAL AND DPA ISSUES
Not all data can be shared. You must ensure that you don’t share data you are not allowed to:
- Abide by your ethical approval
- Abide by the Data Protection Act
Are you sharing this data securely?
Have you got consent to share the data?
Use Cloud Storage wisely: not for sensitive data
- Getting consent: advice from the UKDA
- Ethics advice from: College Ethics Officers (Exeter username and password needed)
- DPA advice: recordsmanagement@exeter.ac.uk

COPYRIGHT AND LEGAL ISSUES
You must abide by any contract you or your project group have signed:
- This may state that you are not allowed to share the data or it may include the conditions of sharing
You must be aware of who owns the copyright for the data you are sharing:
- You may not be allowed to share it
- You must get permission from the copyright owner before sharing data
- Also applies to data in your thesis
Advice from JISC Digital Media on using images

FILE SIZE
- Large files cannot be emailed
- Some files may not fit onto USB sticks
- How do you know if a file has been received?
- Use the University’s File Drop Box (up to 600 MB)
- Large files can take a long time to upload to Cloud Storage

FILE FORMAT
- Is the file format you are using widely used? If not, can you migrate it to a more widely used format, e.g. .xlsx (Excel) or .pdf?
- Is the format you are using an “open” format or is it “proprietary”?
- Open formats can be more easily accessed by other researchers, e.g. SPSS files can be saved as .csv files; Word files can be saved in an OpenDocument format (.odt rather than .docx)
- Make sure you don’t lose important information when migrating formats
See advice from the UKDA

EXAMPLE: FORMAT CONVERSION
[Figure: a spreadsheet in MS Excel format converted to tab-delimited text format, with loss of annotation]
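A conversion between two plain-text formats can be checked automatically for lost cell values. The sketch below rewrites a tab-delimited file as CSV and then compares the two files cell by cell; both function names are hypothetical. It cannot detect annotation lost when leaving a spreadsheet format, because comments and formatting never reach the text file in the first place.

```python
import csv


def tab_delimited_to_csv(source_path, target_path):
    """Rewrite a tab-delimited text file as CSV and return the number of rows written."""
    with open(source_path, newline="", encoding="utf-8") as source, \
            open(target_path, "w", newline="", encoding="utf-8") as target:
        reader = csv.reader(source, delimiter="\t")
        writer = csv.writer(target)
        rows = 0
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows


def same_cells(tab_path, csv_path):
    """Check that both files hold identical cell values, row by row."""
    with open(tab_path, newline="", encoding="utf-8") as tab_file, \
            open(csv_path, newline="", encoding="utf-8") as csv_file:
        tab_rows = list(csv.reader(tab_file, delimiter="\t"))
        csv_rows = list(csv.reader(csv_file))
    return tab_rows == csv_rows
```

Running `same_cells` after every migration gives a quick, repeatable check that no values were dropped or altered; anything held outside the cells still needs to be recorded in the documentation.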

METADATA
- Record the metadata as you collect/create your data
- Have you provided information about the data with the data you share?
- It is needed for discoverability, reuse, reproducibility, verification etc.
E.g.:
- Author
- Title
- Date of creation
- Publisher
- Abstract
- Description of the data
Tips from MIT and Cambridge or ask your Subject Librarian
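Recording metadata as you go is easier if the record lives in a small file next to the data. The sketch below stores the six example fields as JSON and refuses to save an incomplete record. The field keys, function names and the choice of JSON are assumptions for illustration; a repository will have its own schema.

```python
import json

# Hypothetical keys for the six example fields listed above.
REQUIRED_FIELDS = ("author", "title", "date_of_creation", "publisher", "abstract", "description")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def write_metadata(record, path):
    """Save the record as JSON next to the data, refusing incomplete records."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("Missing metadata fields: " + ", ".join(missing))
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, ensure_ascii=False)
```

Calling `missing_fields` while the data is still being collected shows which fields remain to be filled in before the data is shared or deposited.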

EXAMPLE OF METADATA RECORD IN INSTITUTIONAL REPOSITORY

SUPPORTING DOCUMENTATION
Have you provided enough information for another researcher to be able to understand, retrieve, validate and re-use the data?
- Where was the data created?
- How was the data created?
- What hardware and software were used?
- What methodologies were used?
- What assumptions did you make in your experiments?
- Why are there anomalies in your data?
Along with the metadata, the documentation should enable the data to be understood and reused independently of any other publications, data etc.
Advice from the UKDA

WHAT TO SHARE?
You don’t need to share all your “live” data
- Only data that is helpful and useful to the recipient
What to archive?
- Consider policy/legal requirements
- In collaboration with your supervisor or PI develop a set of criteria:
  - Only the data supporting your publications?
  - Data that can reproduce your results?
  - Data that can validate your results?
  - How unique or significant is your data?
Advice from Cambridge

QUALITY CONTROL AND VERSIONING
- If working on a collaborative project, ensure that you are all working on the correct version of your data
- Use version control tables or name your document appropriately
- Will versioning affect how you will share your data? For example, is it easier to control versions with cloud storage than email?
- How will you ensure that the data isn’t corrupted or changed in the process of sharing?
- Archives/repositories can provide a Persistent Identifier
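One possible answer to the corruption question is to send a checksum manifest alongside the data, so the recipient can confirm that every file arrived intact. This is a sketch, not a prescribed procedure: the function names are hypothetical, SHA-256 is an assumed choice, and each file is read into memory whole, which suits modest file sizes only.

```python
import hashlib
from pathlib import Path


def build_manifest(folder):
    """Map each file's path (relative to folder) to its SHA-256 digest."""
    folder = Path(folder)
    manifest = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            relative = path.relative_to(folder).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def changed_files(sent_manifest, received_folder):
    """List files that are missing from, or differ in, the received folder."""
    received = build_manifest(received_folder)
    return sorted(
        name for name, digest in sent_manifest.items()
        if received.get(name) != digest
    )
```

The sender runs `build_manifest` before sharing and passes the result along (for example saved as a text file); the recipient runs `changed_files` against their copy, and an empty list means every listed file matches.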

DATA DISCOVERABILITY
Data needs to be found if it is to be re-used
Discoverability can be aided by placing your data in a repository:
- Indexed by Google
- Researchers know they exist and can go to them for data
- Need adequate (and accurate) documentation to fully aid discoverability
- Persistent Identifier: can be included in citations, emails, Tweets etc.
- Academic networks and personal web pages can link to the data in a repository
Reference your data in publications
- See the new RCUK policy on Open Access

DATA RE-USE
- Data citation is becoming more common
- If others use your data it can increase your citation rates.
- Sharing can mean that your data is re-used in areas you didn’t think it could be, e.g. ships’ logs are being used by climate scientists.
- Get credit for all your research
Prof. Tim Naylor on data sharing: ‘I have examples of people who could have simply lifted the data, gone away and done something with it and given me a citation for it; but actually they have come to me and said, “OK, I’ve got this data, which is yours, we’re interested in it, but we need your expertise to interpret it” and then I get a co-authorship out of it as well.’

OPEN ACCESS TO COMPLETED DATA
You will be required to make your data available on Open Access where appropriate:
- RCUK Common Principles on Data
- Wellcome Trust Policy Statement
- UoE policies
Link publications and supporting data:
- RCUK will require a statement in research papers from 1st April 2013 saying where the supporting data can be accessed
Archives and repositories

Further info/helpful links
External:
- UKDA guidance on “Planning for Sharing”
- DCC table showing if your funder provides a data centre
- NIHR Research Governance Framework For Health and Social Care: “Data relevant to findings should also be accessible.” (p. 14)
Internal:
- Open Exeter web pages
- Exeter University’s Institutional Repositories
- PhD student’s experience of copyright issues
- Cloud Storage guidance
Contact rdm@exeter.ac.uk for help and advice

Open Access

OPEN ACCESS
What is it?
- International movement to open up access to research knowledge.
- Publicly-funded research should be openly and freely available when appropriate.
- No restrictions on access or use.
- Most funders now require funded research to be made OA.

THE BENEFITS OF OA
- Increased visibility of research & researchers.
- Impact: OA research cited more frequently.
- Research lifecycle can be accelerated: published, read, cited, built on.
- Facilitating collaboration & sharing.
- New business opportunities.
- Tool for the University to raise awareness of research profile.
- Public good: sharing scholarship and intellectual wealth.

HOW DOES OA WORK?
- Put a copy of your research paper in a repository (the Green route: free to the researcher).
- Pay a publisher a fee to make your paper OA (the Gold route: c. £1,700 average).
- Publish in a free OA journal.
- SHERPA/RoMEO: information on publisher OA policies.
- DOAJ: a directory of free OA journals.

FUNDER POSITION ON OA
- RCUK from 1 April 2013: all papers submitted for publication should be OA within 6 months (12 months for AHRC & ESRC).
  - When this is not a “feasible option”, RCUK expect the paper to be OA within 12 months, or 24 months esp. for AHRC/ESRC-funded research.
  - MRC-funded papers should always be OA within 6 months.
- Wellcome/NIHR: published papers must be available on OA within 6 months and deposited in UKPMC.
- EU: after 6-12 months.
- Most other funders currently ‘encourage’ or ‘support’ OA; expect future mandates.

EXETER POSITION ON OA
- Academic freedom of choice is paramount.
- Green OA is to be adopted as the cultural norm.
- An institutional mandate for Green OA is likely to be implemented in 2013 alongside policy.
- Gold may be more relevant in STEM/M where quick impact is often a factor.
- Funds should be distributed fairly so that no researcher group is disadvantaged (e.g., ECRs).
- It will take time for all research to be compliant; UoE will provide support and guidance through the Library.

WHAT TYPES OF RESEARCH ARE AFFECTED?
- RCUK: peer-reviewed journal articles & published conference papers.
- Wellcome: peer-reviewed journal articles.
- Not monographs, book chapters, etc.
- RCUK: you will need to state where and how data can be accessed in your research paper.

HOW TO GO GREEN
- Put a copy of your paper in the UoE or other repository (may need to be a post-print; always keep your own peer-reviewed copy).
- Deposit via Symplectic; in return you get a link to the full text.
- Wellcome-funded researchers must put a copy in PMC/UKPMC within six months.
- Publish in a free Open Access journal: DOAJ
- Repository deposit is not a means of publishing, it is a means of being OA compliant.

HOW TO GO GOLD
- Many publishers operate a paid (Gold) OA scheme: your paper is made openly and freely available on payment of a fee.
- If you have the funds to pay for Gold:
  - Check in advance that the journal in question has a paid OA option (use SHERPA/RoMEO).
  - If your chosen journal does not, you may be able to negotiate a one-off payment or a more lenient copyright agreement.

HOW WILL THE COSTS OF GOLD OA BE MET?
- UoE will receive a block grant of £215k from RCUK from April 2013 to March 2014.
- UoE has £131k from the Government via BIS (to be spent by April 2013).
- UoE has £50k from Wellcome.
- UoE has a number of membership schemes (discounts with PLoS, Royal Society, BioMed Central). Probably BioMed Central prepay scheme soon.
- Methods of allocating funds are still under discussion.
NB You can no longer factor the costs of OA publishing into Wellcome funding bids and, from April 2013, RCUK funding bids.

OA GUIDANCE
- The choice of where to publish is an academic decision.
- We will help researchers navigate publisher policy and support academic choice.
- We will help researchers deposit via the Green or Gold route subject to funds.
- Funders do check institutional compliance: be aware.
- Any queries: openaccess@exeter.ac.uk

HELPFUL LINKS
- Contact us: openaccess@exeter.ac.uk
- Open Access web site
- RCUK policy
- UoE policies

ANY QUESTIONS?