MANAGING YOUR RESEARCH: CREATING A DATA MANAGEMENT PLAN





























By Nina Lewin, Data Services Librarian, 2012
This session introduces standard procedures for creating a data management plan and generating metadata.
1. Data management is the set of processes between data collection and results.
2. Why do I need to do this? Verification, reliability and “I am sure I did that analysis”.
The most common mistake is to focus on the interesting part, i.e. the research question, and forget about the practical part.
Data Management Plans: formats for data collection, organization and preservation
A data management plan documents how the data will be managed as you are collecting it.
1. It is a formalized way of ensuring that your research is organized in such a way that your data backs up your results.
2. It is required by funders and it helps you meet ethics requirements. For example, ethics will ask you to consider how you intend to protect your data’s confidentiality, how you will handle it during processing, and whether you will save it or destroy it when you are done.
3. It is good research practice. These are standard approaches in science BECAUSE they stop you making mistakes in analysis.
But MOST OF ALL it stops you losing your data and it lets you find it again.
Data about data: the circle of research life
Data management plans generate metadata.
– Metadata is the tool used to manage data, organizationally and intellectually.
• Metadata contains:
– Process information: where, when and what you did
– Summary information: who and what
– Reflective information: analysis choices, memos, field notes
IF you only do one thing
What you can do while the kettle boils: a list
The most important metadata is a name.
• Project title, e.g. biochemistry 1
• Interviewee’s number, name, initials or pseudonym
• Date of interview
• Role type
• Male or female interviewee
• Mode of interview
NB: If you want features to appear in the output, they must appear in the file name.
• See HSRC metadata, http://manual.recoup.educ.cam.ac.uk
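The naming list above can be sketched in a few lines of Python. This is only an illustration, assuming you want the fields joined with underscores and the date written as year-month-day; the function name `build_file_name`, the separator and the example values are assumptions, not part of the HSRC guidance.

```python
import re
from datetime import date

def build_file_name(project, interviewee, interview_date, role, sex, mode, ext="txt"):
    """Join the metadata fields into one file name (hypothetical convention)."""
    def clean(value):
        # Lower-case the value and replace anything that is not a letter or digit with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")

    parts = [clean(project), clean(interviewee), interview_date.isoformat(),
             clean(role), clean(sex), clean(mode)]
    return "_".join(parts) + "." + ext

# Example values are made up for illustration
name = build_file_name("biochemistry 1", "P07", date(2012, 3, 14),
                       "lecturer", "F", "face to face")
print(name)  # biochemistry-1_p07_2012-03-14_lecturer_f_face-to-face.txt
```

Because every feature is in the name, the files can later be sorted or filtered by project, date or interview mode without opening them.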
Data management plans are a process that ensures the accuracy and quality of the data you are presenting.
THE GOLD STANDARD IS TO ENSURE THAT:
1. The collected data is correctly and accurately moved from raw to captured to final form:
– No mistakes in capturing, moving or uploading
– No errors in analysis arising from data quality
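One way to check that nothing changed while moving or uploading a file is to compare checksums of the original and the copy. The sketch below uses Python's standard `hashlib`; the function names are assumptions, and it only detects copying errors, not mistakes made while capturing the data in the first place.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks to cope with large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copied_correctly(original, copy):
    """True when the copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(copy)
```

A typical use would be `copied_correctly("raw/survey.csv", "Z:/backup/survey.csv")` after each move, where both paths are placeholders for your own files.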
Ethics
You need a data management plan because you need to show that you can manage the data from collection to preservation: scientific validity.
Doing research just because you need a degree is not good enough.
Two issues: management and preservation
Ethics: privacy (HERC)
• Can confidentiality be guaranteed?
• Can anonymity be guaranteed in resulting reports, theses and/or publications? Yes / No
• Explain how this will be done. What will participants be told in this regard?
• All data, however, should be kept confidential and safe from unauthorised access once it has been collected. Informants should have the right to remain anonymous in the final report, and this must be respected in handling of all data relating to them.
– This includes everything from collection to generating results.
Ethics
SO think before you do an informed consent: “All data should be preserved in a way that respects the nature of the original participants’ consent”, for use and for reuse, sharing and merging.
1. Ethics sit with the data, not the researcher.
2. Who has access to the data? Remember: examiners, supervisors, consultations and presentations COUNT.
3. Is the data identifiable: privacy, confidentiality or anonymity? If it is not, how do you prove you collected it?
Ethics
'Raw' or unprocessed data, especially where the identity or personal data of research participants is included, must be safeguarded and preserved from unauthorised access.
That means that you need to ensure that no one can get to the data, regardless of how likely it is.
Data may be destroyed after use. This is a great option, but it has some problems.
Ethics
You want to collect it, you might have to keep it.
Preservation guidelines: “preservation in an archive or personal collection may also be appropriate, desirable or even essential.”
Data sets that contain historically important information or information that relates to national heritage must be preserved and should be placed in a public archive where possible and appropriate.
Ethics: so think before you propose:
“The files will be kept in a locked drawer”, and the data will be on my iPad, which I leave out all the time, and on Dropbox in an open folder.
Consider passwords, encryption or limited access.
“I will destroy the data at completion of the project.” What counts as completion? End of data collection? Or end of marking? And can you prove that you did erase all the files? And do you even know how many copies you have?
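The question of how many copies you have can be answered, at least for the folders you can reach, by grouping files by their content. The sketch below is an assumption-laden illustration: `find_copies` is a hypothetical helper, it reads each file fully into memory, and it cannot see copies on other devices, in email or in cloud services.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_copies(folder):
    """Group the files under `folder` by content; return only groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Files with identical content get the same digest, whatever their names
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Running it over a project folder before you promise to destroy the data gives you a list to work through, and a record of what you actually deleted.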
Data Management Plans
• We propose that students be introduced to data management via a component in their proposals, in the standard two-page format.
• DDI (Data Documentation Initiative): there are multiple standards; we have chosen to go with the HSRC approach because it is based on South African data and they have piloted it first.
These detail:
– the types of data
– organising file names
– format
– processing
– versions
– backup
– data analysis and quality
– preservation standards and checks
– metadata content
– archiving and preservation
– access to completed data, or termination and disposal
• Exercise: University of Massachusetts Amherst
• Data Management Plan Template
Project Overview:
• Authors/Investigators:
• Proposal/Project Name (reference to main proposal):
• Narrative description of data to be created/used:
Data Description:
• Type (observational, computational, experimental):
• Size of data sets:
• How is the data created/acquired:
• What format/s are the data in (what are the file types being used):
• What are the required software/facilities/equipment/hardware to access and analyze the data:
• Describe any metadata or standards that will be applied to this data:
• What procedural documentation exists for the creation/management of data:
Data Storage:
• What systematic backup procedures, including systems description, are in place for the data:
• How long will data be stored:
• What security mechanisms are required for the data and how will they be implemented:
Access and Dissemination:
• Who owns the data (intellectual property rights information):
• Are there privacy or confidentiality constraints on the data being used/generated:
• Who will have access to the data during the project/after the project:
• What data will be shared (raw/derived/published):
• How will you provide access to the data:
– what software will be required to access the data
– what metadata will be used to discover the data
– how will technical delivery of data be implemented
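If you keep your answers to the template in a structured form, a short script can tell you which questions are still open. The sketch below is a minimal illustration: the `TEMPLATE` dictionary is an abbreviated, reworded version of the headings above, and the `unanswered` helper and the draft answers are assumptions made up for the example.

```python
# Abbreviated version of the template headings; the question wording is shortened.
TEMPLATE = {
    "Project Overview": ["Authors/Investigators", "Proposal/Project Name",
                         "Narrative description of data"],
    "Data Description": ["Type", "Size of data sets",
                         "How the data is created/acquired", "File formats"],
    "Data Storage": ["Backup procedures", "How long data will be stored",
                     "Security mechanisms"],
    "Access and Dissemination": ["Who owns the data",
                                 "Privacy or confidentiality constraints",
                                 "Who will have access"],
}

def unanswered(plan, template=TEMPLATE):
    """List (section, question) pairs that have no answer yet in `plan`."""
    missing = []
    for section, questions in template.items():
        answers = plan.get(section, {})
        for question in questions:
            if not str(answers.get(question, "")).strip():
                missing.append((section, question))
    return missing

# Hypothetical draft plan with only two answers filled in
draft = {
    "Project Overview": {"Authors/Investigators": "A. Student"},
    "Data Storage": {"Backup procedures": "Weekly copy to the university file share"},
}
for section, question in unanswered(draft):
    print(section + ": " + question)
```

The same idea works on paper: the point is that every heading gets an explicit answer before data collection starts.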
Really boring and REALLY essential: NAME YOUR FILES
• File management systems: intuitive, logical and documented (metadata)
• Create a system of folders upfront
– Keep GIS data within the program folders
• Decide on a convention: will you date a file per day or per topic? When will you start a new folder?
– STICK TO IT
– And clean it up when you didn't stick to it
• Remember: if a file is in software you don’t have a license for, you LOST IT
– Check the licenses or use open software; always save in two formats for storage
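Creating the folder system upfront can be scripted so that every project starts with the same layout. This is a sketch only: the folder names in `FOLDERS` and the `create_project_folders` helper are assumptions chosen for illustration, and you should substitute the convention you agreed with your supervisor.

```python
from pathlib import Path

# Hypothetical convention: numbered folders that follow the data from raw to final form
FOLDERS = ["01_raw", "02_captured", "03_analysis", "04_outputs", "05_documentation"]

def create_project_folders(root, project, folders=FOLDERS):
    """Create the project folder and its sub-folders, plus a README that documents them."""
    base = Path(root) / project
    for name in folders:
        # exist_ok=True makes the script safe to run again on an existing project
        (base / name).mkdir(parents=True, exist_ok=True)
    readme = base / "README.txt"
    if not readme.exists():
        readme.write_text("Folder convention for " + project + ":\n"
                          + "\n".join(folders) + "\n")
    return base
```

The README is the "documented" part of the system: anyone who opens the project folder can see what belongs where.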
• ASK YOUR DEPARTMENT, SUPERVISOR AND STAKEHOLDERS
• Are you planning or expecting to reuse this data?
FILE FORMATS CURRENTLY RECOMMENDED BY THE UK DATA ARCHIVE FOR LONG-TERM PRESERVATION OF RESEARCH DATA
– Qualitative data, textual: eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml); Rich Text Format (.rtf); plain text data, ASCII (.txt)
– Digital image data: TIFF version 6 uncompressed (.tif)
– Digital audio data: Free Lossless Audio Codec (FLAC) (.flac)
– Digital video data: MPEG-4 (.mp4); motion JPEG 2000 (.jp2)
– Documentation: Rich Text Format (.rtf); PDF/A or PDF (.pdf); OpenDocument Text (.odt)
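A quick check of your own files against the extensions listed above can be sketched as follows. The set `RECOMMENDED` copies only the extensions named on this slide, not the full UK Data Archive table, and `not_recommended` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions taken from the list above; the full UK Data Archive table is longer
RECOMMENDED = {".xml", ".rtf", ".txt", ".tif", ".flac", ".mp4", ".jp2", ".pdf", ".odt"}

def not_recommended(file_names, recommended=RECOMMENDED):
    """Return the file names whose extension is not in the recommended set."""
    return [name for name in file_names
            if Path(name).suffix.lower() not in recommended]

# Made-up file names for illustration
print(not_recommended(["interview01.rtf", "notes.docx", "scan.TIF", "audio.mp3"]))
# ['notes.docx', 'audio.mp3']
```

Files that show up in the result are candidates for saving in a second, open format before they go into storage.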
Example: http://www.data-archive.ac.uk/create-manage/format/organising-data
They suggest relatively fewer folders.
Backup and Storage
• Backup is what you do when you save normally.
• Storage is the process that you have in place in case the backup fails:
– Your computer’s hard drive died because your drunken roommate spilled beer all over it.
– Your laptop got stolen.
– You saved it to a drive, and you lost the drive.
– You were working on a departmental computer and it got a virus, which means everything needs to be wiped.
– You saved it but you can’t find it now because you don’t know where the program saves things to.
• Take your proposed research and answer the following questions, thinking about storage and backup:
– How are you going to capture the data?
– Where are you going to store the data?
Saving
1. Storage: save a copy of your working document EVERY WEEK to another drive in ANOTHER place.
– If your drive or your CD is in the same place as your computer, that is not a backup.
2. The safest place to back up to is the internet or a cloud.
• Google Docs: limitless
– Uploading is time consuming
• Dropbox: http://www.dropbox.com (2 GB)
– Dropbox is a sync system. What you put in Dropbox is simultaneously on Dropbox online.
• BUT remember someone owns the cloud and they can close or be offline, so the university has an option.
• Master's or PhD students: if you are writing a thesis, the library will help you store it.
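The weekly copy can be reduced to one function call. The sketch below assumes the other drive is reachable as an ordinary folder path; `weekly_copy` is a hypothetical helper, and adding the date to the file name is an assumption made so that older copies are not overwritten.

```python
import shutil
from datetime import date
from pathlib import Path

def weekly_copy(document, backup_folder, today=None):
    """Copy `document` into `backup_folder`, adding the date to the file name."""
    today = today or date.today()
    source = Path(document)
    target_dir = Path(backup_folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{source.stem}_{today.isoformat()}{source.suffix}"
    # copy2 also preserves the file's modification time where the system allows it
    shutil.copy2(source, target)
    return target
```

For example, `weekly_copy("thesis.docx", "Z:/thesis-backups")` would leave a dated copy on the other drive, where both the file name and the drive path are placeholders.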
http://elearn.wits.ac.za
• File share: like Dropbox, it creates a drive on your computer called Z:
– My Workspace, also called SAKAI: 1 GB per site, so be selective
• Username: student number
• Password: normal password
• Create an account and upload under Resources
Sourcing and Citing data
THOSE WHO GIVE DECIDE WHO GETS THE CREDIT, so:
Militarized Interstate Dispute (MID) Data, 1816-2001 [Computer file]. ICPSR24386-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2010-03-05. doi:10.3886/ICPSR24386.v1
Citing data, APA 6th edition, data sets:
Simmons Market Research Bureau. (2000). Simmons national consumer survey [Data file]. New York, NY: Author.
• Programs: MAXQDA, ATLAS.ti, NVivo
• All of these programs could use more spreadsheet attachments; MAXQDA has some attributes.
• The program becomes a filing system, a metadata generator and a project outcome in and of itself.
– There is no doubt you can replicate this in a file folder system
– BUT you can’t get date-time logs, author logs and metadata
• Programs: Mendeley, EndNote, RefMan, RefWorks
• You can upload your files as PDFs or cut and paste.
• The program organizes your references, notes, data and analysis in one place, BUT NOT any stats, spreadsheets or GIS.
• You can't do this yourself: you WILL make a mistake with references and you will lose track of your outputs.
References
• Tools of the Trade training, Digital Curation Centre. http://www.dcc.ac.uk/training/train-the-trainer/dc-101-lite-materials
“I’ve collected my data, so what do I do with it now?” Research data management
• DATUM for Health. www.northumbria.ac.uk/datum
• HSRC deposit form and UK Data Archive. http://www.data-archive.ac.uk/create-manage/formats-table
• Richard R. Plant, “Guidance Notes for Completing a ‘Checklist for a Data Management Plan v3.0’ for Researchers in the Psychological Sciences”, 2011-07-22, in guidance booklet, DCC’s Checklist for a Data Management Plan.