Andrew Treloar, ARCHER Project Director
Cathrine Harboe-Ree, University Librarian
Alan McMeekin, Executive Director ITS
Dancing with data down under
CNI Winter 2007 Project Briefing
www.monash.edu.au
O is for Overview
• Drivers for what we are presenting
• Research case study overview
• Challenges and solutions
• Australian national developments
D is for Drivers
D: Monash – a distinctive and internationalised university
• Established 1960
• Research intensive, doctoral granting
• 55,000 students from more than 100 countries
• 6.1% of student load is graduate
• 3,500 academic staff (6,800 total EFT staff)
• 10 faculties
• Campuses in Australia (six), Malaysia, South Africa, centre in Prato
• Partnerships – India, Hong Kong, Singapore, China
• Total research income $186 million (2006)
D: Information Management Strategy
• 2-year initiative to develop an overarching strategy for the whole university
• Took holistic view of information
• Informed by views of range of information management professionals and stakeholders
• Report available at: www.monash.edu.au/staff/information-management/
• Based on set of ten principles that have been extended into the research data domain
D: Monash data management environment
• High level support
  – DVC (Research), Prof Edwina Cornish
  – Establishment of E-Research Centre
• Need to manage growing deluge
  – Leading E-researchers in some disciplines
  – Synchrotron (1 TB per day)
  – Shoah Archives (12 TB)
  – And others
• Need to respond to Australian Code for the Responsible Conduct of Research
  – www.nhmrc.gov.au/publications/synopses/r39syn.htm
D: Three inter-related national projects
[Diagram: ARROW, DART and ARCHER placed on the entire e-research life cycle, encompassing experimentation, analysis, publication, research and learning. Other labels: Virtual Learning Environment; Undergraduate Students; Graduate Students; Digital Library; E-Researchers; Peer-Reviewed Journal & Conference Papers; Reprints; Technical Reports; Preprints & Metadata; Grid; E-Experimentation; Publisher Holdings; Local Institutional Web Archive; Certified Experimental Results & Analyses; Data, Metadata & Ontologies]
Source: Adapted from Liz Lyon, eBank UK Presentation
R is for Research case study
R: Structure determines function
• Sequence: unfolded protein is chain of amino acids
  – Highly mobile
  – Inactive
• Structure: folded protein
  – Precise shape
  – Stable
  – Highly ordered
  – Active
• Function: function depends on protein shape
  – Specific associations
  – Precise reactions
R: Flow of biological information
R: How to solve a structure
• Diffraction intensities + phases, combined by Fourier synthesis, give the electron density, from which the 3D structure is built
• Ways to obtain phases
  – Experimental methods (= back to lab)
  – Use known structures (molecular replacement)
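For readers outside crystallography, the Fourier synthesis step on this slide can be written out. This is the standard textbook form, not taken from the slides; the symbols (unit-cell volume V, Miller indices h, k, l, structure-factor amplitude |F|, phase α, measured intensity I) are introduced here for illustration.

```latex
% Electron density at fractional coordinates (x, y, z):
\rho(x, y, z) = \frac{1}{V} \sum_{h,k,l} \lvert F(hkl) \rvert \,
    e^{i\alpha(hkl)} \, e^{-2\pi i (hx + ky + lz)}

% The experiment measures intensities, which give amplitudes only:
\lvert F(hkl) \rvert \propto \sqrt{I(hkl)}
```

The phases α(hkl) are not recorded in the diffraction images, which is why the slide lists separate routes (experimental methods or molecular replacement) for obtaining them.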
R: Resulting publication in Science
R: Access Statistics: 23/8/2007 to 1/12/2007
• Views: 918 total
  – 257 from library staff
  – 152 from other Monash addresses
  – 509 from non-Monash addresses
• Downloads: 498 total
  – 87 from library staff
  – 62 from other Monash addresses
  – 349 from non-Monash addresses
R: Why he cares about data
• Raw data are sacred
• Data validation for reviewers and by peers
• His data are now safe and secure
• Store of examples for those doing methods development
• Some data cannot be processed by him; why not let others have a go?
C is for Challenges and Solutions
• Laboratory data management practice
• Institutional data management planning
• Sustainable storage provision
• Data curation across data stores
• Data in institutional repositories
C: Laboratory data management practice
• Challenge
  – Infrequent and deficient backup
  – No commitment to long-term preservation
  – Poor recording of metadata (descriptive/provenance)
• Solution
  – Embed IM professionals with research teams
  – Provide sustainable storage for backup
  – Improve laboratory data capture systems
C: Institutional data management planning
• Challenge
  – No systematic organisation-wide approach
  – No way of engaging with researchers
S: Institutional forum to discuss issues
• Membership
  – Library
  – ITS
  – Records and Archives
  – Research Office
  – e-Research Centre
• Outputs
  – Policy and Plan (print trial, web production)
  – Outreach activities
S: Data Management Plan – objectives
• Assists both researcher and institution
• Is completed at beginning of research project, updated as necessary
  – May become mandatory in future
• Captures some technical, access and descriptive metadata at the beginning of research project
• Is not onerous
• Delivers visible benefits
• Assists in providing complete research data solutions
S: Data Management Plan – components
• Originators and owners of the data
• Description of project
• Metadata used (schema, standards)
• Types of data to be collected
• Volume of data (initial estimate)
• Retention requirements (guidelines provided)
• Format/s of and software used in creation and use of the data
• Access policies and provisions
• IP constraints
• Confidentiality requirements
• Storage, preservation and archiving of data
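The components listed on this slide could be captured as a simple record. The sketch below is illustrative only: the class name, field names, types and the `missing_components` helper are assumptions made for this example, not the actual Monash plan template, and the sample values are invented.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical record with one field per component on the slide."""
    originators_and_owners: List[str]
    project_description: str
    metadata_standards: List[str]
    data_types: List[str]
    estimated_volume_gb: float       # initial estimate only
    retention_requirements: str
    formats_and_software: List[str]
    access_policies: str
    ip_constraints: str = ""
    confidentiality_requirements: str = ""
    storage_and_archiving: str = ""

    def missing_components(self) -> List[str]:
        """Return names of components that are still empty."""
        return [name for name, value in asdict(self).items()
                if value in ("", [], None)]


# Invented example values, for illustration only.
plan = DataManagementPlan(
    originators_and_owners=["Example research group"],
    project_description="Protein crystallography study",
    metadata_standards=["Dublin Core"],
    data_types=["diffraction images"],
    estimated_volume_gb=36.0,
    retention_requirements="per institutional guidelines",
    formats_and_software=["image files"],
    access_policies="open after publication",
)
print(plan.missing_components())
```

Because the plan is completed at the start of a project and updated as necessary, a helper like `missing_components` is one way to show a researcher what is still outstanding without making the form onerous.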
C: Sustainable storage provision
• Challenge
  – Need sustainable way to provide large (terabyte) amounts of storage for researchers
  – Make this more financially attractive than JBOD under desk
• Solution
  – Large Research Data Storage (LaRDS)
C: LaRDS requirements
• Addresses institutional and researcher needs
• Formulates a set of principles to guide cost modelling and sustainable funding options
• Assumes commitment to storage in perpetuity
  – or “as long as required”, whichever comes first ;-)
• Adopts a central storage model …
  – Centrally funded basic allowance, plus
  – Directly charged excess allowance
• … in parallel with decentralised storage
• 700 TB and growing
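The two-part funding model on this slide (a centrally funded basic allowance plus a directly charged excess) can be sketched as a small calculation. The function name, the allowance size and the per-terabyte rate below are hypothetical; the slides give no actual figures for either.

```python
def split_storage_cost(requested_tb, basic_allowance_tb, rate_per_excess_tb):
    """Split a storage request into a centrally funded part and a charged excess.

    All parameter values are assumptions for illustration; the slides do not
    state the real allowance or charging rate.
    """
    if requested_tb < 0 or basic_allowance_tb < 0 or rate_per_excess_tb < 0:
        raise ValueError("all arguments must be non-negative")
    centrally_funded_tb = min(requested_tb, basic_allowance_tb)
    excess_tb = max(0, requested_tb - basic_allowance_tb)
    return {
        "centrally_funded_tb": centrally_funded_tb,
        "excess_tb": excess_tb,
        "direct_charge": excess_tb * rate_per_excess_tb,
    }


# Hypothetical request: 12 TB against a 5 TB allowance at 100 per excess TB.
result = split_storage_cost(12, 5, 100)
print(result)
```

A request that fits within the basic allowance yields an excess of zero and no direct charge, which is what makes central storage attractive compared with unmanaged local disks.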
C: Different stores for different domains
C: Data in institutional repositories
• Challenge
  – Most IRs are designed for document objects
  – Many data objects are large: 2QP2 produced 36 GB of image data
  – HTTP download metaphor doesn’t scale
• Solution
  – Trialling both managed content and externally referenced content at present
  – Investigating custom disseminators on server
A is for Australian national developments
A: Australian e-Research Infrastructure
• Term ≈ Cyberinfrastructure
• National Collaborative Research Infrastructure Strategy (A$555M, 5 yrs)
  – 15 research capabilities
  – and Platforms for Collaboration
• Platforms for Collaboration (A$75M, 4.5 yrs)
  – National Computation Infrastructure
  – Interoperation and Collaboration Infrastructure
  – Australian National Data Services
A: Australian National Data Service
• Monash University is leading a project to establish ANDS
• ANU and CSIRO to be other members of collaborative partnership
• Tasks to be distributed more widely
• Four platforms:
  – Frameworks (policy)
  – Utilities
  – Repositories
  – Researcher Practice
• http://www.pfc.org.au/twiki/bin/view/Main/Data
Q is for Questions!
• andrew.treloar@its.monash.edu.au
• cathrine.harboe-ree@lib.monash.edu.au
• alan.mcmeekin@its.monash.edu.au
• http://arrow.edu.au/
• http://dart.edu.au/
• http://archer.edu.au/
* Thanks to Dr Ashley Buckle and colleagues at Monash for the use of the protein crystallography slides and movies
Federating Data
• The Australian Repository for Diffraction ImageS
  – http://www.tardis.edu.au/
• National activity to support communities of protein crystallographers
• Ideal place to hook into the eCrystals Federation
  – http://wiki.ecrystals.chem.soton.ac.uk/