Digital Curation Centre: a centre of support for data curation and preservation
UK Digital Curation Centre: An Introduction
Dr Liz Lyon, Associate Director, Outreach
Grand Challenge Meeting, Bath, June 2005

Repositories and digital curation
• For later use? Static: data preservation
• In use now (and the future)? Dynamic: data curation, “maintaining and adding value to a trusted body of digital information for current and future use”

Assuring permanent access to the records of science & the humanities?
Long-term access to primary data
• Increasing data volumes from e-Science and Grid-enabled / cyberinfrastructure applications
• Changing research paradigm: data-driven science, “big science”
• Observational data, simulations, large-scale experimentation
• Multi-media resources, statistical data, surveys, geo-spatial data…

Facilitate “post-processing” and knowledge extraction
Enable the acquisition of newly-derived information and knowledge
• Run complex algorithms over primary datasets
• Mining (data, text, structures)
• Modelling (economic, climate, mathematical, biological)
• Analysis (statistical, lexical, pattern matching, gene)
• Presentation (visualisation, rendering)

Provide additional functionality beyond digital preservation processes
Annotations
• Gene and protein sequences
• e-Lab books (Smart Tea Project in chemistry)

The scholarly knowledge cycle: linking research data to publications
eBank UK Project, http://www.ukoln.ac.uk/projects/ebank-uk/
[Diagram of the cycle; elements shown:]
• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
• Research & e-Science workflows
• Data analysis, transformation, mining, modelling
• Validation
• Deposit / self-archiving
• Repositories: institutional, e-prints, subject, data, learning objects
• Data curation: databases & databanks
• Peer-reviewed publications: journals, conference proceedings
• Publication, linking
• Harvesting metadata
• Aggregator services: national, commercial
• Searching, harvesting, embedding
• Resource discovery, linking, embedding
• Presentation services: subject, media-specific, data, commercial portals
• Emerging policy on open access to data

DCC people (some of them…)
• Management & Co-ordination
  – Director: Chris Rusbridge (University of Edinburgh)
• Community Support & Outreach
  – Led by Dr Liz Lyon (UKOLN, University of Bath)
• Service Definition & Delivery
  – Led by Professor Seamus Ross (HATII [ERPANET], University of Glasgow)
• Development
  – Led by Dr David Giaretta (Astronomical Software & Services, CCLRC)
• Research
  – Led by Professor Peter Buneman (Informatics, University of Edinburgh)

(Some of) the challenges we face
• Standards: interoperability issues: technical & ? ? soluble
• Scale: volume and diversity of datasets
• Culture: bringing communities together
  – Library/information science/archives “document tradition”
  – Domain research (chemists, astronomers, biologists)
  – Computer science (databases)
  – Commercial suppliers (storage technology)
• Process & skills: highly-distributed organisation
  – Use collaborative tools, combined skills
• Engagement: existing work & key players

User requirements analysis: some sound bytes…
• R&D issues: annotation services, ontology development, automating metadata creation, tools and toolkits, Data Format Description Language, identifiers, registries, economic and cost-benefit studies
• Advisory services: “Ask-a-Curator”, FAQs, reports, briefings, awareness-raising materials, best practice guidance, storage media, “Like Erpanet”, advise Government, Research Councils, funding bodies
• Professional development: short courses, conferences, seminars, workshops, secondments to DCC and to working repository services
• Outreach: leadership for the future, case studies, sharing solutions, collaboration with other partners, international peers, industry links
• Taxonomy of “Users”

Outline taxonomy of digital curation users by role
1. Data creators
2. Data curators
3. Data re-users
4. Policy makers
  – funding bodies
  – other leaders
Also shown: data preservers; data publishers

Outline taxonomy by significant function of organisational entity
1. Research
2. Service provision
3. Learning & teaching
4. Funders
5. Policy / strategy makers
Also shown: commercial; “designated communities”

Advisory services
• Responses to queries, from legal to technical guidance: HELPDESK@dcc.ac.uk
• FAQs constructed
• Informing workshops and information services
• Monthly site visits (National Institute of Environmental e-Science)

Professional development workshops
• 2005 programme
  – Persistent identifiers: June, Glasgow
  – Institutional repositories: July, University of Cambridge, with DSpace
  – Cost models: July, British Library, London, with the Digital Preservation Coalition
  – Preservation of medical databases: October, Gulbenkian Institute, Lisbon, with ERPANET & the Wellcome Trust

Standards Watch
• Covering existing and emerging standards
• Working with community and standards bodies (e.g. ISO)
• Organising associates groups around new standards developments
• Initiating standardisation definitions where gaps are identified
• Currently re-purposing the Diffuse database of standards materials

Digital Curation Manual
• A world-class resource
• Constructed from topic-specific chapters
  – written by international experts
  – editorial board comprising leading researchers and practitioners
• 45 initial topics, including
  – Appraisal and Selection; Costs; Freedom of Information; Interoperability; the OAIS Reference Model; Preservation Strategies; and Open Source
• DCC Briefing Papers offer less in-depth coverage, aimed at the needs of senior managers

OAIS Reference Model – Functional Model
[Diagram]

Audit and Certification (1)
• How can people know whom to entrust with their information?
• There is a demand for a certification process for
  – repositories and components, e.g. archive storage
  – software
• Existing certification standards (ISO 9000 and ISO 17799) do not do the job
• OCLC/RLG Trusted Digital Repositories: Attributes and Responsibilities
  – high-level model for design, delivery and maintenance of digital repositories

Audit and Certification (2)
• International expert group led by RLG and NARA is drafting a certification standard
• DCC is participating: aiming for international consensus
• Draft goes to Technical Editor end of June
• DCC testbeds to support development of audit and certification standards
• Commitment to
  – offer guidance on self-audit and self-certification
  – carry out independent audits
  – issue certificates to qualifying repositories

Tools and Technologies
• Accumulate and maintain a registry and online repository of relevant tools
  – Repository implementations
  – Packaging tools
  – Rendering software
  – Format converters
  – Device drivers

Representation Registry
Development info: see http://dev.dcc.ac.uk for details of the Wiki and email list, open to all
• Simple PHP prototype
• Scoping study
  – Formats, standards, tools
• More robust prototype in development
  – Based on ebXML & JAXR
  – Potentially distributed, cooperative maintenance model
  – Representation information: describe CCLRC (science) data using EAST
• Links to PRONOM, GDFR and other pilots
• Aim to hand over to services

Research agenda (1)
• Publishing & integrating scientific databases
• ‘Archiving’ past states of volatile databases
• Database provenance and annotation
• Organisational dynamics of trusted repositories
• Automating metadata extraction
• Cost-benefit analysis of data curation
• Rights and responsibilities

The database picture
[Diagram: source data; curated data (classified, cleaned, annotated, integrated, cross-linked)]

Curated databases – some issues
• Integrating, publishing and citing data so that someone else can use it
• Annotating existing data and moving annotations to other databases
• Provenance: where did this data come from?
• Archiving: how do you preserve something that is constantly changing?
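The archiving question (preserving a database that is constantly changing) can be made concrete with a minimal sketch. It assumes a database simplified to a dictionary of records keyed by a stable identifier, and it uses one simple technique, not necessarily the DCC's approach: instead of keeping a full copy of every release, each distinct record value is stored once together with the set of release numbers in which it was present, so any past state can be rebuilt on demand. The class and method names (`VersionedArchive`, `add_release`, `snapshot`, `history`) and the sample records are hypothetical.

```python
from collections import defaultdict


class VersionedArchive:
    """Hypothetical archive of successive releases of a volatile database.

    Each (key, value) pair is stored once, together with the set of release
    numbers in which it was present, rather than one full copy per release.
    """

    def __init__(self):
        # key -> list of [value, set of release numbers containing that value]
        self._entries = defaultdict(list)
        self.releases = 0

    def add_release(self, records):
        """Merge a complete snapshot (dict: key -> value) as the next release."""
        self.releases += 1
        release = self.releases
        for key, value in records.items():
            history = self._entries[key]
            for stored_value, present_in in history:
                if stored_value == value:
                    # Unchanged (or reverted) value: just extend its release set.
                    present_in.add(release)
                    break
            else:
                # New value for this key: store it once.
                history.append([value, {release}])
        return release

    def snapshot(self, release):
        """Reconstruct the database exactly as it stood in a given release."""
        if not 1 <= release <= self.releases:
            raise ValueError(f"unknown release {release}")
        return {
            key: value
            for key, history in self._entries.items()
            for value, present_in in history
            if release in present_in
        }

    def history(self, key):
        """Return (value, sorted release numbers) pairs for one record."""
        return [(value, sorted(rels)) for value, rels in self._entries.get(key, [])]


# Usage with three releases of a tiny, made-up sequence database.
archive = VersionedArchive()
archive.add_release({"P1": "ATG", "P2": "GGC"})
archive.add_release({"P1": "ATGA", "P2": "GGC"})
archive.add_release({"P2": "GGC", "P3": "TTA"})
print(archive.snapshot(1))     # {'P1': 'ATG', 'P2': 'GGC'}
print(archive.history("P1"))   # [('ATG', [1]), ('ATGA', [2])]
```

Storing each value once with its release set means that records which do not change cost almost nothing per release, which matters when releases are frequent and most records are stable. A real archive would also need persistent storage and a notion of record identity that survives restructuring of the database, both of which this sketch ignores.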

Research agenda (2)
• Publishing & integrating scientific databases
• ‘Archiving’ past states of volatile databases
• Database provenance and annotation
• Organisational dynamics of trusted repositories
• Automating metadata extraction
• Cost-benefit analysis of data curation
• Rights and responsibilities
  – “Public domain, public interest, public funding” paper, Waelde & McGinley

www.dcc.ac.uk

International Journal of Digital Curation
• www.ijdc.net
• Launch planned July
• Peer-review Editorial Board
• Peter Buneman, Editor (research)
• Production editor: Philip Hunter
• Papers for submission are very welcome!

1st DCC International Conference
• Location: Bath, UK
• 29-30 September 2005
• Keynote speakers
  – Clifford Lynch, CNI
  – Graham Cameron, European Bioinformatics Institute
• DCC research update
• Social highlights

Associates Network
• Goals: develop understanding, share best practice, advance research, promote recognition, develop consensus
• Membership: international groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals…
• Benefits: early access to R&D outputs, advisory services, training, input to definition and design, community participation
Discussion Forum: www.dcc.ac.uk
Please join us!