
The NCore Platform: An Open-Source Suite of Tools and Services for Implementing Digital Libraries
Dean B. Krafft, Cornell University
April 1, 2008
Open Repositories 2008

Presentation Overview
• What is NCore?
• Data Model and Architecture
• Services and Management Tools
• End User Tools
  – Expert Voices: blogging
  – NSDL Wiki
  – OnRamp: content management
• Discussion

NCore: The Technical Vision
• Provide support for communities across a broad range of disciplines, education levels, and degrees of engagement
• Enable the library as a shared, collaborative, contributory space
• Support the creation and display of context around library resources to enhance discovery, use, and understanding
• Put the library in the path of users, enabling them to easily and comfortably integrate the library into their normal workflow
• Release NCore as a platform for developing digital library tools

What is NCore for NSDL?
• A digital library that contains:
  – References to STEM resources
  – Metadata that describe those resources
  – Ways to organize, interrelate, and annotate resources
• Back-end tools and services to support the creation, organization, and indexing of resource references and metadata in the library
• End-user tools that allow:
  – Discovering, creating, and organizing library resources and metadata
  – Creating context, relationships, and annotations for the materials in the library

NSDL eLearning Platform
• Institution-Specific Services & Interfaces: Portals, Learning Environments, Repositories
• Common Service Layer
  – Collection Tools
  – Web 2.0 Tools
  – Strand Map Service
  – Open APIs, highly customizable
• NDR + Fedora
  – Research-based, NSF-supported
  – Open Source
  – Growing Fedora community
• Lightweight, common middleware for integrating content and services
[Diagram labels: Strand Map Service, Collection Management System, Discovery Service, WordPress, Mash-ups, MediaWiki; NDR API; NDR; Fedora: Native Interface]
[Represented in NDR: Rich Descriptions of Learning Goals, NSDL Collections, Publisher-Provided Content, Institution-Specific Content, User-Contributed Content]

NCore in Operation
• NSDL Data Repository contains 5 million digital objects: 3 million resources and 2.3 million metadata records
• 150,000 metadata/resource record updates are harvested from over 130 collections each month
• Production services run on 9 Dell Red Hat Linux servers, including 3 repository servers and 2 real-time replicating followers
• Search service regularly crawls and indexes 700,000 STEM resource pages
• Search service handles approximately 5 million queries/month
• Expert Voices blogosphere currently contains 46 blogs and 1524 posts with 749 registered users

Specializing Fedora
• Multiple Object Types:
  – Resources (with local or remote content)
  – Metadata
  – Aggregations (collections)
  – Metadata Providers (branding)
  – Agents
• Relationships with arbitrary graph queries:
  – Structural (part of)
  – Annotation (relates to)

NCore: Production enhancements to Fedora
• MPTStore: the original Kowari RDF triple-store could not scale to 250 million dynamically updated triples
• Transaction journaling: the recovery model of rebuilding from Fedora XML files (FOXML) did not scale to 5 million digital objects

The Stacks (Repository)
The NSDL Data Repository (NDR), implemented as a set of digital objects and relationships in a Fedora repository
[Diagram: Agent, Metadata Provider, Metadata, Collection, and Resource objects, linked by "selector for", "metadata for", "member of", and "related" relationships]

Back-end tools
• NDR API: a REST-based web services interface to the NDR
• Ingest: OAI-PMH metadata aggregator
• OAI-PMH server for library metadata
• Search: REST service for the library

NCore: NDR API
• Uses REST calls for all interactions
• Specializes Fedora for NDR objects/relationships
• Disseminations allow combining metadata from multiple sources, or related content
• Authentication: Requests signed with private key associated with an agent
• Authorization: Agent can become a metadata provider or aggregator; can create resources
• Documented at http://wiki.nsdl.org/index.php/Community:NDR

OAI-PMH Services
• Harvesting and Ingest
  – Automated process: Harvest trigger files created by scheduling system
  – Full logging with email feedback to provider
  – Automated rescheduling
• Repository OAI-PMH serving
  – Uses Fedora proai service to index Dublin Core datastreams in metadata objects
  – Collections serve as OAI-PMH sets
  – RDF relationships can be expressed and served as metadata

OAI-PMH Automated Harvesting
• Collections validate their OAI-PMH server
• CI registers collection (CRS)
  – harvest schedule, baseURL, set information…
• Full harvest initiated
• Subsequent incremental harvests according to schedule
  – automated emails sent if problems occur
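The full-then-incremental harvest pattern above can be sketched as the construction of OAI-PMH ListRecords requests. This is only an illustration of the standard protocol arguments (verb, metadataPrefix, from, set), not the NSDL ingest code; the base URL, set name, and date are hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    Omitting from_date asks for a full harvest; passing the date of the
    previous harvest asks only for records added or changed since then.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # day granularity: YYYY-MM-DD
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical provider endpoint, used for illustration only.
BASE_URL = "http://example.org/oai"

full_harvest = list_records_url(BASE_URL)
incremental_harvest = list_records_url(
    BASE_URL, from_date="2008-03-01", set_spec="physics")

print(full_harvest)
print(incremental_harvest)
```

A scheduler would presumably record the date of each successful harvest and pass it as from_date on the next run.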

NSDL Search Service
• Based on Lucene/Nutch
• Service exposes full power of Lucene queries
• Indexes metadata records incrementally harvested from NDR
• Crawls resources on web, indexing full text of resource
• Scales easily to millions of resources

DDS Search Service
• Digital Discovery System Web Service (DDSWS), developed by DLESE/Digital Learning Sciences
• REST web service interface
• Efficient for moderate-sized collections (on the order of 10,000 records)
• Documentation at http://www.dlese.org/dds/services/ddsws11/service_specification.jsp

End-user tools
• NSDL.org: Web site implementing search service, browsing, and display
• NCS: Interactive collection metadata management system
• Expert Voices: Blogging with integrated NSDL search, resource linking, and publication
• NSDL Wiki: Wiki with integrated NSDL search, resource linking, and publication
• OnRamp: Content management system with workflow and NDR publication

Status
• NSDL.org and OAI server/ingest in production since 2002
• NDR/NDR API in production since January 2007
• NDR search service in production since January 2007
• Expert Voices in production since early 2007
• NSDL Wiki in production now
• OnRamp in production since January 2008
• NCS in production now
• SourceForge release of NCore v1.1 on December 3, 2007

[Diagram labels: Identify, Discover, Describe, Create, Relate, Store, Distribute, Annotate, Overlay, Contribute, Integrate, Aggregate; NCS; NDR API]

Context and Collaboration

Collaboration Tools
• ExpertVoices: WordPress MU; Blogs/RSS
• NSDL Wiki: MediaWiki; Articles
• OnRamp: Fez; Documents
[Diagram labels: NCS; NDR API]

Extending MediaWiki and WordPress
• Search service
  – find resources to talk about
  – insert links to resources
• Data repository interactions
  – add new resources to the library
  – add referenced resources to the library
  – add metadata about resources
• Community sign-on (Federation)
• Administrative
• Skins/themes
• RSS

Current Status
• ExpertVoices and NSDL Wiki are using the plug-ins and extensions
• Preparing for public release on SourceForge in early 2008
• Features/improvements
  – browser compatibilities
  – flexible metadata vocabulary
  – “best of” aggregations

Create

Add References

Annotate

Describe, Contribute (NDR API)

Annotate, Aggregate, Relate (NDR API)

Blog about it

Repository Relationships (NDR API)

Referenced Resources

<dct:references> in metadata
http://ndr.nsdl.org/api/get/2200/20070828124324051T/format_nsdl_dc
...
<dct:references xsi:type="dct:URI">http://earthobservatory.nasa.gov/Library/GlobalWarming/</dct:references>
<dct:references xsi:type="dct:URI">http://www.ametsoc.org/atmospolicy/environmentalsssarchives.html</dct:references>
...

Relationships in objects
http://ndr.nsdl.org/api/get/2200/20070828124324051T
...
<relationships>
  <nsdl:relatedto>2200/20061003225044417T</nsdl:relatedto>
  <nsdl:relatedto>2200/20070702180002563T</nsdl:relatedto>
</relationships>
...
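As a rough sketch of how a client might read these two kinds of links, the snippet below parses a small XML sample with Python's standard library. The slide elides the surrounding document, so the <record> wrapper, the combination of both fragments in one document, and the nsdl namespace URI are assumptions made only so the sample parses; the dct namespace is the standard Dublin Core terms namespace.

```python
import xml.etree.ElementTree as ET

DCT_NS = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
NSDL_NS = "http://example.org/nsdl#"   # hypothetical placeholder, not the real NSDL namespace

# Hypothetical wrapper around the fragments shown on the slide.
SAMPLE = f"""<record xmlns:dct="{DCT_NS}" xmlns:nsdl="{NSDL_NS}"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dct:references xsi:type="dct:URI">http://earthobservatory.nasa.gov/Library/GlobalWarming/</dct:references>
  <dct:references xsi:type="dct:URI">http://www.ametsoc.org/atmospolicy/environmentalsssarchives.html</dct:references>
  <relationships>
    <nsdl:relatedto>2200/20061003225044417T</nsdl:relatedto>
    <nsdl:relatedto>2200/20070702180002563T</nsdl:relatedto>
  </relationships>
</record>"""


def referenced_urls(root: ET.Element) -> list:
    """Return the web URLs cited through dct:references elements."""
    return [el.text.strip() for el in root.iter(f"{{{DCT_NS}}}references")]


def related_handles(root: ET.Element) -> list:
    """Return the repository handles named in nsdl:relatedto elements."""
    return [el.text.strip() for el in root.iter(f"{{{NSDL_NS}}}relatedto")]


root = ET.fromstring(SAMPLE)
print(referenced_urls(root))
print(related_handles(root))
```

The distinction this illustrates is the one on the slide: references inside metadata point at web URLs, while relationships on objects point at other repository objects by handle.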

OnRamp – OnFire Distribution System

Fez – List of Collections

Fez – List of Records

Fez – Record

OnRamp – Integration with NDR

Repository Relationships

NCore is implemented as an information network overlay...

Network Overlay View
[Diagram layers: User View; API/UI; Repository View with Relations & Annotations; Resources on the Web]

Key aspects of this overlay
• Vision is to represent contextual knowledge around web resources
• ... and serves as a forum for independent parties to contribute, discover, use, and re-use this context at will
• ... yet allows for libraries to construct a cohesive and vetted view of the contents therein
• ... all the while allowing these independent parties to go about their business and not step on each other's toes!

Resources are typically references to existing online content. They are identified by their URI, which is unique repository-wide.

Agents represent a person, institution, or entity that can make assertions (e.g., Aggregation membership, assignment of metadata) about other objects in the repository.

Every relationship can be traced back to one Agent.

Aggregators represent groupings of objects. The most obvious example: they define the set of resources that are in a Collection.

MetadataProviders represent a particular branded "stream" of metadata. One can imagine these as similar to an OAI set.

Metadata objects represent a set of statements about a resource. They contain datastreams consisting of the metadata "payload".

Provenance is important here, since resources can be described by an arbitrary number of Metadata objects.

[Diagram on each of these slides: Agent, Metadata Provider, Metadata, Aggregator, Resource]
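The object types described above can be sketched as plain Python data classes to show how provenance falls out of the model: each Metadata object belongs to a MetadataProvider, which in turn traces back to one Agent. The class and field names below are illustrative assumptions, not the NDR's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Agent:
    name: str  # person, institution, or entity that makes assertions


@dataclass(frozen=True)
class Resource:
    uri: str  # unique repository-wide


@dataclass
class Aggregator:
    title: str
    asserted_by: Agent  # the Agent responsible for this grouping
    members: List[Resource] = field(default_factory=list)


@dataclass(frozen=True)
class MetadataProvider:
    brand: str          # a branded "stream" of metadata
    asserted_by: Agent


@dataclass(frozen=True)
class Metadata:
    metadata_for: Resource
    provider: MetadataProvider
    payload: str        # stands in for the metadata datastream


def descriptions_of(resource: Resource,
                    records: List[Metadata]) -> List[Tuple[str, str]]:
    """List (agent name, payload) for every Metadata object about a resource."""
    return [(m.provider.asserted_by.name, m.payload)
            for m in records if m.metadata_for == resource]


# Hypothetical example: two independent parties describe the same resource.
museum = Agent("Example Museum")
teacher = Agent("Example Teacher")
page = Resource("http://example.org/climate")
records = [
    Metadata(page, MetadataProvider("Museum catalog", museum), "title: Climate basics"),
    Metadata(page, MetadataProvider("Teacher notes", teacher), "audience: grade 9"),
]
collection = Aggregator("Climate collection", asserted_by=museum, members=[page])

print(descriptions_of(page, records))
```

Because the two descriptions live in separate Metadata objects with separate providers, neither party overwrites the other, and a library can still choose which providers to include in its vetted view.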

The DCS is a comprehensive collection management application from DLESE used to maintain their collections of resources. When the NSDL needed a new collection management tool, they turned to DLESE and adapted the DCS to use the NDR API to store and edit NSDL metadata and collection information.

The new NCS collection manager will allow for access and maintenance of collection metadata within the NDR. This tool will allow owners of collections to directly manage their resources and metadata within the NDR, and will be the same tool that the NSDL uses to organize and maintain its own aggregations of resources.

The NCS will use the NDR API and the NSDL data model to maintain collections and aggregations, and will also take advantage of the flexibility of the API to store its own administrative information.

Strand Map Service
• Internationally recognized science learning goals and progressions
• Enables teachers and learners to
  – Visualize and explore learning goals and their interconnections
  – Use learning goals to locate and assess resources and curriculum components
  – Enhance their science knowledge and pedagogical content knowledge
  – Adapt instructional materials while supporting recognized learning goals
• Institutions use SMS web service to create applications and interfaces

What’s Next in 2008?
• Registering RSS feeds to support bookmarking, folksonomic tagging systems such as del.icio.us and Nature Publishing’s Connotea
• Extensions to the Moodle Course Management System to support searching for and linking to NSDL resources
• SourceForge release of NSDL search code, MediaWiki extensions, WordPress extensions
• Tool to create NSDL “personal magazine”: user-level selection, organization, annotation, and presentation of NSDL resources

NCore: The Technical Ecosystem
[Diagram labels: STEM Collections, Archive Service, NCS, …; Protocol: OAI-PMH, HTTP, REST; NDR API]

For more information on NCore: http://wiki.nsdl.org/index.php/Community:NCore
Collaborative Tools:
• OnRamp: http://onramp.nsdl.org
• Expert Voices: http://expertvoices.nsdl.org
• NSDL Wiki: http://wiki.nsdl.org

Acknowledgements
• NSF EHR/DUE: Lee Zia, Program Officer
• NSDL Core Integration Team
  – UCAR: Kaye Howe, PI and Executive Director
  – Cornell: Dean Krafft, PI
  – Columbia: Kate Wittenberg, PI
• Fedora Development Team
  – Cornell: Sandy Payette & Carl Lagoze
  – Univ. of Virginia: Thornton Staples
• This material is based upon work supported by the National Science Foundation under Grants No. DUE-0733600, 0227648, 424671, and 0227888. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Contact Information
Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY 14850 USA
dean@cs.cornell.edu

This work is licensed under the Creative Commons Attribution-ShareAlike 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. When separated from this work, some images may be covered by separate copyright or license terms.