ETDs for Beginners: History and Approach Edward A. Fox Executive Director, NDLTD (plus slides from Vinod Chachra, Thom Hickey, Joan Lippincott, and Gail McMillan) Professor, Dept. of Computer Science Virginia Tech (VPI&SU), Blacksburg, VA, USA http://fox.cs.vt.edu fox@vt.edu ETD 2003 Humboldt University, Berlin 21-24 May 2003
ACKNOWLEDGEMENTS • ETD 2003 organizers and attendees • Wonderful service of NDLTD Board of Directors, and previous Steering Committee, other committees • Bold efforts by those running ETD initiatives in universities, regions, and countries • Helpful sponsorship by many organizations, especially Adobe, Brocade Communications, c.a.r.u.s. Information Technology, Cisco Systems, CONACyT, Controlware, DFG, Enterasys Networks, Ex Libris, FIPSE, IBM, ImageWare Components, LIB-IT, Microsoft, Nionex, NSF, OCLC, VTLS, SOLINET, Springer-Verlag, SUN, SURA, T-Systems, UNESCO, many governments (Australia, Germany, India, …), …
PERSPECTIVE
Digital Libraries --- Virginia Tech • MARIAN (NLM, NSF) • CS DL Prototype - ENVISION (NSF, ACM) • TULIP (Elsevier, OCLC) • BEV History Base (NSF, Blacksburg) • DL for CS Education - EI (NSF, ACM) • WATERS, NCSTRL (NSF) • NDLTD (SURA, US Dept. of Education, NSF) • CSTC (NSF, ACM), CRIM (NSF, SIGMM) • WCA (Log) Repository (W3C) • VT-PetaPlex-1 (Knowledge Systems) • NSDL (NSF): CITIDEL, DL-in-a-Box, GetSmart • AmericanSouth.Org (Mellon)
DL Examples • IBM Digital Library • Virtua (www.vtlc.com) • Greenstone (www.greenstone.org) • Eprints (www.eprints.org) • Many systems in NSF DLI projects • VT systems: MARIAN, CSTC, NDLTD • Work on ODL, DL-in-a-box, CITIDEL, NCSTRL
Libraries of the Future JCR Licklider, 1965, MIT Press World Nation State City Community
Info. Literacy (1995) NSF DLI (1994) Improving Education Digital Libraries WWW (1994) PDF (1992) Library Cancellations (1988) Internet (1984) SGML (1985) Multimedia (1986) University Scholarly Electronic Pub. (1988)
Synchronous Scholarly Communication Same time, Same or different place
Asynchronous, Digital Library Mediated Scholarly Communication Different time and/or place
Information Life Cycle Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/
Information Life Cycle Authoring Modifying Using Creating Retention / Mining Organizing Indexing Accessing Filtering Storing Retrieving Distributing Networking
Communications (bandwidth, connectivity) Locating Digital Libraries in Computing and Communications Technology Space Digital Libraries technology trajectory: intellectual access to globally distributed information Computing (flops) Digital content less more
Digital Libraries Shorten the Chain from Editor Reviewer Publisher A&I Consolidator Library
DLs Shorten the Chain to Author Teacher Digital Reader Editor Reviewer Learner Librarian Library
Digital Libraries --- Objectives • World Lit. : 24 hr / 7 day / from desktop • Integrated “super” information systems: 5 S: streams, structures, spaces, scenarios, societies • Ubiquitous, Higher Quality, Lower Cost • Education, Knowledge Sharing, Discovery • Disintermediation -> Collaboration • Universities Reclaim Property • Interactive Courseware, Student Works • Scalable, Sustainable, Useful
Benefits • Ease of use • Effectiveness • “The benefits of digital libraries will not be appreciated unless they are easy to use effectively. ” - IITA Workshop report
DLs: Why of Global Interest? • National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly • Knowledge and information are essential to economic and technological growth, education • DL - a domain for international collaboration • wherein all can contribute and benefit • which leverages investment in networking • which provides useful content on Internet & WWW • which will tie nations and peoples together more strongly and through deeper understanding
Application domains of digital libraries (Reagan Moore and Ed Fox, June 2002, for NSF)
Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Crosscutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
Digital Libraries • Online course materials at http://ei.cs.vt.edu/~dlib/rcontents.htm • Topical outlines:
Topical Outline - Foundations • Early visions • Definitions • Resources • References • Projects
Topical Outline – IR Areas • Search, Retrieval, Resource Discovery • Information storage and retrieval • Boolean vs. natural language • Search engines • Indexing, phrases, thesauri, concepts • Federated search and harvesting, OAI • Integrating links and ratings • Crawlers, spiders, metasearch, fusion • Details following – Li Wang indep. study
Topical Outline - Multimedia • Multiple media types, representations • Text, audio, image, video, graphics, animation • Capture, digitization, standards, interchange • Compression, content-based retrieval • Playback (Real), SMIL, QoS • JPEG, MPEG (and versions)
Topical Outline - Architectures • Distributed, centralized • Modular, componentized • Bus (InfoBus), hierarchical, star • Mediators, wrappers (TSIMMIS) • Light weight protocols • Architecture of OAI and XOAI
Topical Outline – Interfaces • Taxonomy of interface components • Workflow • Visualization • Environments • Design • Usability testing
Topical Outline – Metadata • MARC • Dublin Core • RDF • IMS • OAI (Open Archives Initiative) • Crosswalks, mappings • Ontologies • Topic maps, concept maps
Topical Outline – Epub, SGML, XML • Authoring • Rendering, presenting • Structure • Tagging, Markup, DOM • Semi-structured information • Dual-publishing, eBooks • Styles (XSL, XSLT) • Structure queries
Topical Outline – Databases • Extending database technology • Structured and unstructured info • Multimedia databases • Link databases • Performance • Replicated storage, I2-DSI (details following)
Topical Outline – Agents • Protocols • Knowledge interchange • Negotiation, registries • Distributed issues • Ontologies (standard upper) • Webbots (automatic indexing)
Topical Outline – Economics • E-commerce • Sustainability • Preservation and archiving • DLF, Besser, Lorie, Gladney • Self-archiving • Open collections • Economic models, business plans
Topical Outline – IPR • Intellectual property rights (IPR) • Legal issues • Terms and conditions • Copyright • Patents, trademarks • Distributed rights management • Security
Topical Outline – Social Issues • Cooperation, collaboration • Annotation, ratings • Digital divide • Educational applications • Cultural heritage • Museums (AMICO) • Organizational acceptance • Personalization • Internationalization
DL Challenges • Preservation - so people will trust DLs • Supporting infrastructure - networks, ... • Scalability, sustainability, interoperability • DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating IR, HT, MM, ... • Need tools & methods to make them easier to build
Definitions • Library ++ (library+archive+museum+…) • Distributed information system + organization + effective interface • User community + collection + services • Digital objects, repositories, IPR management, handles, indexes, federated search, hyperbase, annotation
Definition: Digital Libraries are complex systems that • • • help satisfy info needs of users (societies) provide info services (scenarios) organize info in usable ways (structures) present info in usable ways (spaces) communicate info with users (streams)
5 S Layers Societies Scenarios Spaces Structures Streams
5S
Models | Examples | Objectives
Stream | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content
Spatial | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
5S Model for DLs
5S | Definition
Streams | Sequences of elements of an arbitrary type
Structures | Labeled directed graphs
Spatial | Sets and operations on those sets
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement
Societies | Sets of communities and relationships among them
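The five definitions above are compact enough to restate in code. What follows is only an illustrative sketch: every name in it (`make_stream`, `Structure`, `dot`, `run_scenario`, `Society`) is hypothetical and is not taken from the 5S papers or from 5SLGen.

```python
from dataclasses import dataclass, field


def make_stream(elements):
    """Stream: a sequence of elements of an arbitrary type."""
    return tuple(elements)


@dataclass
class Structure:
    """Structure: a labeled directed graph, stored as (source, label, target) edges."""
    edges: set = field(default_factory=set)

    def add(self, source, label, target):
        self.edges.add((source, label, target))

    def targets(self, source, label):
        return {t for (s, l, t) in self.edges if s == source and l == label}


def dot(u, v):
    """Space: a set plus operations on it; here, a dot product on term-count vectors."""
    return sum(count * v.get(term, 0) for term, count in u.items())


def run_scenario(state, events):
    """Scenario: a sequence of events, each one modifying the state of a computation."""
    for event in events:
        state = event(state)
    return state


@dataclass
class Society:
    """Society: sets of communities and relationships among them."""
    communities: dict = field(default_factory=dict)   # name -> set of members
    relationships: set = field(default_factory=set)   # (community, relation, community)


# A tiny example: one thesis described with each of the five S's.
text = make_stream("An ETD")
thesis = Structure()
thesis.add("thesis", "hasChapter", "chapter-1")
similarity = dot({"etd": 2, "pdf": 1}, {"etd": 3})
final_state = run_scenario({"query": None}, [lambda s: {**s, "query": "etd"}])
campus = Society({"students": {"author"}, "librarians": {"cataloger"}},
                 {("students", "submit-to", "librarians")})
```

The point of the exercise is that each "S" maps onto an ordinary data structure, which is what makes generating a DL from a declarative description plausible.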
5 SLGen: Automatic DL Generation
OCKHAM • Simplicity (a la OCCAM’s razor) • Support by Mellon and DLF • Next meeting in Atlanta Jan. 8, 2003 • Four main ideas: 1. Components 2. Lightweight protocols 3. Open reference models (e.g., 5S, OAIS) 4. Community perspective and involvement
Problem Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are: 1. The library budget won’t allow purchase of a commercial DL system. 2. Unless the development effort is local, there won’t be any control. 3. DLs are extensions of DBMSs, so they are simple applications to develop. 4. Since DLs operate on the Web, one must adopt the newest W3C proposal.
Problem – cont’d 5. Since technology moves so quickly, it is essential to follow the latest fad. 6. CS students always develop from scratch. 7. This team knows it can do it better. 8. This system must have more capabilities than any other system. 9. This DL has to be more flexible and extensible. 10. This is the right system architecture – at last!
Problem Approach We • address the problem of how to develop DLs; • build on experience in building many DLs; • strive for simplicity as per OCKHAM initiative; • build upon the Open Archives Initiative; • demonstrate our approach in diverse situations; • and invite all to • use DL-in-a-box and • help build Open Digital Libraries.
NUDL (www.nudl.org) Int’l Research Support (1997) • Networked University Digital Library • Partners: Germany, Mexico (Puebla and Monterrey), Brazil • Problems: Multilingual search, high performance DLs, requirements/usability, … • Start with ETDs, then expand to other student works, portfolios, data sets, (CS) courseware, ... -> institutional repositories
ALPHABET SOUP, NOT ROCKET SCIENCE
Alphabet Soup • E and T or D = ETD • (electronic) • (thesis) • (dissertation)
Alphabet Soup • ET and ED = ETDs
Alphabet Soup • DL and ET or ED = DLTD • (digital library)
Alphabet Soup • SURA and DLs and ETDs = Regional DLTD • (Southeastern University Research Association)
Alphabet Soup • FIPSE and DLs and ETDs = National DLTD • (Fund for the Improvement of Post Secondary Education – US Dept. of Ed)
Alphabet Soup • International and DLs and ETDs = Networked DLTD = NDLTD • (Recall “n” in CNI –> Coalition for Networked Information)
Alphabet Soup - Factoring • NDLTD = ND LTD • (Paul Mather – from UK) • NDLTD = NDL TD • (Edie Rasmussen) • (Later, Networked University Digital Library = NUDL)
A Digital Library Case Study • Electronic theses and dissertations (ETDs) • Submission: http://etd.vt.edu • Collection: http://www.theses.org • Networked Digital Library of Theses and Dissertations (NDLTD) http://www.ndltd.org (formerly “National” because of Fed. funds, before international members started joining)
SLIDES FROM 1998
What led to today’s situation? • 1987 mtg in Ann Arbor: UMI, VT, … • 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each • 1993 mtg in Atlanta to start Monticello Electronic Library (MEL): SURA, SOLINET • 1994 mtg in Blacksburg re ETD project: std of PDF + SGML + multimedia objects • 1996 funding by SURA and US Dept. of Education (FIPSE) for regional, national projects (NDLTD)
VISION, BENEFITS, APPROACH, POSSIBILITIES
What are we doing? • Aiding universities to enhance grad educ. , publishing and IPR efforts: to help improve the availability and content of theses and dissertations • Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i. e. , are Information Literate and can be more expressive) • Demonstrating how for other organizations
What are the key ideas? • Scalability • Empower authors to submit to DL, as a natural part of the educational process • Study workflow & apply automation, so institutions streamline processing and build their part of the DL • Federate along most suitable cultural/political lines • People can switch to electronic documents • Becoming more expressive with hypermedia • Mandating ETDs will change all future scholarship
What are the benefits? • Save students money • Save handling, shelf space in libraries • Build the Networked Digital Library of Theses and Dissertations: with faster, broader, and less expensive access • Demonstrate how universities can work together directly (vs. indirectly through publishers or associations)
What are the long term goals? • 400 K US students / year getting grad degrees are exposed / involved • 200 K/yr rich hypermedia ETDs that may turn into electronic portfolios • Dramatic increase in knowledge sharing: lit. reviews, bibliographies, … • Services providing lifelong access for students/researchers: browse, search, prior searches, citation links
Grad Student Workstation? • Record all work with NDLTD, return to prior situation, prepare bibliography • Powerful (multilingual, text, image) searching, browsing (with categories), following citation links • Support collaboration with others in same field: help with literature review, sharing tools and data sets, applying their methods
Social Capital? • Increase local interchange among students, faculty, library, graduate school • Increase international understanding, building many more invisible colleges, with students more empowered • Connect graduate researchers with undergrads, who can access ETDs / them • Facilitate direct university collaboration, explicitly, in reshaping publishing world
How are ETDs being done at Virginia Tech? • Produced using standard word processing packages as PDF files (LaTeX class, outline fonts; Word template, PDFwriter) • Reviewed by the Graduate School • Cataloged and archived by the library • Downloaded by UMI from server (if payment has been made)
Convene Local Planning Group ETD
Build An ETD Site ETD Workshop/Training Digital Library Policies Inspection/Approval
Student Prepares Thesis or Dissertation NDLTD Literature Computer Resources Research
Student Defends and Finalizes ETD My Thesis ETD
Student Gets Committee Signatures and Submits ETD Signed Grad School
Graduate School Approves ETD Student is Graduated Ph.D.
Library Catalogs ETD and New Students Have Access to the New Research WWW NDLTD
Status of the Local Project • Approved by university governance Spring 1996; required starting 1/1/97 • Submission & access software in place • Submission workshops for students (and faculty) occur often: beginner/adv. • Faculty training as part of Faculty Development Initiative • Over 700 ETDs in collection by 1/98
How can a university get involved? • Select planning/implementation team: Graduate School; Library; Computing / Information Technology; Institutional Research / Educ. Tech. • Send us letter, give us contact names • Adapt Virginia Tech solution • Build interest and consensus • Start trial / allow optional submission
CONCERNS, PROBLEMS, OPPOSITION
Some Barriers at Universities • Lethargy; Not invented here (esp. large univ’s) • Anger with unfunded, added, required work • Last straw: using more frustrating technology • Lack of experience in working together: graduate school, library, computing staff • Lack of interest in (quality of) student work • More loyalty to discipline than to campus • Unwillingness to accept responsibility for $ problems with libraries, publishers
MECCA Conf. 6/11/98 • Armbruster, U. Tennessee, Memphis • Bennett, Robert C., U. Texas Med Sch • Brown, Melinda, Vanderbilt • Eaton, John, Graduate School, Va Tech • Fox, Ed, Computer Science, Va Tech • Gherman, Paul, Library, Vanderbilt • Goodstein, Lynn, Penn St. U. • Hagen, John H., Library, WVU • Hardemon, James, U. Florida • Helmstetter, Wendy, Library, FIT • Liston, Rick, NCSU • Lutz, Richard, Graduate School, Florida • McFarland, Mark, U. Texas, Austin • McMillan, Gail, Library, Va Tech • Minsker, Tom, Penn St U. • Mortara, Antionet, FIT • Painter, Linda, U. Tennessee • Sowell, Robert, Graduate School, NCSU • Tague, Larry, U. Tennessee, Memphis • Vaughan, Mary Ann, Vanderbilt
ETD Overview
Spirit of NDLTD • Help make a better (smaller) world • Win-win (everyone can benefit) • Have fun helping others • Helpers/teachers learn more than those they work with • Cooperation, friendly competition • When you “1-up” VT, share your software, documents! • “Doing better” requires both “doing”, “better” • Balance (and build on standards) • New, popular, powerful, expressive, exciting, “better” • Doable, feasible, learnable, affordable, sharable, preservable • We can always do more, enhancing quality and knowledge!
The Networked Digital Library of Theses and Dissertations www. NDLTD. org Training Authors Expanding Access Preserving Knowledge Improving Graduate Education Enhancing Scholarly Communication Empowering Students & Universities Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative
NDLTD Grad Program IT Library Ed. (Tech)
Key Ideas: Scalability Networked infrastructure University collaboration Workflow, automation Education is the rationale Maximal Access 8th graders vs. grads Authors must submit Standards PDF, SGML, MM, MARC, DC, URNs, Federated search
What led to today’s meeting? • 1987 mtg in Ann Arbor: UMI, VT, … • 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each • 1993 mtg in Atlanta to start Monticello Electronic Library (regional, US Southeast): SURA, SOLINET • 1994 mtg at VT: std: PDF + SGML + multimedia objects • 1996 funding by SURA, US Dept. of Education (FIPSE) • 1997 meetings in UK, Germany, ... • 1998 – 1st symposium – Memphis (20) • 1999 – 2nd symposium – Blacksburg (70) • 2000 – 3rd symposium – St. Petersburg (225) • 2001 – 4th symposium – Caltech (200) • 2002 – 5th symposium – BYU, Provo, Utah • 2003 – 6th symposium – Berlin (215) • 2004 – 7th symposium – U. Kentucky • 2005 – 8th symposium – Sydney, Australia
NDLTD Membership • As of 5/17/2003 there were at least 176 members, including: • 155 individual universities • 6 consortia • 21 institutional members
National / Regional Projects • Australia: U. New South Wales (lead), U. of Melbourne, U. of Queensland, U. of Sydney, Australian National U., Curtin U. of Technology, Griffith U. • Belgium • Brazil • Germany: Humboldt University (lead); 3 other universities; 5 learned societies (Math, Physics, Chemistry, Sociology, Education); 1 computing center; 2 major libraries • India • Lithuania • Spain: Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es: 9 sites • Sudan • UK (British Library, JISC, Edinburgh) • UNESCO (especially Latin America, Eastern Europe, Africa) • USA: CIC (“Big 10”); Ohio: OhioLINK: 79 colleges/univs; SOLINET; …
OhioLINK • Statewide Consortium • Represents 79 colleges, universities, libraries • Public Universities • Private Universities and Colleges • 2-Year Colleges • Only a few (e.g., Miami U. of Ohio) are also NDLTD members on their own
US University Members • Air University (Alabama) • Baylor University • Boston University • Brigham Young University • Caltech • Clemson University • College of William & Mary • Concordia University (Illinois) • Drexel University – required 4/2002 • East Carolina University • East Tenn. State U. – required 1/2001 • Florida Institute of Technology • Florida International University • Florida State University • Florida Tech • George Washington University • Georgetown University • Johns Hopkins University • Louisiana State University – required 1/2002 • Marshall University (W. Va.) • Miami University of Ohio • Michigan Tech • Mississippi State University • MIT • Montana State University • Naval Postgraduate School (CA) • New Jersey Inst. of Technology • New Mexico Tech • North Carolina State University – required 9/2002 • Northwestern University • Penn. State University • Regis University • Rochester Institute of Tech. • Texas A&M • U. of Central Florida • U. of Colorado Health Science Center • U. of Florida – required 8/2001 • U. of Georgia – required 9/2001 • U. of Hawaii, Manoa • U. of Illinois, Urbana-Champaign • U. of Iowa • U. of Kentucky – required in CS only • U. of Maine – required in CS, Spatial Info Sci/Eng • U. of Missouri-Columbia • U. of Nevada, Las Vegas • U. of New Orleans • U. of North Texas – required 8/1999 • U. of Oklahoma • U. of Pittsburgh • U. of Rochester • U. of South Florida – required 8/2002 • U. of Tennessee, Knoxville • U. of Tennessee, Memphis • U. of Texas at Austin – required 6/2001 • U. of Virginia – required 1/2003 • U. of West Florida • U. of Wisconsin - Madison – part reqt 12/1999 • Vanderbilt U. • Virginia Commonwealth U. • Virginia Tech - required 1/97 • Wake Forest U. • West Virginia U. - required 8/1998 • Western Kentucky U. – required 9/2004 • Western Michigan U. • Worcester Polytechnic Inst. – required 7/2002 • Yale U.
Other Countries (selected) • Australia • Belgium • Brazil • Canada • Chile • China, Hong Kong • Colombia • Finland • France • Germany • Greece • India • Italy • Jamaica • Korea • Lithuania • Mexico • Netherlands • Norway • Poland • Russia • Singapore • S. Africa • S. Korea • Spain • Sudan • Sweden • Taiwan • Thailand • UK • Venezuela
Institutional Members • Australian Digital Theses Program • British Library • Cinemedia • Coalition for Networked Information (CNI) • Committee on Institutional Cooperation (CIC) • Consorci de Biblioteques Universitàries de Catalunya • Diplomica.com • Dissertationen Online (Germany) • ETDweb, a Division of Answer4.com • Ibero-American Science & Technology Education Consortium (ISTEC) • MathDISS International • National Documentation Centre (NDC), Greece • National Library of Canada • National Library of Portugal • OCLC Online Computer Library Center • Office of Scientific and Technical Info (US Dept of Energy) • OhioLINK • Organization of American States (SEDI/OAS) • Southeastern Library Network (SOLINET) • Sudanese National Electronic Library • UNESCO (www.unesco.org/webworld/etd)
UNESCO and ETDs • Promoting the use of the Internet as a tool for disseminating scientific knowledge • Facilitating the transfer of ETD expertise from developed to developing countries • 1998: Member of the NDLTD Steering Committee • 1999: First UNESCO ETD meeting on ETD internationalisation • 2002: “UNESCO Guide to Electronic Theses and Dissertations” • 2003: Model training programmes and training courses • 2003: Sponsor pilot projects • 2003: Pilot projects (Africa, Europe, Latin-America)
Access Possibilities • Web search engines • www.theses.org • Virginia Tech • MIT • National Library of Portugal • www.openarchives.org clients • library catalog • CBUC (Spain) • OhioLINK • 3rd Party Services (e.g., Bell & Howell) • National Projects: AU, GE, …
ETD Initiative (and ProQuest) Students Learn about DL, EPub TDs become more expressive Global TDs become more accessible, archived Universities UMI N. Amer. (T)Ds are accessible, archived
Why ETD? Short Answer • For Students: • Gain knowledge and skills for the Information Age • Richer communication (digital information, multimedia, …) • For Universities: • Easy way to enter the digital library field and benefit thereby • For the World: • Global digital library – large, useful, many services • General: • Save time and money • Increased visibility for all associated with research results
The Process? Short Answer • For Students: • Plan on ETD from day 1 • Secure knowledge from: workshops, online info, colleagues • Work with faculty to plan approach • PDF? XML? TEI? Multi/hypermedia? Data sets? Viz? • Get signed approval form: access, ©, proxy assignment • After defense and approval, submit ETD to university • For Universities: • Form team • Adapt solution from work at other universities, attend ETD conference • Pilot -> Option -> Requirement
Assistance • Software, documentation, tech support • Email, listservs (etd-l@listserv.vt.edu) • UNESCO sponsored etdguide.org • English in 2001, Spanish & French in 2002 • Training sessions in Latin America … • Marcel Dekker book soon in press • www.ndltd.org
Open Archives Initiative OAI www.openarchives.org openarchives@openarchives.org
Technical Umbrella for Practical Interoperability… Reference Libraries Museums Publishers E-Print Archives …that can be exploited by different communities
The World According to OAI Service Providers Discovery Current Awareness Metadata harvesting Data Providers Preservation
Tiered Model of Interoperability Mediator services Metadata harvesting Document models
Repository of Digital Objects Repository Access Protocol handle terms and conditions Digital object
OAI – Black Box Perspective [Figure: seven Open Archives, OA1 to OA7]
OAI – Black Box Perspective [Figure: Open Archives OA1 to OA7, with three layers: Services (Search, Browse); Metadata (Summarize, Visualize); Docs (digital objects, DO)]
Protocol for Metadata Harvesting • Service Requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords • Metadata Multiplicity • Date/Time Ranges • Sets (with semantics depending on local data providers) • Resumption Tokens
Key Features of the OAI Metadata Harvesting Protocol • definitions & concepts • repository • record • identifier • datestamp • set • protocol features • HTTP encoding • metadata prefix & schema • flow control • protocol requests • supporting requests • harvesting requests
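Because the protocol rides on plain HTTP encoding, a harvesting request is just a base URL plus a `verb` and a few arguments. The sketch below builds such URLs for the six requests; the function name `oai_request_url` and the base URL `http://example.org/oai` are placeholders chosen for illustration, while the verb and argument names (`metadataPrefix`, `identifier`, `from`, `until`, `set`, `resumptionToken`) follow the OAI-PMH specification.

```python
from urllib.parse import urlencode

# The six OAI-PMH requests: three supporting, three harvesting.
SUPPORTING = {"Identify", "ListMetadataFormats", "ListSets"}
HARVESTING = {"GetRecord", "ListIdentifiers", "ListRecords"}


def oai_request_url(base_url, verb, **args):
    """Build a GET URL for one OAI-PMH request.

    'from' is a Python keyword, so callers pass it as from_; the trailing
    underscore is stripped before encoding.
    """
    if verb not in SUPPORTING | HARVESTING:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in args.items():
        if value is not None:
            params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


BASE = "http://example.org/oai"  # hypothetical repository address

identify_url = oai_request_url(BASE, "Identify")
records_url = oai_request_url(BASE, "ListRecords",
                              metadataPrefix="oai_dc", from_="2003-01-01")
```

The `metadataPrefix` argument is how one repository can expose the same record in several metadata formats; `oai_dc` (unqualified Dublin Core) is the format every repository is required to support.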
repository: supports data harvesting [Figure: data harvester, OAI protocol, repository, items]
selective harvesting - datestamps: harvest within date range [Figure: record, repository]
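Selective harvesting by datestamp, combined with flow control through resumption tokens, can be imitated without a network. The following is a toy, in-memory sketch under stated assumptions: `ToyRepository`, `harvest`, the record dictionaries, and the `from|until|offset` token layout are all invented for illustration (real repositories choose their own opaque token format).

```python
class ToyRepository:
    """In-memory stand-in for an OAI data provider (not a real server)."""

    def __init__(self, records, page_size=2):
        # Datestamps are ISO 'YYYY-MM-DD' strings, so string order is date order.
        self.records = sorted(records, key=lambda r: r["datestamp"])
        self.page_size = page_size

    def list_records(self, from_=None, until=None, resumption_token=None):
        """Return (page_of_records, next_token); next_token is None when done."""
        if resumption_token is not None:
            # The token carries the original date range plus the next offset,
            # so the harvester does not resend from/until.
            from_, until, offset = resumption_token.split("|")
            from_, until, offset = from_ or None, until or None, int(offset)
        else:
            offset = 0
        matching = [r for r in self.records
                    if (from_ is None or r["datestamp"] >= from_)
                    and (until is None or r["datestamp"] <= until)]
        page = matching[offset:offset + self.page_size]
        next_offset = offset + self.page_size
        token = None
        if next_offset < len(matching):
            token = f"{from_ or ''}|{until or ''}|{next_offset}"
        return page, token


def harvest(repo, from_=None, until=None):
    """Collect every record in the date range, following resumption tokens."""
    page, token = repo.list_records(from_=from_, until=until)
    harvested = list(page)
    while token is not None:
        page, token = repo.list_records(resumption_token=token)
        harvested.extend(page)
    return harvested


repo = ToyRepository([
    {"id": "etd-1", "datestamp": "2003-01-10"},
    {"id": "etd-2", "datestamp": "2003-02-15"},
    {"id": "etd-3", "datestamp": "2003-03-20"},
    {"id": "etd-4", "datestamp": "2003-04-25"},
    {"id": "etd-5", "datestamp": "2003-05-21"},
])
spring = harvest(repo, from_="2003-02-01", until="2003-04-30")
```

A harvester that remembers the date of its last run and passes it as `from_` only ever transfers new or changed records, which is what keeps repeated harvesting cheap.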
DL Components Gateways MM/ HT Renderer User Interfaces Workflow Mgr Search Engines, Classifiers, … DBMS Rights Mgr Data, MM Info Repository
Open Digital Library (ODL) Hypothesis (Hussein Suleman) • Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems ? Maybe … if Digital Libraries can be modeled as • networks of extended Open Archives, where • each extended Open Archive is a • source of data and/or a provider of services.
[Figure: users, an unknown system (?), and digital objects: Document, Program, Video, Image]
[Figure: componentized digital library; unknown components (?) sit between users and the digital objects: Document, Image, Program, Video]
[Figure: open digital library; Open Archives (OA) linked by PMH and XPMH, over the digital objects: Document, Program, Image, Video]
Component System Approach • (Open) DL = Network of Extended OAs Data Input Local Archive Resource Discovery Search Browse Recommend Metadata Repository legend Remote Archive User Interface OAI/ODL archive OAI/ODL protocol
Example Architecture (NDLTD) Virginia Tech User Interface PhysNet Humboldt Search Browse Recent Duisburg CalTech Union Catalog MIT Filter MIT legend Dresden User Interface OAI/ODL archive OAI/ODL protocol
ODL Demonstration - FrontPage
ODL Component Requirements • Search • Retrieve a list of items • Index new items • Annotate • Add annotation to item • Retrieve a list of annotations for an item
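The four requirements listed above (index new items, retrieve a list of items, add an annotation, retrieve annotations for an item) translate into two very small interfaces. The classes below are a hedged sketch: `SearchComponent` and `AnnotationComponent` are hypothetical names, and the whitespace tokenizer with all-terms-must-match semantics is a simplification chosen for brevity.

```python
from collections import defaultdict


class SearchComponent:
    """Indexes new items and retrieves a list of matching item ids."""

    def __init__(self):
        self._index = defaultdict(set)  # term -> ids of items containing it

    def index(self, item_id, text):
        for term in text.lower().split():
            self._index[term].add(item_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        # An item matches only if it contains every query term.
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)


class AnnotationComponent:
    """Adds annotations to items and lists the annotations for an item."""

    def __init__(self):
        self._annotations = defaultdict(list)

    def annotate(self, item_id, note):
        self._annotations[item_id].append(note)

    def annotations(self, item_id):
        return list(self._annotations.get(item_id, []))


searcher = SearchComponent()
searcher.index("etd-1", "Digital libraries and ETDs")
searcher.index("etd-2", "Digital preservation")

notes = AnnotationComponent()
notes.annotate("etd-1", "Useful literature review")
```

Keeping each component to two operations is what lets it be wrapped behind a lightweight protocol and swapped for another implementation.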
Open Digital Library Components • Running now • XML-File (data provider from file system) • Union, search, browse, recent, filter • E-journal/review, Submit, Edit, Annotation • Class projects • High performance multilingual search • Recommender, Rating; Mirroring (see JCDL’ 02) • Working with NCSA: from DB, unstructured text • Others discussed • Classification/categorization • DL-Viz interconnection (VIDI – Jun Wang ETD)
Open Digital Library: Extended As What’s New Service Provider What’s New Engine XML File Coll. & Data Provider 1 XML File Coll. & Data Provider 2 XML File Coll. & Data Provider 3 As Metadata Search Service Provider IRDB-1 Search Engine As Metadata Browse Service Provider DBBrowse Engine As Recommend & Rate Service Provider Recommend Rate Engine DBUnion Archive Merger Component Harvest from As Annotation Search Service Provider Annotation Engine data providers Filter OAI-PMH Data Provider Submit Archive IRDB-2 Search Engine OAIB (NCSA: from RDBMS)
Example Open Digital Library [Figure: a user interface (Recent, Browse, Search) for students and researchers, backed by ODLRecent, ODLBrowse, and ODLSearch components; these connect through ODLUnion, Filter, and PMH links to the ETD collections ETD-1 (Document), ETD-2 (Program), ETD-3 (Image), ETD-4 (Video)] Digital Library for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
Example Open Digital Library for the Computer Science Teaching Center (www. cstc. org)
Digital Library in a Box • Domain: helping DL projects • Genre: any domain, but especially those involved in NSDL (since funding in part is through NSDL – with U. FL, NCSA) • Software and Documentation: http://dlbox.nudl.org
DL Standardized Log Format - Design
5S | Definition | Use in Log Design
Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
Societies | Sets of communities and relationships among them | User information
ETDs and Libraries Gail McMillan Digital Library and Archives, University Libraries Virginia Polytechnic Institute and State University Ohio State University/Virginia Tech Video Conference October 24, 2002
Goals for Libraries and Archives • Improve services • Better turn-around time • Always available • Reduce work (save $) • Catalog from etext • Eliminate handling • Save space
ETDs at Virginia Tech • Partnership: Library, Graduate School, and Faculty • Approved by university governance: Mar. 1996 • Full implementation: Jan. 1997 • Web submission • Students: http://etd.vt.edu • Programmers: http://scholar.lib.vt.edu/ETD-db/ • Workshops for students (and faculty) • Over 5000 ETDs approved
How are ETDs managed? • Graduate student creates ETD • Word processor, multimedia • Saves as PDF, usually • Graduate student submits ETD • Directly to library server/permanent archive • Archiving fee replaces binding fee • Graduate School approves • E-mails author, advisor, UMI (VT scripts) • Authors/advisors prescribe Internet access • Library catalogs and archives • UMI downloads
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
Library Resources • Hardware: server • Maintenance and security • Started small: NeXT 3.3 (HP; 1989-97) • Grew: Sun dual-processor Enterprise 250, Solaris 2.7 (Apache web server) • Software • Submission scripts written by DLA • Includes e-mail notifications to authors, advisors, UMI • Use it too: http://scholar.lib.vt.edu/ETD-db/ • Log files analyzed with Analog • Survey scripts written by DLA • Data from authors and readers • Use it too: http://lumiere.lib.vt.edu/surveys/ • Search Engine • Started small: freeWAIS >> Grew: InfoSeek's Ultraseek
Financial Concerns • At VT: start-up costs = $0 • On-hand staff, equipment, software, freeware • From zero base: estimate $65,000 • $24,000 Staff (part time) • $36,000 Equipment • $15,000 Software • http://scholar.lib.vt.edu/theses/data/setup.html
Costs/Savings at VT • Graduate School stopped shipping to the library 3000 copies of paper TDs/year • Library stopped handling (e.g., shipping, binding, shelving, and circulating) 3000 copies of TDs/year • 166 ft of shelf space saved yearly by the Library • VT used existing equipment in Library (vs. start-up costs for staff, hardware and software)
Digital Library Benefits: Low margin, high use • Incorporate ETDs with other digital library activities • Ejournals, online class materials, digital images, etc. • Additional equipment, staff may not be necessary • http://scholar.lib.vt.edu/theses/data/setup.html • Use VT programs, scripts, etc. • http://scholar.lib.vt.edu/ETD-db/ • Online accesses vs. circulation of copies • VT theses 1990-1994, combined average circulation per copy: 2.24/yr • VT dissertations 1990-1994, combined average circulation per copy: 3.2/yr
Access to VT’s ETDs http://scholar.lib.vt.edu/theses/
Why are ETDs so popular? • User surveys • 67% found VT ETDs easily • 61% found them by searching • 22% browsed by department • 16% browsed by author • 53% downloaded 1 or more ETDs • Author surveys • Conversion and submission processes less difficult than anticipated • Over half plan to publish articles from their ETDs • Why did they restrict access? • http://lumiere.lib.vt.edu/surveys/
Availability of 4224 VT ETDs
Reasons for Restricted Access
ETDs and Accessibility • Inaccessible ETDs • Patents pending • Future publication fears • Broken links • Quality of work remains • Similar to out-of-print articles • Media standards • Open source software (e.g., PDF reader) • Typical commercial software • Few esoteric programs, include original scripts
ETDs and Publishing • Early controversies waning • Faculty: prior publication? • Protective of future academics • Surveys of publishers • Largely no specific policies • Consider submissions individually • VT ETD Alumni • None had problems getting published • Authors • Retain some rights, e.g., link to curriculum vitae, online course materials
ETDs and Copyright • Author’s rights • Reproduction, modification, distribution, public performance, public display • Retain rights • Share non-exclusive rights • Permit library to store and to provide access • Publishers • Author’s obligations: fair use • Balance factors or get permission • Notification: optional (e.g., "Copyright 2002 by Gail McMillan ALL RIGHTS RESERVED") • Registration: optional • Possibly receive greater compensation, with less documentation, if filing infringement lawsuit
ETDs and Long-term Preservation • Concerns: Access without paper • Long term preservation • Standard multimedia formats • PDF Reader: an open source • http://scholar.lib.vt.edu/theses/archive.html • Addressed Concerns • Cooperatives • OhioLINK • Why not: OCLC, NDLTD? • Commercial options • UMI: traditional microfilming • Frequent, regular back-ups available on-site and off-site
Ensuring Access to VT ETDs • Every 15 minutes back-ups made of newest, not-yet-approved submissions • Hourly back-ups of newly approved ETDs • Weekly back-ups of entire ETD collection • Multiple copies stored on-site and off-site • NDLTD: let’s reciprocate, cooperative mirroring
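The three back-up intervals above can be written down as a small schedule table. The following is only a sketch of that idea in Python, not the scripts VT actually runs; the job names (`pending_submissions`, `newly_approved`, `entire_collection`) and the `backups_due` helper are hypothetical.

```python
from datetime import datetime, timedelta

# Intervals taken from the slide; the job names are made up for illustration.
SCHEDULE = {
    "pending_submissions": timedelta(minutes=15),  # not-yet-approved ETDs
    "newly_approved": timedelta(hours=1),          # newly approved ETDs
    "entire_collection": timedelta(weeks=1),       # full ETD collection
}


def backups_due(last_run, now):
    """Return the names of jobs whose interval has elapsed since their last run."""
    return sorted(
        name
        for name, interval in SCHEDULE.items()
        if now - last_run[name] >= interval
    )


# Example: at noon, decide which back-ups should run.
now = datetime(2003, 5, 21, 12, 0)
last_run = {
    "pending_submissions": datetime(2003, 5, 21, 11, 40),  # 20 minutes ago
    "newly_approved": datetime(2003, 5, 21, 11, 30),       # 30 minutes ago
    "entire_collection": datetime(2003, 5, 10, 12, 0),     # 11 days ago
}
due = backups_due(last_run, now)
print(due)
```

Keeping the intervals in one table makes it easy to add a further tier, such as an off-site or mirror copy, without touching the decision logic.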
Lessons from ETDs • Implementation of new formats slower than expected • Text oriented • Not planning for online readers • If you build it, it will get used. • Access exceeded expectations • Disappointing number are inaccessible • Remarkable increase in exposure to graduate student research • Requiring institutions slower than expected • No longer experimental • Increase in number and diversity of NDLTD institutions
Available at VT • Information http://scholar.lib.vt.edu/theses • Automated submission system ready for customization http://scholar.lib.vt.edu/ETD-db/ • Student guidelines, training materials, FAQ's, multimedia educational materials http://etd.vt.edu • NDLTD: Network educational institutions • Annual conferences: Berlin 2003, U of Kentucky 2004 http://www.ndltd.org
Union Catalog (with Vinod Chachra, Thom Hickey)
NDLTD Union Catalog Architecture [Figure: architecture diagram. Components shown: TD OAI and ETD OAI sources; OCLC Repository; VT ODL Demo (search/browse); Virtua VTLS Union Catalog; WorldCat; links labeled OAI-PMH and SRU/SRW (search). To try: Z39.50; harvest from 20+ sites; email; FTP.]
Union Catalog Creation
OCLC Capabilities • Harvesting • OAI-PMH versions 1.1 and 2.0 • Harvestable sets • Sets by institution • Searching • SRU (Z39.50 on the Web) • VTLS • Virginia Tech Open Digital Library demo • Unicode support
OCLC Statistics • 19 Sources • 61,998 records • Probably some overlap • Adding 1-2 new sites/month
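To make the harvesting bullets concrete, here is a minimal Python sketch of an OAI-PMH 2.0 `ListRecords` request restricted to one set, plus a parser that pulls record identifiers and the resumption token out of a response. The base URL, the set name `vt`, and the sample response are invented for illustration; they are not OCLC's actual endpoint or data, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # OAI-PMH 2.0 namespace


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_identifiers(response_xml):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(response_xml)
    ids = [el.text for el in root.iter("{%s}identifier" % OAI_NS)]
    token_el = root.find(".//{%s}resumptionToken" % OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:etd-1</identifier></header></record>
    <record><header><identifier>oai:example.org:etd-2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai", set_spec="vt")
ids, token = parse_identifiers(SAMPLE)
print(url)
print(ids, token)
```

A harvester would repeat the request with the returned resumption token until the repository stops sending one.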
OCLC Metadata Formats • Dublin Core – All • ETDMS – 9 • MARC – 5
Complex to Simple MARC ($50) Dublin Core (DC) + thesis
ETD-MS • ETD Metadata Standard • XML-encoded metadata standard (content and encoding) for Electronic Theses and Dissertations (ETDs) • in part conforming to Dublin Core (DC) • using UNICODE • (optionally / later using RDF) • Well specified relationship with MARC
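The idea of "Dublin Core plus thesis information, XML-encoded, in Unicode" can be sketched in Python as below. This is not the official ETD-MS schema: the Dublin Core namespace is the standard one, but the thesis namespace URI, the `degree`, `name`, `level`, and `grantor` element names, and the sample values are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"    # standard Dublin Core namespace
THESIS = "http://example.org/thesis-ext/"  # placeholder, NOT the ETD-MS namespace

ET.register_namespace("dc", DC)
ET.register_namespace("thesis", THESIS)


def build_record(title, creator, date, degree_name, degree_level, grantor):
    """Build a record with Dublin Core fields plus a thesis-specific block."""
    record = ET.Element("record")
    for tag, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(record, "{%s}%s" % (DC, tag)).text = value
    degree = ET.SubElement(record, "{%s}degree" % THESIS)
    for tag, value in (
        ("name", degree_name),
        ("level", degree_level),
        ("grantor", grantor),
    ):
        ET.SubElement(degree, "{%s}%s" % (THESIS, tag)).text = value
    return record


# Made-up sample values; the non-ASCII title shows Unicode surviving encoding.
record = build_record(
    title="Über digitale Bibliotheken",
    creator="Example, Author",
    date="2003-05-21",
    degree_name="PhD",
    degree_level="doctoral",
    grantor="Example University",
)
xml_bytes = ET.tostring(record, encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```

Because the thesis fields sit in their own namespace, a consumer that only understands Dublin Core can still read the title, creator, and date and ignore the rest.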
NDLTD Members and ETD-MS • NDLTD members • Share metadata for their ETDs • Provide it in ETD-MS • Or, if they use a version of MARC locally, work to have it eventually shared in either MARC 21 or UNIMARC • Run OAI, either locally or in consortia, so their metadata can be harvested, according to necessary terms and conditions
The OAI Static Repository Model • Components of the model • The static repository • A well-structured XML file with information similar to that in OAI-PMH responses • Accessible at a persistent network location • The static repository gateway • Makes one or more Static Repositories harvestable • Assigns a unique base URL to each such Static Repository • Responds to OAI-PMH requests
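The division of labor between the static file and the gateway can be sketched as follows. This is an illustration under simplifying assumptions, not the OAI specification: the XML layout of `STATIC_FILE`, the gateway host name, and the `StaticRepositoryGateway` class are all hypothetical, and only two verbs are handled.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a static repository file (not the real OAI format).
STATIC_FILE = """<repository>
  <record id="oai:example.org:etd-1"><title>First thesis</title></record>
  <record id="oai:example.org:etd-2"><title>Second thesis</title></record>
</repository>"""


class StaticRepositoryGateway:
    """Answers harvest-style requests on behalf of registered static files."""

    def __init__(self):
        self._repos = {}  # base URL -> parsed static file

    def register(self, name, static_xml):
        """Assign a unique base URL to a static repository and return it."""
        base_url = "http://gateway.example.org/%s" % name
        self._repos[base_url] = ET.fromstring(static_xml)
        return base_url

    def handle(self, base_url, verb, identifier=None):
        """Serve a request from the static file registered under base_url."""
        root = self._repos[base_url]
        if verb == "ListIdentifiers":
            return [rec.get("id") for rec in root.findall("record")]
        if verb == "GetRecord":
            for rec in root.findall("record"):
                if rec.get("id") == identifier:
                    return rec.findtext("title")
            raise KeyError(identifier)
        raise ValueError("unsupported verb: %s" % verb)


gateway = StaticRepositoryGateway()
base = gateway.register("small-university", STATIC_FILE)
identifiers = gateway.handle(base, "ListIdentifiers")
title = gateway.handle(base, "GetRecord", identifier="oai:example.org:etd-2")
print(base, identifiers, title)
```

The institution only has to publish and update one XML file; the protocol logic lives in the shared gateway.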
The OAI Static Repository Model
NDLTD Union Catalog Statistics 1. Participating Countries • So far, ETDs from 7 countries are included in the database: Canada, Germany, Greece, Korea, Portugal, Spain, U.S. • UK to be added by June 30, 2002 • Brazil to be added soon
NDLTD Union Catalog Statistics 2. Interface Languages in Union Catalog • The language here is the language of the interface • The VTLS NDLTD Union Catalog has 14 languages: English, Arabic, Catalan, Chinese, French, German, Hebrew, Korean, Polish, Portuguese, Russian, Slovak, Spanish, and Swedish • Example follows
German
NDLTD Union Catalog Statistics 3. Languages in the Union Catalog • The language here is the language of the content of the ETD • The VTLS NDLTD Union Catalog has data in 6 different languages: English, German, Greek, Korean, Portuguese, and Spanish • Examples follow
Language = German; hits = 137
Full record display
Language = Greek
In Greek In English
Other Topics • Extended services: linking • Retrospective conversion • Z39.50 • Requiring ETDs • …
Collaborative Development (Joan Lippincott)
Why Collaboration? • Expertise in aspects of the digital environment • Pooling of resources
Collaboration and digital projects • Distributed systems • Digital course content • Digital library resources • Delivery of services • Development of policies
Collaborations involve: • Shared goals • Common vision • Shared vocabulary
Two views of an ETD program • Have staff scan • Implement now • Increase university visibility • Teach students to write and submit ETDs • Implement soon • Develop electronic authors
In a collaboration. . . • Each contributes resources • Partners acknowledge and value contributions • Partners develop a clear process • Group and individual accountability
ETD project participants • Academic administrators • Faculty • Students • Staff • Graduate school / provost / registrar • Information technologists • Librarians
Collaboration and NDLTD • Common goals of members • Diverse sets of skills and expertise • Need for strategies and tactics to surmount any problems -> advocacy
Collaborative project strategy • Champion initiates project • Leadership establishes initial goal and parameters • Issue a call for participants • Conduct procedure to select participants
Collaborative project strategy • Initial meeting • Develop shared goals • Develop clear process • Continue work at institutions • Establish communication channels • Establish project milestones • Evaluate progress, refine approach
Collaborative project strategy • Disseminate results • Online documentation • In-person event • Disseminate a product • Regional workshops • Session at ETD 20XX
NDLTD project areas • Training materials • Promotional materials • Identify and recommend standards • Local, national, regional policies
Your Plans (Ana Pavani)