
Developing the Digital Institutional Repository at HKUST

Diana Chan, Head of Reference
KT Lam, Head of Systems
HKUST Library
December 9, 2003
Hong Kong University of Science & Technology Library

Outline of Presentation
1. Why Create the Institutional Repository?
2. Demonstration
3. How Did HKUST Develop It?
4. Challenges
5. IR Software Evaluation
6. DSpace Implementation at HKUST
7. Q&A

1. Why Create the Institutional Repository?
- What is an Institutional Repository (IR)?
  A “digital collection capturing and preserving the intellectual output of a single or multi-university community”.
  - Adopted from “The case for institutional repositories: a SPARC position paper”, prepared by Raym Crow. <http://www.arl.org/sparc/IR/ir.html>

1. Why Create the IR?
- Budapest Open Access Initiative <http://www.soros.org/openaccess/index.shtml>
  Recommends 2 strategies:
  1. Self-archiving in open electronic archives
  2. Open access journals

1. Why Create the IR?
- Dual Open-Access Strategy <http://www.ecs.soton.ac.uk/~harnad/Temp/berlin.htm>
  - BOAI-2 ("gold"): Publish your article in a suitable open-access journal whenever one exists.
  - BOAI-1 ("green"): Otherwise, publish your article in a suitable toll-access journal and also self-archive it.

1. Why Create the IR?
- Why have an IR at HKUST?
  To create a permanent record of the scholarly output of HKUST
  - No access to some scholarly works published by our own faculty
  - Collections of working papers, technical reports, research reports flowing around
  - Some of our scholarly works are in the public domain

1. Why Create the IR?
- Why have an IR at HKUST?
  To help the international Open Access effort, because the mission of disseminating knowledge is only half complete if it is not widely and readily available to society.
  - Adapted from the Berlin Declaration <http://www.zim.mpg.de/openaccessberlin/berlindeclaration.html>

1. Why Create the IR?
- The Contribution Must Satisfy 2 Conditions:
  - The author…grants to all users a free…right of access to, and a license to copy, use, distribute, transmit and display the work publicly …
  - A complete version of the work is deposited in…at least one online repository
  - From the Berlin Declaration

2. Demonstration
- HKUST Institutional Repository <http://repository.ust.hk/>
  - DSpace interface
  - Sample record
  - Submission form
  - Search in OAIster

Collection Type and Size

Communities    18
Collections    37

Collection type       Size
Book chapters            1
Conference papers       85
Journal articles        66
Patents                 62
Presentations           40
Preprints               12
Technical reports      109
Theses                 110
Working papers          35
Miscellaneous            6
Total                  520

3. How Did HKUST Develop It?
  1. Planning & Policies
  2. Technical Developments
  3. Harvesting and Promotion
  4. Work Teams
  5. Negotiations with Publishers

3.1 Planning and Policies
- Task Force: software, scope, policies, database structure, problems, action plans.
- Information Services Committee: guidelines on different types of publications, publishers’ policies, data formats, faculty concerns.
- Library Administrative Committee: final approvals.

3.2 Technical Developments
- Will be discussed by KT Lam in parts 5 & 6

3.3 Harvesting and Promotion
Within HKUST:
- 1st Stage: Prototype
  - 105 Computer Science Technical Reports
- 2nd Stage: Target group: faculty who already posted their publications on the Web
  - Emailed 80; 49 agreed.
  - Harvested 144 documents.

3.3 Harvesting and Promotion
Within HKUST:
- 3rd Stage: Target group: all faculty
  - Emailed all to encourage direct submission.
  - 2 documents submitted.
  - Notes from the Library
- 4th Stage: Target group: all faculty
  - Emailed all, telling them which publishers allow post-refereed self-archiving (IEEE, ACM, Emerald, SPIE…).
  - 3 documents submitted.

3.3 Harvesting and Promotion
In Cyberspace:
- Harvested 53 US patents
- Harvested 21 journal articles from Emerald
- Harvested 10 articles from DOAJ
- Joined OAIster

3.3 Harvesting and Promotion
Planned:
- Will harvest conference proceedings held at HKUST and published by HKUST
- Will cover Ph.D. theses with signed permissions
- Will contact departments for preprints, working papers, technical reports, etc.
- Will contact faculty whose publications have not been posted
- Departmental visits

3.4 Work Teams
- Subject Librarians
- Data Entry Team

3.4 Work Teams: Subject Librarians
  1. Liaison With Faculty
  2. Check Faculty’s Publication Lists
  3. Harvest Documents
  4. Ascertain Publishers’ Policies
  5. Verify Document Versions
  6. Do Indexing

3.4 Work Teams: Data Entry Team
  1. Verify and Convert PDF Documents
  2. Data Entry Using Submission Form
  3. Set PDF Document Security & Properties
  4. Create Folders & Upload Files

Flowchart on Data Entry
- Get Indexed Documents from Librarians
- Screen and Convert Files
- Group to Different Folders
- Input Data
- Check for Errors
- Set PDF Document Security & Properties
- Final Check
- Define Communities & Collections in DSpace & Upload Files

3.5 Negotiations with Publishers
- Collection Development Librarian wrote to:
  - INFORM
  - ProQuest
  - Wiley
  - Springer
  - IEEE
  - AAAS
  - Elsevier
- Result: No good news yet.

4. Challenges
Faculty:
- Low awareness of Open Access Initiative (OAI)
- Concern over copyright issues
- Apathy in direct submission
- Lack of willingness to negotiate on non-exclusive rights and to publish in open access journals
- Lack of willingness to provide the right versions of documents (pre- or post-refereed)
- Only a small % of scholarly work can be archived

4. Challenges
Institution:
- Needs to mandate the deposit of all research outputs in the Institutional Repository
- Needs to give financial support to faculty who submit papers to open access journals

4. Challenges
Publishers:
- In the RoMEO project, only 34 out of 80 allow some sort of archiving
- Many have no policy (Camford, Genetics Society of America)
- Many have an unclear policy
- Some:
  - Decline to give permissions (Springer, AAAS)
  - Give no response (INFORM)
  - Give a wrong answer (Wiley)
- Need to include self-archiving in license agreements with publishers

4. Challenges
The Library continues to:
- Provide support for university research self-archiving
- Promote the IR
- Educate users and faculty about the IR
- Showcase the IR
- Find champions and partners among faculty
- Seek institutional mandate and support
- Harvest documents

5. IR Software Evaluation
  1. Background
  2. EPrints
  3. DSpace
  4. Why Did We Choose DSpace?
  5. Evaluation Guide
  6. Other Software and Selection Criteria

5.1 Background
- Institutional repository software is also known as institutional archive-creating software, or digital repository software.
- HKUST Library started IR software evaluation in late December 2002.
- Two products were evaluated: EPrints and DSpace.
- Decided to use DSpace in mid-February 2003.

5.2 EPrints <http://software.eprints.org/>
- Developed by the University of Southampton.
- The very first freely available institutional repository software; available since 2000.
- GNU software, thus open source.
- Has the largest installed base.
- Written in Perl, with MySQL and Apache.

5.3 DSpace <http://www.dspace.org/>
- Jointly developed by MIT Libraries and Hewlett-Packard Company.
- Open source; available since late December 2002, after two years of development.
- Written in Java, with PostgreSQL, Lucene, and Apache/Tomcat.
- Still under development.

5.4 Why Did We Choose DSpace?
- DSpace was developed based on the experience gained by EPrints.
- It has a well-defined data model: Community + Collection + Item + Metadata + Bundle + Bitstream
- UTF-8 capable.
- Well-organized web interface.
- Metadata in Dublin Core format.
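The data model named on this slide can be pictured as nested containers. The sketch below is only an illustration in Python, not DSpace's own Java code: the class and field names are assumptions chosen to mirror the slide's terms, and the sample community, title, bundle name, and file name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    """A single stored file (for example, one PDF)."""
    name: str
    size_bytes: int


@dataclass
class Bundle:
    """A named group of bitstreams that belong together."""
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    """One archived work: Dublin Core style metadata plus its bundles."""
    metadata: Dict[str, List[str]] = field(default_factory=dict)
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    """A named set of items."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """A top-level unit that groups collections."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def count_items(community: Community) -> int:
    """Count the items held in all collections of one community."""
    return sum(len(collection.items) for collection in community.collections)


# Hypothetical sample data, for illustration only.
report = Item(
    metadata={"title": ["A sample technical report"]},
    bundles=[Bundle("ORIGINAL", [Bitstream("report.pdf", 123456)])],
)
community = Community("Computer Science", [Collection("Technical reports", [report])])
print(count_items(community))
```

Keeping metadata on the item and files inside bundles means a record can be described once while holding several related files.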

5.5 Evaluation Guide
- “A Guide to Institutional Repository Software” by Open Society Institute <http://www.soros.org/openaccess/software/OSI_Guide_to_Institutional_Repository_Software_v1.htm>

5.6 Other Software & Selection Criteria
- Other IR Software:
  - CDSware, from CERN.
  - I-TOR, from Netherlands Institute for Scientific Information Services.
  - MyCoRe, from University of Essen.
- Selection Criteria:
  - Open source.
  - Complies with OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting).
  - Currently released and publicly available.
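To make the OAI-PMH criterion concrete: a harvester talks to a compliant repository with plain HTTP requests that carry a `verb` argument. The sketch below only builds such request URLs and does not contact any server. The base URL is a hypothetical placeholder, not the HKUST endpoint, and the helper name `build_oai_request` is invented for this illustration.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return the URL for one OAI-PMH request against base_url."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical endpoint, for illustration only.
BASE_URL = "http://repository.example.org/oai/request"

# Ask for all records in unqualified Dublin Core (the oai_dc format).
print(build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A service provider such as OAIster would issue requests of this kind periodically and index the metadata returned.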

6. DSpace Implementation at HKUST
  1. DSpace Server
  2. Problems
  3. Limitations

6.1 DSpace Server <http://repository.ust.hk/>
- PC with Pentium 4, 2.4 GHz, 1 GB RAM
- Red Hat Linux, with standalone Tomcat, PostgreSQL database, and Lucene search engine.
- DSpace Version 1.1.1.
- Live since late February 2003.

6.2 Problems
- Faculty Submission Form
  - DSpace’s built-in submission interface is too complicated.
  - We had to develop our own submission form.
  - We then use DSpace’s Item Importer to load the data.
- CJK Search Failure
  - Fixed by modifying the DSpace Java source code.
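As a rough picture of the hand-off from a custom form to the Item Importer, the sketch below writes one item directory in the simple archive layout described in DSpace's batch import documentation: a `dublin_core.xml` file, a `contents` file, and the files themselves. Whether version 1.1.1 expects exactly this layout, and how the HKUST form actually produced its data, are assumptions. The function name and the sample metadata and file are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_item_dir(archive_root, item_name, dc_values, files):
    """Write one item directory for batch import.

    dc_values: list of (element, qualifier, value); qualifier may be None.
    files: dict mapping file name to file content (bytes).
    """
    item_dir = Path(archive_root) / item_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # Metadata file: one <dcvalue> per Dublin Core value.
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_values:
        node = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        node.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The files to load, plus a 'contents' file listing one file name per line.
    for filename, data in files.items():
        (item_dir / filename).write_bytes(data)
    listing = "".join(f"{filename}\n" for filename in files)
    (item_dir / "contents").write_text(listing, encoding="utf-8")
    return item_dir


# Hypothetical sample item, written to a temporary directory.
with tempfile.TemporaryDirectory() as archive_root:
    created = write_item_dir(
        archive_root,
        "item_000",
        [("title", None, "A sample working paper"),
         ("contributor", "author", "Author, Sample")],
        {"paper.pdf": b"%PDF-1.4 placeholder"},
    )
    print(sorted(path.name for path in created.iterdir()))
```

Generating these directories from form data keeps the faculty-facing form simple while leaving the actual loading to the importer.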

6.2 Problems
- CNRI Handle
  - Required registration at CNRI for a handle prefix.
  - Our prefix is 1783.1.
- Custom Authentication
  - Added Java code to query HKUST’s LDAP server.
- Handling of Non-English Characters
  - Uses the approach adopted in our Electronic Theses Database.
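A handle has the form prefix/suffix and can be resolved through the public proxy at hdl.handle.net. The sketch below shows how a persistent URL is formed from the prefix on this slide. The suffix `100` is a made-up example, not a real item, and the helper names are assumptions.

```python
HANDLE_PROXY = "http://hdl.handle.net/"
HKUST_PREFIX = "1783.1"  # prefix stated on the slide


def split_handle(handle: str):
    """Split 'prefix/suffix' into its two parts."""
    prefix, separator, suffix = handle.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


def handle_url(prefix: str, suffix: str) -> str:
    """Build the persistent URL that resolves a handle via the proxy."""
    return f"{HANDLE_PROXY}{prefix}/{suffix}"


# Hypothetical suffix, for illustration only.
print(handle_url(HKUST_PREFIX, "100"))
```

Because citations point at the proxy rather than at the server itself, the links can stay valid if the repository later moves to a different host.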

6.2 Problems
- Server Hanging Problem
- Other Software Bugs

6.3 Limitations
- Flat Community + Collection structure
  - 2-level only, not deep enough.
- Linked Collection
  - A collection that belongs to more than one community.
- Unable to cross-search multiple collections from different communities.

6.3 Limitations
- Query Syntax Not Apparent to Users, e.g.
    +water +rapid        [for exact word match]
    "vapor generator"    [for phrase search]
- Limited Capability on Sorting Search Results.
- Cannot Display the Number of Items in the Repository, in a Community, and in a Collection.
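To show what the two query forms on this slide ask for, the sketch below splits a query string into required words (`+word`), quoted phrases, and remaining plain words. It is a simplified stand-in written for this illustration, not Lucene's query parser, and it handles only the forms shown on the slide.

```python
import re

# A quoted phrase, a +required word, or a plain word.
TOKEN = re.compile(r'"([^"]+)"|\+(\S+)|(\S+)')


def parse_query(query: str) -> dict:
    """Group the parts of a query into required words, phrases, and other words."""
    parsed = {"required": [], "phrases": [], "other": []}
    for match in TOKEN.finditer(query):
        phrase, required, other = match.groups()
        if phrase is not None:
            parsed["phrases"].append(phrase)
        elif required is not None:
            parsed["required"].append(required)
        else:
            parsed["other"].append(other)
    return parsed


print(parse_query('+water +rapid "vapor generator"'))
```

A search form that offered separate fields for "all of these words" and "this exact phrase" could build such a query on the user's behalf, which is one way to hide the syntax.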

Related Websites
- American-Scientist September Forum <http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html>
- Open Access Presentation <http://www.ecs.soton.ac.uk/~harnad/Temp/openaccess.ppt>
- Self-Archiving FAQs <http://www.eprints.org/self-faq/>
- SPARC Institutional Repository Checklist & Resource Guide <http://www.arl.org/sparc/IR/IR_Guide.html>