Croatian Internet serials
Croatian Electronic Publishing: Results of a survey on e-serials and usage of metadata
Sofija Klarin, Sonja Pigac, Damir Pavelić
sklarin@nsk.hr, spigac@nsk.hr, dpavelic@efzg.hr
National and University Library, Zagreb; Faculty of Economics, Zagreb
Topics
Part 1
• Context: facts, presumptions and questions
Part 2
• Results of the survey on Croatian remote access e-serials
Part 3
• Use of metadata in e-serials, possibilities for Croatia
1. Electronic publishing using the Internet
• the explosion of publishing activity since the 1990s raises problems of searching, retrieval, identification and preservation of electronic documents
• World Wide Web (1995)
• Cataloguer-based management vs. author-based management (Koehler)
1.1 How big is the Web? Too big?
Lawrence & Giles (1999):
• 800 million web pages
• 15 TB of information
• 6 TB of text
BrightPlanet – LexiBot software (2000):
• 19 TB – the “surface” Web
• 7,500 TB – the “deep” Web
Kulturarw3 project – Sweden:
• web harvesting
• 7.5 million files
• 300 GB
Croatia (since 1991):
• 8000 .hr domains
• types, number of files?
• types of resources?
• publishers?
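For a rough sense of scale, the figures above imply average object sizes. This is only arithmetic on the quoted numbers, assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes); it is not a figure reported by the cited studies.

```latex
\[
\frac{15\ \text{TB}}{8 \times 10^{8}\ \text{pages}}
  = \frac{15 \times 10^{12}\ \text{B}}{8 \times 10^{8}}
  = 18.75\ \text{kB per page},
\qquad
\frac{300\ \text{GB}}{7.5 \times 10^{6}\ \text{files}}
  = \frac{3 \times 10^{11}\ \text{B}}{7.5 \times 10^{6}}
  = 40\ \text{kB per file (Kulturarw3)}.
\]
```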
1.2 Valuable material?
Lawrence & Giles (1999):
• 83% of sites contain commercial content and 6% contain scientific or educational content
Most visited Croatian sites on the Web, 05.08.2000 (Proof)
1.3 Too ephemeral? Persistence of Web documents (Koehler, 1999)
• Web pages are unstable:
– they change (within a year, 99% of web pages show some degree of change)
– they disappear (5% return within a specific period of time)
• Two types of change:
– change of content (20% in a week)
– change of structure (20% in a week)
1.4 Low use of metadata on the WWW
Search/retrieval? Reliability? Authenticity? Interchange?
Lawrence & Giles (1999):
• the simple HTML "keywords" and "description" metatags are used on the homepages of only 34% of sites
• only 0.3% of sites use the Dublin Core metadata standard
Publishers?
• who are Web “publishers”?
– can they accept standards for the management and interchange of metadata?
1.5 Products of electronic publishing: new types of resources?
• data and/or programs
• local access / hybrid / remote access resources
• public access / restricted access
• static / dynamic publications
• monographs (finite publications) / continuing resources?
– serials
– integrating resources
2. The survey (January 2000 – April 2001)
• Aim of the survey on e-serials: quantity, categories, persistence, publishers and metadata usage in the Croatian web space
• Sample:
– electronic publications which consist of successive parts with numerical or chronological designations
– in Croatian or produced by Croatian publishers, available via the WWW
• Items excluded: OPACs or databases, lists/archives, web sites, online services, advertisements
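The sampling rule above can be written as a single predicate. This is a minimal sketch, assuming each candidate has already been described by hand with the attributes below; the `Candidate` record and its field names are hypothetical and not part of the survey's documented method.

```python
from dataclasses import dataclass

# Resource types the survey explicitly excluded (taken from the slide).
EXCLUDED_TYPES = {"opac", "database", "list", "archive", "web site",
                  "online service", "advertisement"}

@dataclass
class Candidate:
    """Hypothetical description of one candidate publication."""
    title: str
    resource_type: str           # free-text type label, e.g. "journal"
    has_successive_parts: bool   # issued in successive parts
    has_designation: bool        # parts carry numerical or chronological designations
    in_croatian: bool
    croatian_publisher: bool
    on_www: bool

def in_sample(c: Candidate) -> bool:
    """Apply the survey's inclusion and exclusion criteria to one candidate."""
    if c.resource_type.lower() in EXCLUDED_TYPES:
        return False
    is_serial = c.has_successive_parts and c.has_designation
    is_croatian = c.in_croatian or c.croatian_publisher
    return is_serial and is_croatian and c.on_www
```

Keeping the exclusions as an explicit set makes it easy to see which resource types were ruled out before the serial test is applied.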
2.1 Identification
• Lists, directories, portals, search engines:
– CroLinks http://www.crolinks.com
– www.hr – News, media, journals
– Iskon – Net.hr portal http://www.iskon.hr
– Google, Yahoo
• From their print versions
• From publishers
2.2 Numbers
• Total number: 153
• Disappeared: 16
• Changed URL: 12
• Ceased: 2
• Changed the title: 1
For comparison:
• National Library, Denmark: 1069 (2000)
• National Library, Norway: 299 (1999)
2.3 Categories
Newspapers; Journals; Weekly/fortnightly magazines; Scientific journals; Religious magazines; Serials published by societies; Student journals; Serials published by universities, scientific institutes; Serials published by civil services; Serials published by companies; Serials of unknown type
Counts (in the order given): 28, 42, 8, 10, 18, 4, 14, 11, 9, 9
Sum: 153
2.4 Editions
• electronic only: 110 (e.g. Mountain Biking, Morsko prase)
• both electronic and print: 42 (e.g. Vjesnik, Večernji list, Slobodna Dalmacija)
• print edition became electronic: 1 (Internet Monitor)
2.5 Place of publication
– Zagreb: 115
– Split: 6
– Rijeka: 5
– Osijek, Dubrovnik, Varaždin, Čakovec, Slavonski Brod: 2
– Karlovac, Zadar, Pula, Koprivnica, Ičići, Prelog, Sv. Ivan Zelina, Rovinj, Virovitica: 1
– other (AT): 1
– unknown: 4
2.6 URLs: Croatian domain or …?
• .hr: 82%
• .com: 17%
• other: 1%
Examples: www.vjesnik.hr, www.vecernji-list.hr, www.slobodnadalmacija.hr, www.nacional.hr, www.vef.hr/vetarhiv, www.nn.hr/Glasilo/index.htm, www.hi-fi.hr/hgz, wam.hi-fi.hr, www.agr.hr/smotra/index.htm, www.monitor.hr, www.gradst.hr/engmod, www.bug.hr, www.hrvatska.com/glaspodravine, duhovno-vrelo.com, www.moravek.net/kla, www.win-ini.com, www.hrvatskenovine.at, cyberdream.croadria.com, www.zarez.com, www.hrvatska.com/bilten.html, www.kapital.com etc.
• 1 item has 3 URLs / domains (.hr, .com, .net)
• 1 item has 2 URLs / domains (.hr, .com)
2.6.1 Domains, URLs
• 28 items have their own domain name directly under the top-level domain, e.g. www.vjesnik.hr, www.morsko-prase.hr
• 12 items changed URL:
– 5 moved from a directory on another site to their own domain name, e.g. http://www.hbk.hr/GK/gk.htm → http://www.glas-koncila.hr
– 5 internal changes within the site (domain), e.g. http://www.kdb.hr/projekt/paedro/index.htm → http://www.kdb.hr/paedro/
– 1 from .hr to .com
– 1 from .com to .hr
• 16 items disappeared:
– 11 .hr: 68.75% (.hr overall: 82%)
– 5 .com: 31.25% (.com overall: 17%)
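The three kinds of URL change listed above can be told apart mechanically by comparing host names and paths. This is a hedged sketch, not the procedure the survey used: the helper names are hypothetical, and "top-level domain" is approximated as the last label of the host name.

```python
from urllib.parse import urlsplit

def tld(host: str) -> str:
    """Last label of a host name, e.g. 'hr' for 'www.vjesnik.hr'."""
    return host.rsplit(".", 1)[-1].lower()

def classify_url_change(old: str, new: str) -> str:
    """Roughly classify a URL change into the categories used on the slide."""
    old_parts, new_parts = urlsplit(old), urlsplit(new)
    old_host = (old_parts.hostname or "").lower()
    new_host = (new_parts.hostname or "").lower()
    if old_host == new_host:
        return "internal change"          # same site, different path
    if tld(old_host) != tld(new_host):
        return f"domain change .{tld(old_host)} -> .{tld(new_host)}"
    if new_parts.path in ("", "/"):
        return "moved to own domain"      # new URL is the root of its own host
    return "other host change"

# Share of the 16 disappeared titles by domain, as reported: 11 .hr, 5 .com.
disappeared = {"hr": 11, "com": 5}
total = sum(disappeared.values())
shares = {d: round(100 * n / total, 2) for d, n in disappeared.items()}
```

Applied to the slide's own examples, the hbk.hr to glas-koncila.hr move comes out as "moved to own domain" and the kdb.hr move as "internal change"; the `shares` dictionary reproduces the 68.75% / 31.25% split.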
2.7 Chronological overview (number of titles per year)
1994: 2
1995: 6
1996: 11
1997: 21
1998: 26
1999: 33
2000: 24
2001: 2
unknown: 26
2.8 Low metadata use
Croatian e-serials:
• HTML metatags "keywords", "description", "author":
– 32.8% (September 2000)
– 33.3% (April 2001)
• 1 title uses DC metadata
Lawrence & Giles (1999):
• simple HTML metatags are used on the homepages of only 34% of sites
• only 0.3% of sites use the Dublin Core metadata standard
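The slides do not say how metatag usage was counted. One way the check could be automated, assuming the homepage HTML of each title is available as a string, is sketched below using only Python's standard `html.parser`; the function names are hypothetical.

```python
from html.parser import HTMLParser

SIMPLE_TAGS = {"keywords", "description", "author"}

class MetaCollector(HTMLParser):
    """Collect the lower-cased NAME values of all <meta> elements."""
    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.add(name.strip().lower())

def metadata_profile(html: str) -> dict:
    """Report whether a page uses simple HTML metatags and/or DC.* metatags."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return {
        "simple": bool(parser.names & SIMPLE_TAGS),
        "dublin_core": any(n.startswith("dc.") for n in parser.names),
    }

def usage_share(pages: list[str], key: str) -> float:
    """Percentage of homepages whose profile has `key` set to True."""
    hits = sum(metadata_profile(p)[key] for p in pages)
    return 100 * hits / len(pages)
```

Self-closing `<meta ... />` tags are also covered, because `HTMLParser` routes them through `handle_starttag` by default.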
2.9 Metadata
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="Adobe PageMill 3.0 Win">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
<TITLE>ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS</TITLE>
<LINK REV="made" HREF="mailto:smotra@agr.hr">
<META NAME="keywords" CONTENT="Croatia, agriculture, science, publication, agricultural, economics, rural, sociology, plant, pathology, herbology, animal, nutrition, engineering, soil, amelioration, microbiology, dairy, agronomy, breeding, genetics, botany, zoology, crops, fishery, beekeeping, husbandry, forades, grassland, ornamental, ladnscape, architecture, farm, management, enology, viticulture, pomology">
<META NAME="description" CONTENT="On-line Scientific Journal" AGRICULTURAE CONSPECTUS SCIENTIFICUS PUBLISHED BY FACULTY OF AGRICULTURE UNIVERSITY ZAGREB>
<META NAME="copyright" CONTENT="ACS Agriculture Conspectus Scientificus">
<META NAME="revisit-after" CONTENT="60 Days">
<META NAME="Robot" CONTENT="ALL">
<META NAME="DC.Title" CONTENT="ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS">
<META NAME="DC.Creator" CONTENT="Agriculture Conspectus Scientificus, Faculty of Agriculture, Zagreb CROATIA">
<META NAME="DC.Publisher" CONTENT="Faculty of Agriculture University of Zagreb">
</HEAD>
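The DC.* elements in a head like the one above can be pulled out into a simple record for reuse (for example, for cataloguing or conversion). A minimal sketch with the standard library; the function and class names are hypothetical, and element names are normalised to lower case as an assumption.

```python
from html.parser import HTMLParser

class DCExtractor(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs into a dict of lists."""
    def __init__(self):
        super().__init__()
        self.record: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").strip()
        content = a.get("content")
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()   # "DC.Title" -> "title"
            self.record.setdefault(element, []).append(content.strip())

def extract_dc(html: str) -> dict[str, list[str]]:
    """Return the Dublin Core elements embedded in an HTML document."""
    parser = DCExtractor()
    parser.feed(html)
    parser.close()
    return parser.record
```

Non-DC metatags such as "keywords" or "copyright" are ignored, and repeated elements (e.g. several DC.Creator tags) are kept as separate list entries.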
2.10 Metadata questionnaire
• sent in April 2001 by e-mail to 160 publishers, editors and webmasters
• to find out more about their familiarity with metadata and their intentions to use metadata and cooperate with librarians
• an effort to raise awareness among publishers of the need for an “electronic title page” to be included in their publications
27 answers representing 32 publications received (17.3% or 20.6%)
6 incorrect statements:
• 4 claim to use metadata (they don’t!)
• 2 claim not to use metadata (they do!)
Do you know what metadata is? Do you use metadata?
The benefits of metadata
• facilitates search and retrieval
• promotes the company/publication
• helps identify the author and the content of the publication
• everybody uses metadata
• reliability and authenticity of the publication
• contains copyright information
<title> <keywords> <author> <description> <copyright>
95.7% 52.2% 60.9% 21.7% 69.6% 56.5% 52.2% 13% 8.7% 4.3%
Metadata is created by...
25.8% don’t use metadata because they:
• know nothing about metadata
• don’t have enough time
• don’t have enough employees
50% 12.5%
Metadata generators? (DC-dot, TagGen, DC assist, EdNA, AHDS, Reggie, Nordic DC metadata generator, SAFARI)
• aware of their existence: 11%
• not aware: 71%
– would like to be informed: 100%
Metadata is contained in...
• homepage only: 26.1%
• all pages (same metadata): 17.4%
• all pages (different metadata): 47.8%
Metadata standardization?
1. Have you heard of metadata standardization?
2. Which metadata schemas do you know of?
3. Would a metadata guideline help you?
4. Is standardization important for your work?
5. Would you like to have standardized metadata in your publication?
Could librarians help you?
• librarians work on standardization of bibliographic description
• I’d appreciate any help
• librarians describe print publications
• librarians work on standardization of metadata
• we are already familiar with library activities (ISBN, ISSN, CIP…)
• librarians don’t know much about the Web
• webmasters should do that
• I can do it myself
48% 44% 32% 12% 24% 50% 44% 25%
E-journals available through the library WebPAC?
YES: 93.8%
• it’s useful information for users
• it’s important to treat both print and electronic publications in the same way
• it’s useful for publishers
75% 46.4%
NO: 6.3%
• people prefer to use search engines
• web publications often change their URLs: “I’m not sure librarians should catalogue them”
Dublin Core Metadata Initiative survey
From 20 February to 9 March 2001.
The purpose of the questionnaire was to help achieve some of the DC Libraries Working Group’s objectives for 2001, including:
(1) to collect and share examples of Dublin Core use in libraries, and
(2) to stimulate discussion that will feed into the process of drafting an application profile for the use of Dublin Core in libraries.
Distributed via the DC-General and DC-Libraries lists, the CORC Users List and The Alberta Library Metadata List.
29 responses from 9 countries.
Most used elements: creator, publisher, title, rights, type, identifier, format, description.
Low use of qualifiers.
http://dublincore.org
3. Use of metadata in e-serials and possibilities in Croatia
E-serials:
- digital / hybrid libraries
- databases (publishers, vendors)
- cooperation (BIBLINK) hosted.ukoln.ac.uk/biblink
- separately (web pages)
3.1 Using metadata
1. Inside the document – HTML (XML):
<head> metadata </head> <body> document described above </body>
2. Separate file:
- metadata records + links to e-serials (bibliography, similar serials…)
- file containing metadata, linked from a web page with no metadata in the <head> (DC web page)
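For the first option (metadata inside the document), a publisher's tool could render a Dublin Core record as `<meta>` lines for the HTML `<head>`. A sketch under stated assumptions: the function name is hypothetical, and the `schema.DC` link follows the common DCMI convention for DC in HTML rather than anything prescribed in this survey.

```python
from html import escape

def dc_meta_tags(record: dict[str, list[str]]) -> str:
    """Render a Dublin Core record as <link>/<meta> lines for an HTML <head>.

    Uses the "DC.element" naming seen in the ACS example; element names are
    taken from the record keys as given.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        for value in values:
            # Escape &, <, > and quotes so values cannot break the attribute.
            lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)
```

The same record could instead be written to a separate file (the second option), with only a link to it from the web page.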
3.2 Metadata schemes
1. Before the Internet and electronic publications (cataloguing, exchange): MARC, GILS, CIMI
2. Development of the Internet (searching, cataloguing, exchange): Qualified Dublin Core (dublincore.org)
– translations (21 languages)
– no Croatian version yet, but the translation is finished
3.3 Creation & conversion tools
- Creating metadata (templates):
Nordic DC metadata creator (including URN generator; choice of controlled vocabularies, classification, date format, identifier)
- Creation / change of templates:
Reggie, Mantis (OCLC), HotMETA (search DC)
- Automatic extraction / gathering from HTML (enter URL):
DC-dot (results in DC, RDF, XHTML; additional corrections possible)
Donor metatag generator (similar to Nordic DC)
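Extraction tools of the DC-dot kind start from an ordinary HTML page and propose a draft record that a person then corrects. The sketch below illustrates that idea only; it is not DC-dot's actual algorithm, and the mapping (`<title>` to title, "keywords" to subject, "description" to description, URL to identifier) is an assumption.

```python
from html.parser import HTMLParser

class PageScanner(HTMLParser):
    """Collect the <title> text and NAME/CONTENT pairs of <meta> elements."""
    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.in_title = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            a = dict(attrs)
            name, content = a.get("name"), a.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)

def propose_dc(html: str, url: str) -> dict[str, list[str]]:
    """Suggest a draft DC record from plain HTML (to be corrected by hand)."""
    s = PageScanner()
    s.feed(html)
    s.close()
    record = {"identifier": [url]}
    title = "".join(s.title_parts).strip()
    if title:
        record["title"] = [title]
    if "keywords" in s.meta:
        record["subject"] = [k.strip() for k in s.meta["keywords"].split(",") if k.strip()]
    if "description" in s.meta:
        record["description"] = [s.meta["description"].strip()]
    return record
```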
3.3 Creation & conversion tools
- Automatic production:
Klarity (automatically generates metadata based on concepts found in text)
Scorpion (automatic classification to DDC)
- Commercial software:
TagGen Dublin Core edition (a number of schemes and possibilities)
Metabrowser (shows metadata and web pages simultaneously)
http://dublincore.org/tools
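Automatic production tools derive metadata from the text itself. As a much simpler stand-in for concept-based tools such as Klarity (whose method is not described here), the sketch below suggests subject keywords by term frequency; the tiny stopword list is an illustrative assumption.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real tools use far larger ones).
STOPWORDS = {"the", "and", "of", "in", "a", "to", "is", "for", "on", "with", "by"}

def suggest_keywords(text: str, n: int = 5) -> list[str]:
    """Return the n most frequent non-stopword terms as candidate keywords."""
    # [^\W\d_]+ matches runs of Unicode letters, so Croatian words stay whole.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(n)]
```

Frequency alone cannot recognise concepts or synonyms, which is exactly the gap that concept-based and classification tools (e.g. Scorpion's mapping to DDC) aim to close.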
3.3 Creation & conversion tools
DC-dot (http://www.agr.hr/smotra)
3.3 Creation & conversion tools
Donor (http://www.agr.hr/smotra)
3.3 Creation & conversion tools
Metabrowser: “Metabrowser is a web browser that catalogues web pages using schemas such as Dublin Core, GILS, AGLS. Metabrowser allows metadata to be added to web pages accessible from a local or network drive or sent to an external system such as a database or firewalled web server”
3.3 Creation & conversion tools
Conversion:
- DC -> MARC (Dan, Fin, Is, Nor, Swe, US)
Nordic Metadata Project: DC to MARC converter (www.bibsys.no/mete/d2m)
- Crosswalks: DC, MARC 21, EAD, GILS, ISAD, FGDC (www.ukoln.ac.uk/metadata/interoperability)
3.3 Creation & conversion tools
Nordic Metadata Project: DC to MARC converter
008 010508s
245 $a ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS
260 $b Faculty of Agriculture University of Zagreb
856 $u http://www.agr.hr/smotra
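A DC-to-MARC conversion of this kind is essentially a crosswalk table plus a formatter. The sketch below covers only the three fields shown in the converter output above; it is not the Nordic converter's code, the function name is hypothetical, and indicators and the remaining 008 positions are omitted as on the slide.

```python
from datetime import date

# Partial crosswalk: only the fields shown in the converter output above.
CROSSWALK = {
    "title": ("245", "a"),
    "publisher": ("260", "b"),
    "identifier": ("856", "u"),
}

def dc_to_marc(record: dict[str, list[str]], entered: date) -> list[str]:
    """Convert a DC record (element -> values) to simple MARC-style lines."""
    # 008/00-05 is the date entered on file (yymmdd); 008/06 's' = single known date.
    # The remaining 008 positions are left out, as in the example output.
    lines = [f"008 {entered:%y%m%d}s"]
    for element, (tag, code) in CROSSWALK.items():
        for value in record.get(element, []):
            lines.append(f"{tag} ${code} {value}")
    return lines
```

A fuller conversion would extend `CROSSWALK` with the other DC elements, using a published DC-to-MARC crosswalk as the source of the mappings.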
3.3 Creation & conversion tools
Conversion MARC -> XML -> MARC (www.logos.com/marc), (www.culture.fr/BiblioML)
- additional applications needed
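The MARC -> XML -> MARC round trip can be illustrated with a deliberately simplified, MARCXML-like structure (`record` / `datafield tag` / `subfield code`). This is a sketch only: real MARCXML also uses the http://www.loc.gov/MARC21/slim namespace, a leader, control fields and indicator attributes, all omitted here, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

Field = tuple[str, str, str]  # (tag, subfield code, value)

def marc_to_xml(fields: list[Field]) -> str:
    """Serialise simple MARC-style fields as MARCXML-like elements."""
    record = ET.Element("record")
    for tag, code, value in fields:
        datafield = ET.SubElement(record, "datafield", {"tag": tag})
        subfield = ET.SubElement(datafield, "subfield", {"code": code})
        subfield.text = value
    return ET.tostring(record, encoding="unicode")

def xml_to_marc(xml_text: str) -> list[Field]:
    """Read the same structure back into (tag, code, value) tuples."""
    record = ET.fromstring(xml_text)
    return [
        (df.get("tag"), sf.get("code"), sf.text or "")
        for df in record.findall("datafield")
        for sf in df.findall("subfield")
    ]
```

Because XML is the intermediate form, the same record can be checked, transformed or exchanged with general XML tools before being turned back into MARC.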
3.4 Which model / scheme?
- company / organization needs
- connection and cooperation with other companies / organizations
- budget
- standardization
- software and upgrade possibilities
- exchange of data / records
Libraries, publishers and vendors have different needs and aims
3.4.1 Choose scheme and strategy: Croatia
Libraries:
- bibliographic control
- up-to-date record collections (users benefit)
- exchange
Publishers:
- timely, accurate and full exposure of their products and services
- search and retrieval (benefits users and publishers)
- standardized records in databases for possible exchange and profit
Cooperation!
3.4.1 Choose scheme and strategy: Croatia
Use knowledge and experience from foreign projects: BIBLINK, CORC (Cooperative Online Resource Catalog), DONOR (Directory of Netherlands Online Resources)
- Inform publishers of standards and possibilities (survey)
- Point out the necessity of standardization and of using one primary (major) scheme (Dublin Core?)
- Show them how to use free web-available tools
3.5 DC – RDF – XML
Dublin Core (qualified) is enough for basic description and serves our needs for the beginning
RDF (Resource Description Framework) is about to become a standard (Semantic Web)
XML (eXtensible Markup Language) is already a growing standard (structure, exchange, e-business, internal control…)
3.5 DC – RDF – XML
RDF development is still in process, but…
- many projects and tools exist (creation, conversion)
- constant work, often non-commercial (learn & use)
Croatia:
- use the same metadata scheme (DC?), enriched with an internal metadata scheme if needed (for publishers’ use)
- embed it into HTML documents
- eventually convert to RDF/XML
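The last step, converting embedded DC metadata to RDF/XML, could look like the sketch below. It uses the standard RDF and DC 1.1 namespace URIs; the function name is hypothetical, and the record keys are assumed to be lower-case DC element names (title, creator, publisher, …).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def dc_to_rdf_xml(about: str, record: dict[str, list[str]]) -> str:
    """Describe the resource at `about` with DC elements in RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    for element, values in record.items():
        for value in values:
            # One dc:<element> property per value, e.g. <dc:title>.
            ET.SubElement(desc, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Fed with a record extracted from the HTML head, this produces one `rdf:Description` of the e-serial's URL, so the metadata embedded now would not need to be re-entered later.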
3.6 Conclusion
• Low use of any metadata scheme opens the possibility to adopt one primary scheme (DC?) and an emerging standard (RDF?)
• Concentrate on the start and the strategy; use the experience of others
• Build an environment to help publishers (similar to BIBLINK)
• Cooperation among libraries and publishers is essential
3.7 Links
http://dublincore.org
www.ifla.org
www.ukoln.ac.uk
www.w3c.org
www.editeur.org
www.xml.com
www.logos.com/marc
www.culture.fr/BiblioML