Digging into Digital Libraries: From Archaeology to Formalism
Edward A. Fox
Virginia Tech, Dept. of CS
fox@vt.edu
CSC Spring Colloquium, Villanova – February 20, 2006
Acknowledgements (selected)
• 5S Helpers: Weiguo Fan, Marcos Gonçalves, Doug Gorton, Rohit Kelapure, Neill Kipp, Uma Murthy, Ananth Raghavan, Rao Shen, Hussein Suleman, Srinivas Vemuri, Layne Watson, …
• Sponsors: ACM, AOL, CAPES, DFG, IBM, Microsoft, NSF (IIS-9986089, 0086227, 0080748, 0325579, 0535057, 0535060; ITR 0325579; DUE-0121679, 0136690, 0121741, 0333601), SUN
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
• How
• Summary and Conclusions
WWW and DLs
• Both emerged in the early 1990s.
• Convergence began around 1994. Example: Google spun off from Stanford DL.
• Crawling WWW is one way to build DLs.
• WWW supports many portals to DLs.
• Parts of WWW that have catalogs (e.g., Yahoo categories) are close to DLs.
• Web Services help move WWW toward DLs, as the Semantic Web emerges.
Degree of Structure
• Web: chaotic
• DLs: organized
• DBs: structured
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Architecture diagram. Recoverable labels:]
• Portals & Clients; User Interfaces
• Core NSDL “Bus”
• Collections: NSDL collections; referenced items & collections; special databases
• Collection Building Services (core services: metadata gathering, protocols, harvesting)
• Usage Enhancement Services / CI Services (core services: information retrieval, browsing, authentication, personalization, discussion, annotation)
• Other NSDL Services
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
  – Definitions
  – ETANA example
• Powerful DLs
• Why
• How
• Summary and Conclusions
Minimal Digital Libraries
• Key concepts, core ideas
• Minimalist perspective
• Underlying concepts: 5S (ETANA example)
• Higher DL constructs
• Bases:
  – Literature
  – Informal explanations
  – Formal definitions
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5 Ss
Ss | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
Example of 5 Ss: ETANA-DL
• Archaeological DL (Electronic Tools for Ancient Near Eastern Archaeology Digital Library)
• Integrated DL
  – Heterogeneous data handling
• Applies and extends the OAI-PMH
  – Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
  – Componentized
  – Extensible
  – Portable
• Work based on 5S framework
ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (consisting of photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)
ETANA Societies – cont’d
• Social issues
  1. Who owns the finds?
  2. Where should they be preserved?
  3. What nationality and ethnicity do they represent?
  4. Who has publication rights?
  5. What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?
ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   – Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
   – Data about each artifact is recorded together with information about its exact find spot.
   – Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   – Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public
ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations, and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
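The retrieval role of vector spaces (item 3 above) can be sketched in a few lines. This is only an illustration, assuming artifact descriptions are reduced to bag-of-words term counts and compared by cosine similarity; the sample records and the names `term_vector`, `cosine_similarity`, and `rank` are hypothetical and are not taken from ETANA-DL.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn a description into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical artifact descriptions, invented for illustration only.
records = {
    "obj-1": "bronze age pottery jar",
    "obj-2": "iron age pottery bowl",
    "obj-3": "animal bone fragment",
}


def rank(query):
    """Return record ids ordered by decreasing similarity to the query."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(text)), rid)
              for rid, text in records.items()]
    # Keep only records that share at least one term with the query.
    return [rid for score, rid in sorted(scored, reverse=True) if score > 0]
```

Calling `rank("pottery jar")` places the jar record ahead of the bowl record and omits the bone fragment, which shares no terms with the query.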
ETANA Structures
1. Site organization
   1. Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent
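Stratigraphic relationships such as "above" and "beneath" form a partial order over loci, so a consistent deposition sequence can be derived with a topological sort. The sketch below assumes Python 3.9+ (for the standard `graphlib` module); the locus names and the helpers `deposition_order` and `is_consistent` are hypothetical, and the "coexistent" relation is not modeled.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical "X lies above Y" observations between loci.
above = [
    ("locus-3", "locus-2"),
    ("locus-2", "locus-1"),
    ("locus-4", "locus-1"),
]


def deposition_order(above_pairs):
    """Order loci from oldest (deepest) to youngest (topmost)."""
    sorter = TopologicalSorter()
    for upper, lower in above_pairs:
        # The lower locus must have been deposited before the upper one.
        sorter.add(upper, lower)
    return list(sorter.static_order())


def is_consistent(above_pairs):
    """False when the observations contradict each other (contain a cycle)."""
    try:
        deposition_order(above_pairs)
        return True
    except CycleError:
        return False


order = deposition_order(above)
```

Any order returned lists `locus-1` before `locus-2` and `locus-2` before `locus-3`; the position of `locus-4` relative to loci 2 and 3 is left open because no observation relates them.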
ETANA Streams
1. successive photos and drawings of excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation activities and discussions
3. textual reports
4. 3D models used to reconstruct and visualize archaeological ruins
5S and DL formal definitions and compositions (April 2004 TOIS)
A Minimal DL in the 5S Framework
[Concept map showing how the 5 Ss compose into a Minimal DL. Recoverable labels:]
• Streams, Structures, Spaces, Scenarios, Societies
• Structured Stream; Structural Metadata Specification; Descriptive Metadata Specification
• Services: indexing, browsing, searching; hypertext
• Digital Object; Collection; Metadata Catalog; Repository
• Minimal DL
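The constructs named on this slide can be illustrated with a toy data model. This is a loose sketch, not the formal 5S definitions: the class and method names are hypothetical, digital objects are reduced to a handle plus a byte stream, and the indexing, searching, and browsing services are reduced to simple dictionary operations.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str                                     # unique identifier
    stream: bytes                                   # raw content (a stream)
    structure: dict = field(default_factory=dict)   # structural metadata


@dataclass
class MinimalDL:
    # Repository: collection name -> {handle -> digital object}
    repository: dict = field(default_factory=dict)
    # Metadata catalog: handle -> descriptive metadata record
    catalog: dict = field(default_factory=dict)
    # Index built by the indexing service: term -> set of handles
    index: dict = field(default_factory=dict)

    def add(self, collection, obj, metadata):
        """Store an object, catalog it, and index its metadata."""
        self.repository.setdefault(collection, {})[obj.handle] = obj
        self.catalog[obj.handle] = metadata
        for value in metadata.values():
            for term in str(value).lower().split():
                self.index.setdefault(term, set()).add(obj.handle)

    def search(self, query):
        """Searching service: handles whose metadata contains every term."""
        terms = query.lower().split()
        if not terms:
            return set()
        hits = [self.index.get(t, set()) for t in terms]
        return set.intersection(*hits)

    def browse(self, collection):
        """Browsing service: sorted handles in one collection."""
        return sorted(self.repository.get(collection, {}))


# Hypothetical content, invented for illustration only.
dl = MinimalDL()
dl.add("reports", DigitalObject("r1", b"report text"),
       {"title": "Megiddo field report"})
dl.add("photos", DigitalObject("p1", b"image bytes"),
       {"title": "Megiddo locus photo"})
```

Here `dl.search("megiddo photo")` returns only the photo's handle, while `dl.browse("reports")` lists the handles stored in the reports collection.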
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
  – Services
  – Ontology
• Why
• How
• Summary and Conclusions
Ontology: Applications
Ontology: Applications
• Expand definition of minimal DL by characterizing
  – typical DL services
  – in the context of “employs” and “produces” relationships
• Use characterization to:
  – Reason about how DL services can be built from other DL components
  – As well as be composed with other services through extension or reuse
Composition of key fundamental / infrastructure services
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
  – Support DL education
  – Practical systems
  – Institutional repositories (DSpace)
  – Personal DLs (SenseCam -> Memex)
  – Support archaeology
• How
• Summary and Conclusions
DL Curriculum Framework
Foundations for Information Systems: Digital Libraries and the 5S Framework
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix
Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
  – Ch. 2: Streams
  – Ch. 3: Structures
  – Ch. 4: Spaces
  – Ch. 5: Scenarios
  – Ch. 6: Societies
Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
  – Ch. 7: Collections
  – Ch. 8: Catalogs
  – Ch. 9: Repositories and Archives
  – Ch. 10: Services
  – Ch. 11: Systems
  – Ch. 12: Case Studies
Book Parts and Chapters - 3
• Part 3 – Advanced Topics
  – Ch. 13: Quality
  – Ch. 14: Integration
  – Ch. 15: How to build a digital library
  – Ch. 16: Research Challenges, Future Perspectives
• Appendix
  – A: Mathematical preliminaries
  – B: Formal Definitions: Ss
  – C: Formal Definitions: DL terms, Minimal DL
  – D: Formal Definitions: Archeological DL
  – E: Glossary of terms, mappings
Practical Systems
• Commercial: IBM, VTLS, …
• Open Source
  – Greenstone
  – CWIS (for NSDL)
  – Institutional repositories
    • DSpace
    • Fedora
Institutional Repositories
• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
• Lynch, C. A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html
ETANA-DL Global Architecture (work in progress)
[Architecture diagram. Recoverable labels:]
• Sources: DigBase and DigKit; Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, …, New Sites
• Database wrappers
• ETANA-DL Union Catalog
• Services: Search, Browse, Recommend, Note, Personalize, Review, Visualizations, Archaeology Specific
• User interface
Megiddo Opening Screen
Locus Screen: Pictures View all
Area Screen
Global DL: Architecture of a Union DL
[Diagram: two DLs feeding a Union DL.]
• DL 1: Society (archaeologists); Service (Searching); Catalog 1; Repository 1
• DL 2: Society (General Public); Service (Browsing); Catalog 2; Repository 2
• Union DL: Union Society (Archaeologists, General Public); Union Service (Harvesting, Mapping, Searching, Browsing, Clustering, Visualization); Union Catalog; Union Repository
Outline
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
• How
  – Components
  – Metamodels, Models
  – Graphical model building aids
  – DL generators
  – Integration
  – Quality
• Summary and Conclusions
[Figure: a componentized digital library, with Document, Program, Image, and Video objects each shown as a stream of bits]
The World According to OAI
Open Archives Initiative – Protocol for Metadata Harvesting
[Diagram: Service Providers (Discovery, Current Awareness, Preservation) obtain records from Data Providers through metadata harvesting.]
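The harvesting step can be sketched from the service provider's side. The example below builds an OAI-PMH `ListRecords` request URL and pulls record identifiers and the resumption token out of a response. It is a minimal sketch: the base URL, the sample response, and the function names are hypothetical, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a data provider."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_identifiers(response_xml):
    """Extract record identifiers and the resumption token, if any."""
    root = ET.fromstring(response_xml)
    ids = [e.text for e in root.iter(OAI_NS + "identifier")]
    token = root.find(".//" + OAI_NS + "resumptionToken")
    return ids, (token.text if token is not None and token.text else None)


# A trimmed, hand-written response used only for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>tok-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would repeat the request with each returned token until the data provider sends an empty `resumptionToken`, at which point `parse_identifiers` returns `None` for the token.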
Metamodels
• Completed
  – Minimal
  – Archaeological
• Planned
  – Practical
  – System oriented
    • Doug Gorton’s thesis, so people can build models for their systems, and have them generated to work with a particular DL system
A Minimal DL in the 5S Framework
[Concept map showing how the 5 Ss compose into a Minimal DL. Recoverable labels:]
• Streams, Structures, Spaces, Scenarios, Societies
• Structured Stream; Structural Metadata Specification; Descriptive Metadata Specification
• Services: indexing, browsing, searching; hypertext
• Digital Object; Collection; Metadata Catalog; Repository
• Minimal DL
5SL – The Minimal DL Metamodel
A Minimal ArchDL in the 5S Framework
[Concept map showing how the 5 Ss compose into a Minimal ArchDL. Recoverable labels:]
• Streams, Spaces, Scenarios, Societies
• Structured Stream; Descriptive Metadata specification; Arch Descriptive Metadata specification
• SpaTemOrg; StraDia; ArchObj
• Services: indexing, browsing, searching; hypertext
• ArchDO; Arch Metadata catalog; ArchColl; ArchDR
• Minimal ArchDL
Overview of 5SGraph
[Screenshot labels: Workspace (instance model); Structured toolbox (metamodel)]
Tools/Applications
5SGen – Version 2: ODL, Services, Scenarios
XML-based DL Log Standard
• Log analysis is a source of information on:
  – How patrons really use DL services
  – How systems behave while supporting user information seeking activities
• Used to:
  – Evaluate and enhance services
  – Guide allocation of resources
• Common practice in the web setting
  – Supported by web servers, proxy caches
• DL logging can be more detailed
The XML Log Format
[Schema diagram. Element names shown: Log, Transaction, SessionId, MachineInfo, Timestamp, Event, StatusInfo, SearchBy, SessionInfo, RegisterInfo, Statement, Action, Browse, QueryString, Update, Collection, Catalog, StoreSysInfo, Timeout, PresentationInfo]
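To make the idea of an XML log concrete, the sketch below writes and reads search transactions using a handful of the element names from the slide. The nesting chosen here (Log > Transaction > Event > Action > SearchBy > QueryString) is an assumption made for illustration and may not match the actual log standard; the session ids, timestamps, and function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def log_search(log, session_id, timestamp, query):
    """Append one search transaction to the log element."""
    tx = ET.SubElement(log, "Transaction")
    ET.SubElement(tx, "SessionId").text = session_id
    ET.SubElement(tx, "Timestamp").text = timestamp
    # Assumed nesting: Event > Action > SearchBy > QueryString.
    action = ET.SubElement(ET.SubElement(tx, "Event"), "Action")
    search = ET.SubElement(action, "SearchBy")
    ET.SubElement(search, "QueryString").text = query
    return tx


def queries(log):
    """Log analysis helper: every query string, in logged order."""
    return [q.text for q in log.iter("QueryString")]


# Hypothetical log entries, invented for illustration only.
log = ET.Element("Log")
log_search(log, "s-001", "2006-02-20T10:15:00", "bronze age pottery")
log_search(log, "s-001", "2006-02-20T10:16:30", "locus photos")
xml_text = ET.tostring(log, encoding="unicode")
```

Because each transaction is structured rather than a flat text line, an analysis such as `queries(log)` needs no custom parsing, which is one way DL logging can carry more detail than a web server log.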
DL Integration
• What is “DL Integration”
  – Hide distribution
  – Hide heterogeneity
  – Enable autonomy of individual components
• Why Integration
  – island-DLs
  – inability to seamlessly and transparently access knowledge across DLs
• Utilize various autonomous DLs in concert
Formal Definition of DL Integration
• $DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$
  – $R_i$ is a network accessible repository
  – $DM_i$ is a set of metadata catalogs for all collections
  – $Serv_i$ is a set of services
  – $Soc_i$ is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety
Formal Definition of DL Integration (Cont.)
• DL integration problem definition: Given n individual libraries, integrate the n DLs to create a UnionDL.
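The problem statement can be illustrated with a toy model in which each DL is a 4-tuple of sets. The slides do not say how UnionRep, UnionCat, UnionServices, and UnionSociety are constructed; taking component-wise set unions, as below, is an assumption and the simplest possible reading. All names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DL:
    repository: frozenset   # R_i: handles of stored digital objects
    catalogs: frozenset     # DM_i: (handle, metadata) catalog entries
    services: frozenset     # Serv_i: service names
    society: frozenset      # Soc_i: community names


def integrate(dls):
    """Combine n DLs into a union DL by component-wise set union."""
    dls = list(dls)
    return DL(
        repository=frozenset().union(*(d.repository for d in dls)),
        catalogs=frozenset().union(*(d.catalogs for d in dls)),
        services=frozenset().union(*(d.services for d in dls)),
        society=frozenset().union(*(d.society for d in dls)),
    )


# Two hypothetical DLs, invented for illustration only.
dl1 = DL(frozenset({"obj-a"}), frozenset({("obj-a", "jar")}),
         frozenset({"searching"}), frozenset({"archaeologists"}))
dl2 = DL(frozenset({"obj-b"}), frozenset({("obj-b", "bowl")}),
         frozenset({"browsing"}), frozenset({"general public"}))
union_dl = integrate([dl1, dl2])
```

A real integration involves more than union: records from different DLs must first be mapped to a common metadata format, and the union DL may offer services that none of the member DLs provides.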
ETANA-DL Approach
• Applying and extending Digital Library (DL) techniques to solve key problems: making primary data available, data preservation, and interoperability
• Modeling archaeological information systems using 5S to better understand the domain and design the system and the supporting services
• Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks:
  – eliciting requirements
  – refining metamodel and union schema
  – modeling sites
  – mapping
  – harvesting
  – providing useful services
Example of Union Service: CitiViz
Union Catalog Integration
[Diagram: each site catalog reaches the Union Catalog through a mapping tool and a wrapper.]
• Virtual Nimrin (VN): VN Catalog, VN Metadata Format, Mapping Tool, Wrapper
• Halif DigMaster (HD): HD Catalog, HD Metadata Format, Mapping Tool, Wrapper
• Union ArchDL: Union Catalog, Global Metadata Format
[Figure: local schema and global schema]
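The mapping step between a local schema and the global schema can be sketched as a field-renaming table per site. Everything concrete below is hypothetical: the local and global field names, the sample records, and the `to_global` helper are invented for illustration and do not reflect the actual Virtual Nimrin, Halif DigMaster, or ETANA-DL schemas.

```python
# Hypothetical local-to-global field mappings, one per site schema.
MAPPINGS = {
    "VN": {"find_no": "identifier", "material": "type", "square": "locus"},
    "HD": {"ObjID": "identifier", "Class": "type", "Locus": "locus"},
}


def to_global(site, local_record):
    """Rename the fields of a local record to global-schema field names."""
    mapping = MAPPINGS[site]
    global_record = {"site": site}
    for local_field, value in local_record.items():
        if local_field in mapping:      # unmapped local fields are dropped
            global_record[mapping[local_field]] = value
    return global_record


# Records from two sites end up with the same field names.
union_catalog = [
    to_global("VN", {"find_no": "N-17", "material": "ceramic", "square": "A3"}),
    to_global("HD", {"ObjID": "H-204", "Class": "bone", "Locus": "L5",
                     "Notes": "unmapped field"}),
]
```

Once every record uses the global field names, union services such as searching and browsing can treat the union catalog as a single homogeneous collection.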
Describing Quality in Digital Libraries
• What’s a “good” digital library?
  – Central concept: quality!
  – Hypotheses of this work:
    • Formal theory can help to define “what’s a good digital library” by:
      – New formalizations of quality indicators for DLs within our 5S framework
      – Contextualizing these measures within the Information Life Cycle
Quality Dimensions
Quality and the Information Life Cycle
Summary and Conclusions
• WWW and Digital Libraries (DLs)
• Minimal DLs
• Powerful DLs
• Why
• How
• -> Theory-based discipline and high quality DL management systems (DLMS)
Selected Links - http://fox.cs.vt.edu
• CITIDEL (computing education resources) – www.citidel.org
• NCSTRL (computing technical reports) – www.ncstrl.org
• NDLTD (electronic theses and dissertations worldwide) – www.ndltd.org and etdguide.org
• NSDL (National Science Digital Library) – www.nsdl.org
• OAI (Open Archives Initiative) – www.openarchives.org
• Virginia Tech Digital Library Research Laboratory (DLRL, www.dlib.vt.edu) – 5S, AmericanSouth.Org, CSTC, DL-in-a-box, ENVISION, ETANA, MARIAN, NDLTD, NSDL, OAD, ODL, …
Questions? Discussion? Thank You!