Vienna University of Technology Vienna 21 Sept 2007

Vienna University of Technology (Vienna – 21 Sept 2007)
“From information retrieval to digital libraries to computer science education”
Edward A. Fox
• fox@vt.edu http://fox.cs.vt.edu
• Dept. of Computer Science, Virginia Tech
• Blacksburg, VA 24061 USA

“From information retrieval to digital libraries to computer science education”
• ABSTRACT: Information is a fundamental human need. The field of information retrieval has helped address this need since the 1960s, with a range of models and systems. A broad view of this field leads to digital libraries, a re-definition of the concepts, systems, and human involvement in sharing information across time and space, supported by digital technologies. We can formalize and better operationalize this through the 5S framework, which addresses information with regard to Societies, Scenarios, Spaces, Structures, and Streams. This approach has supported our work with personalization and computer science syllabi, curriculum development regarding digital libraries, and ensuring that college graduates are prepared not only to live in, but also to help build our future cyberinfrastructure, i.e., for Living In the KnowlEdge Society (LIKES). This talk will summarize our related research and education innovation.

Acknowledgements (selected)
• Colleagues: Lillian Cassel, Debra Dudley, Weiguo Fan, Marcos Gonçalves, Doug Gorton, Rohit Kelapure, Neill Kipp, Aaron Krowne, Ming Luo, Uma Murthy, Manuel Perez, Ananth Raghavan, Rao Shen, Hussein Suleman, Srinivas Vemuri, Layne Watson, …
• Sponsors: ACM, AOL, CAPES, DFG, Google, IBM, IMLS, INL, Microsoft, NSF (CCF-0722259; IIS-9986089, 0080748, 0086227, 0307867, 0325579, 0535057, 0535060, 0736055; DUE-0121679, 0121741, 0136690, 0333531, 0333601, 0435059, 0532825), SUN, …

Acknowledgements - Mentors
• JCR Licklider – undergrad advisor (1969-71)
  – Author in 1965 of “Libraries of the Future”
  – Before, at ARPA, funded start of Internet
• Michael Kessler – BS thesis advisor
  – Project TIP (technical information project)
  – Defined bibliographic coupling
• Gerard Salton – graduate advisor (1978-83)
  – “Father of Information Retrieval”

Information Retrieval: Algorithms and Heuristics, 2nd Ed.
• By David A. Grossman & Ophir Frieder
• Kluwer Academic Publishers

Document Retrieval (Grossman & Frieder Fig. 1.1)

Vector Space Model – 2 terms (Grossman & Frieder Fig. 2.2)

Language Model (Grossman & Frieder Fig. 2.5)

Document-Term-Query Inference Network (Grossman & Frieder Fig. 2.7)

Inference Network Layers (Grossman & Frieder Fig. 2.8)

Relevance Feedback Process (Grossman & Frieder Fig. 3.1)

Information Life Cycle
(Diagram labels: Authoring, Modifying, Using, Creating, Retention / Mining, Organizing, Indexing, Accessing, Filtering, Storing, Retrieving, Distributing, Networking)

Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place

DLs Shorten the Chain to
(Diagram labels: Author, Teacher, Digital Library, Reader, Editor, Reviewer, Learner, Librarian)

DL Definitions - 1
• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”
• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003

DL Definitions - 2
• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”
• Waters, D. J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues04.html

DL Definitions - 3
• Issues and Spectra
  – Collection vs. Institution
  – Content vs. System
  – Access vs. Preservation
  – “Free” vs. Quality
  – Managed vs. Comprehensive
  – Centralized vs. Distributed

DL Definitions - 4
• NOT a “digitized library”
• NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library
• IS a new way to deal with knowledge
  – Authoring, Self-archiving, Collecting,
  – Organizing, Preserving,
  – Accessing, Propagating, Re-using

Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)

Hypotheses
• A formal theory for DLs can be built based on 5S.
• The formalization can serve as a basis for modeling and building high-quality DLs.

5S Framework
• “Streams” - All types of (multimedia) content (as well as communications and flows over networks, or into sensors, or sense perceptions; data stream management systems)
• “Structures” - Organizational schemes (including data structures, databases, and knowledge representations – taxonomies, ontologies)

5S Framework
• “Spaces” - 2D and 3D interfaces, GIS data, representations of documents and queries
• “Scenarios” - System states and events, but also can represent situations of use by human users (or machine processes, yielding services or transformations of data)
• “Societies” - Both software “service managers” and fairly generic “actors” who could be (collaborating) human (users).

5Ss: Examples and Objectives
• Streams
  – Examples: Text; video; audio; image
  – Objectives: Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
• Structures
  – Examples: Collection; catalog; hypertext; document; metadata
  – Objectives: Specifies organizational aspects of the DL content
• Spaces
  – Examples: Measure; measurable, topological, vector, probabilistic
  – Objectives: Defines logical and presentational views of several DL components
• Scenarios
  – Examples: Searching, browsing, recommending
  – Objectives: Details the behavior of DL services
• Societies
  – Examples: Service managers, learners, teachers, etc.
  – Objectives: Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
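
As a rough illustration, the five dimensions listed above can be held in a small record type and checked for completeness. This is only a sketch: the class name `FiveSDescription`, its fields, and the helper `missing_dimensions` are hypothetical names chosen here, and the example entries are made up; none of this reproduces the formal 5S definitions.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FiveSDescription:
    """Informal description of a DL along the five 'S' dimensions (illustrative only)."""
    streams: list[str] = field(default_factory=list)     # e.g. text, video, audio, image
    structures: list[str] = field(default_factory=list)  # e.g. collection, catalog, metadata
    spaces: list[str] = field(default_factory=list)      # e.g. vector, probabilistic
    scenarios: list[str] = field(default_factory=list)   # e.g. searching, browsing
    societies: list[str] = field(default_factory=list)   # e.g. learners, teachers


def missing_dimensions(desc: FiveSDescription) -> list[str]:
    """Return the names of the dimensions that have no entries yet."""
    return [f.name for f in fields(desc) if not getattr(desc, f.name)]


# Hypothetical, partially filled description of a DL.
example = FiveSDescription(
    streams=["text", "image"],
    structures=["collection", "catalog"],
    scenarios=["searching", "browsing"],
    societies=["learners", "teachers"],
)
print(missing_dimensions(example))
```

A designer could use such a check as a reminder that a DL description has not yet said anything about, for instance, its spaces.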

ETANA-DL
• Archaeological DL
• Integrated DL
  – Heterogeneous data handling
• Applies and extends the OAI-PMH
  – Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
  – Componentized
  – Extensible
  – Portable

ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (consisting of photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)

ETANA Societies
• Social issues
  1. Who owns the finds?
  2. Where should they be preserved?
  3. What nationality and ethnicity do they represent?
  4. Who has publication rights?
  5. What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?

ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
   2. Data about each artifact is recorded together with information about its exact find spot.
   3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public

ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations, and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
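
To make the role of metric and vector spaces concrete, the sketch below computes a distance and a similarity between small vectors. It is a generic illustration, not ETANA-DL code: the function names and the example vectors (which could stand for term weights or coordinates) are assumptions made here.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical weight vectors for a query and two records.
query = [1.0, 1.0, 0.0]
record_a = [2.0, 2.0, 0.0]   # same direction as the query
record_b = [0.0, 0.0, 3.0]   # shares no component with the query

print(cosine_similarity(query, record_a))   # close to 1: very similar
print(cosine_similarity(query, record_b))   # 0.0: nothing in common
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

Distance suits spatial constraints (how far apart two find spots are), while cosine similarity is the usual choice for ranking in the vector space retrieval model.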

ETANA Structures
1. Site Organization
   1. Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent
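
One way to picture the stratigraphic relationships is as a directed graph over loci. The sketch below records only "above" facts and derives "beneath" as the inverse; "coexistent" is left out for brevity. The class `Stratigraphy`, its methods, and the locus names are hypothetical and are not taken from ETANA-DL.

```python
from collections import defaultdict


class Stratigraphy:
    """Stores direct 'above' relations between loci (illustrative sketch)."""

    def __init__(self) -> None:
        self._above: dict[str, set[str]] = defaultdict(set)

    def add_above(self, upper: str, lower: str) -> None:
        """Record that `upper` lies directly above `lower`."""
        self._above[upper].add(lower)

    def is_above(self, upper: str, lower: str) -> bool:
        """True if `upper` lies above `lower`, directly or via intermediate loci."""
        stack, seen = [upper], set()
        while stack:
            current = stack.pop()
            for nxt in self._above.get(current, ()):
                if nxt == lower:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def is_beneath(self, lower: str, upper: str) -> bool:
        """'Beneath' is simply the inverse of 'above'."""
        return self.is_above(upper, lower)


# Hypothetical loci: topsoil over a floor, the floor over a pit.
site = Stratigraphy()
site.add_above("topsoil", "floor")
site.add_above("floor", "pit")
print(site.is_above("topsoil", "pit"))
print(site.is_beneath("pit", "topsoil"))
```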

ETANA Streams
1. successive photos and drawings of excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation activities and discussions
3. textual reports
4. 3D models used to reconstruct and visualize archaeological ruins.

5S and DL formal definitions and compositions (April 2004 TOIS)

Fox & Gonçalves Book Outline
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix

Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
  – Ch. 2: Streams
  – Ch. 3: Structures
  – Ch. 4: Spaces
  – Ch. 5: Scenarios
  – Ch. 6: Societies

Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
  – Ch. 7: Collections
  – Ch. 8: Catalogs
  – Ch. 9: Repositories and Archives
  – Ch. 10: Services
  – Ch. 11: Systems
  – Ch. 12: Case Studies

Book Parts and Chapters - 3
• Part 3 – Advanced Topics
  – Ch. 13: Quality
  – Ch. 14: Integration
  – Ch. 15: How to build a digital library
  – Ch. 16: Research Challenges, Future Perspectives
• Appendix
  – A: Mathematical preliminaries
  – B: Formal Definitions: Ss
  – C: Formal Definitions: DL terms, Minimal DL
  – D: Formal Definitions: Archeological DL
  – E: Glossary of terms, mappings

Chapter 3: (Degree of) Structure
• Web: Chaotic
• DLs: Organized
• DBs: Structured

Digital Objects (DOs)
• Born digital
• Digitized version of “real” object
  – Is the DO version the same, better, or worse?
  – Decision for ETDs: structured + rendered
• Surrogate for “real” object
  – Not covered explicitly in metamodel for a minimal DL
  – Crucial in metamodel for archaeology DL

Metadata: Complex to Simple
(Diagram labels: + thesis; MARC ($50); Dublin Core (DC))

Also Important: Epub, SGML, XML
• 5S perspective: streams, structures, scenarios
• Authoring
• Rendering, presenting
• Tagging, Markup, DOM
• Semi-structured information
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structured queries

Chapter 4 Overview (Spaces)
• Retrieval models
  – Boolean, extended Boolean
  – Vector, LSI
  – Probabilistic: classical, belief network, inference network, language models
• User interfaces and visualization – cont’d

User interfaces and visualization
• 2D interfaces
• 3D interfaces
• GIS
• Other paradigms: trees, graphs, bubbles, coordinated views, …
• Stepping Stones and Pathways
  – http://fox.cs.vt.edu/SSP/

Chapter 6 Overview (Societies)
• User communities
  – Authors, editors, teachers, students, readers
  – Personal(ization), group(ware), community, global
  – Accessibility, universal access
• Librarians: reference, acquisition, operations
• Research community
  – Associations, conferences, publications, labs, projects
• Economics
  – Copyright, intellectual property rights, digital rights management, authorization, authentication, security, privacy, self-archiving (eprints)
  – Publishers, catalogers, distributors, sustainability
  – Open source, commercial, hybrid

Chapter 9 Archives & Repositories
• Open Archives Initiative (OAI)
• Institutional Repositories
• Persistent storage of digital objects
• Coupling of metadata with digital objects
• Use of “handles” as identifiers for digital objects
• Put, get, harvest
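
The three operations named in the last bullet can be pictured with a tiny in-memory repository. This is a sketch under stated assumptions, not the design of any real repository system: the names `StoredObject` and `Repository`, the handle strings, and the use of a deposit date to drive harvesting are all choices made here for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredObject:
    handle: str      # identifier for the digital object (made-up format)
    metadata: dict   # descriptive metadata coupled with the object
    content: bytes   # the digital object itself
    deposited: date  # when the object was put into the repository


class Repository:
    """Minimal in-memory repository offering put, get, and harvest."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, obj: StoredObject) -> None:
        self._objects[obj.handle] = obj

    def get(self, handle: str) -> StoredObject:
        return self._objects[handle]

    def harvest(self, since: date) -> list[dict]:
        """Return metadata only (no content) for objects deposited on or after `since`."""
        selected = sorted(
            (o for o in self._objects.values() if o.deposited >= since),
            key=lambda o: o.handle,
        )
        return [{"handle": o.handle, **o.metadata} for o in selected]


repo = Repository()
repo.put(StoredObject("demo/1", {"title": "Old report"}, b"...", date(2006, 5, 1)))
repo.put(StoredObject("demo/2", {"title": "New thesis"}, b"...", date(2007, 9, 1)))
print(repo.harvest(since=date(2007, 1, 1)))
```

Note that `harvest` hands out metadata but not content, which mirrors the coupling of metadata with digital objects while keeping the objects themselves in the repository.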

OAI - Open Archives Initiative
• Advocacy for interoperability
• Standard for transferring metadata among digital libraries
  – Protocol for Metadata Harvesting (PMH)
• Simplicity
• Generality
• Extensibility
• Support for PMH => Open Archive (OA)
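
Part of the protocol's simplicity is that a harvesting request is an ordinary HTTP request whose query string carries a `verb` and a few arguments. The sketch below only builds such a URL and sends nothing over the network. The six verb names and the `oai_dc` metadata prefix come from the OAI-PMH specification; the base URL `http://example.org/oai` and the function name `build_request` are placeholders assumed here.

```python
from typing import Optional
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request(base_url: str, verb: str, arguments: Optional[dict] = None) -> str:
    """Build an OAI-PMH request URL; raises ValueError for an unknown verb."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **(arguments or {})})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint; ask for Dublin Core records changed since a date.
url = build_request(
    "http://example.org/oai",
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2007-01-01"},
)
print(url)
```

Arguments are passed as a dictionary rather than keyword arguments because `from`, one of the protocol's argument names, is a reserved word in Python.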

OAI – Repository Perspective
(Diagram labels: Required: Protocol; MDO, MDO; DO, DO)

OAI – Black Box Perspective
(Diagram of seven open archives, OA 1 to OA 7)

Tiered Model of Interoperability
• Mediator services
• Metadata harvesting
• Document models

Institutional Repositories - 1
• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”
• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf

Institutional Repositories - 2
• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
• Lynch, C. A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html

What is a Digital Object Repository?
• Also called: digital rep., digital asset rep., institutional repository
• Stores and maintains digital objects (assets)
• Provides external interface for Digital Objects
  – Creation, Modification, Access
• Enforces access policies
• Provides for content type disseminations
Adapted from Slide by V. Chachra, VTLS

Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)
• Self Archiving of Institutional Research
  – Theses and Dissertations (VTLS NDLTD Project)
  – Article preprints and post prints
  – Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects
Adapted from Slide by V. Chachra, VTLS

Chapter 10 Services
• Taxonomy of services
• Ontology, composition, reuse
• Evaluation
• Key services in-depth:
  – Crawling, indexing
  – Clustering, classifying
  – Recommending, using social networks
  – Logging

Ontology: Applications
• Expand definition of minimal DL by characterizing
  – typical DL services
  – in the context of “employs” and “produces” relationships
• Use characterization to:
  – Reason about how DL services can be built from other DL components
  – As well as be composed with other services through extension or reuse

Ontology: Applications

5S and Generating DLs
• 5S Framework
• 5S definitions, services taxonomy, ontology
• 5SL (specification language)
• 5SGraph (to prepare 5SL)
• 5SGen (for DL development, incl. DSpace)
• SchemaMapper for development of union DL

Chapter 11 Systems: Architectural Issues
• Independent system vs. part of federation
• Centralized vs. distributed vs. open services
• Monolithic vs. modular vs. componentized
• Topologies: bus vs. star vs. hierarchical vs. network
• Decompositions vary
  – search engine, browser, DBMS, MM support
  – repository, handle server, client
  – information resources + mediators, bus or agent collection + client with workspace/environment

Also Important: Agents
• 5S perspective: societies, streams, spaces, scenarios, structures
• Protocols: light-weight
• Knowledge interchange: mediators, wrappers
• Negotiation, registries
• Distributed issues
• Webbots (automatic indexing)
• Ontologies (standard upper)

Fedora™ Digital Object Architecture
• Persistent ID (PID): Globally unique persistent id
• Disseminators: Public view: access methods for obtaining “disseminations” of digital object content
• System Metadata: Internal view: metadata necessary to manage the object
• Datastreams: Protected view: content that makes up the “basis” of the object
  – EAD, TEI, DC, MARC, VRA Core, MIX, etc.
  – Images, E-books, E-journals, Music, Video, etc.
The Mellon Fedora Project
Adapted from Slide by V. Chachra, VTLS
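
The four parts of the digital object described above can be mirrored in a small data structure. This is an illustrative sketch only and does not use or reproduce the Fedora API: the class `DigitalObject`, the method `disseminate`, the PID `demo:1`, and the disseminator name `get_dc_record` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DigitalObject:
    pid: str                                                          # globally unique persistent id
    system_metadata: dict[str, str] = field(default_factory=dict)     # internal view
    datastreams: dict[str, bytes] = field(default_factory=dict)       # protected view: the content
    disseminators: dict[str, Callable[["DigitalObject"], bytes]] = field(
        default_factory=dict
    )                                                                 # public view: access methods

    def disseminate(self, method: str) -> bytes:
        """Run a named access method against this object's content."""
        return self.disseminators[method](self)


# Hypothetical object with one datastream and one access method that exposes it.
obj = DigitalObject(
    pid="demo:1",
    system_metadata={"state": "active"},
    datastreams={"DC": b"<dc><title>Sample</title></dc>"},
    disseminators={"get_dc_record": lambda o: o.datastreams["DC"]},
)
print(obj.disseminate("get_dc_record"))
```

Callers go through `disseminate` rather than reading `datastreams` directly, which is the point of separating the public view from the protected one.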

Example Disseminators
• Persistent ID (PID)
• Disseminators
  – Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
  – Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
• System Metadata
• Datastreams

Fedora™ Repository Web Service Exposure Layer
Adapted from Slide by V. Chachra, VTLS

5SL: a DL design language
• Domain specific languages
  – Address a particular class of problems by offering specific abstractions and notations for the domain at hand
  – Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping.
• XML-based realization of 5S
  – Interoperability
  – Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)

5SGraph: A DL Modeling Tool
• Help users model their own instances of a digital library (DL) in the 5S language (5SL).
• A simple modeling process which enables rapid generation of digital libraries
• Features
  – 5SGraph loads and displays a metamodel in a structured toolbox.
  – The structured editor of 5SGraph provides a top-down visual building environment for the DL designer.
  – 5SGraph produces syntactically correct 5SL files according to the visual model built by the designer.

Overview of 5SGraph
• Workspace (instance model)
• Structured toolbox (metamodel)

5SGen
• Version 1
  – MARIAN as the target system
  – Focused on rich structures: semantic networks
  – Behavior attached to nodes/links
• Version 2
  – Shifted for later work to componentized (ODL) approach
  – Focused on scenarios/societies
  – Structures/Spaces encapsulated within components (e.g., relational tables, indexes)
  – Only textual streams supported
• Version 3
  – Into DSpace (practical DL)

5SLGen – Version 2: ODL, Services, Scenarios

Tools/Applications

(Diagram; recoverable labels)
• Roles and tools: ArchDL Expert, ArchDL Designer, 5S Archaeology MetaModel, 5SGraph, 5SGen, Mapping Tool, XOAI
• Sub-models: Scenario Sub-model, Structure Sub-model
• Metadata formats: VN Metadata Format, HD Metadata Format, ETANA-DL Metadata Format
• Catalogs and wrappers: VN Catalog, HD Catalog, Union Catalog, Wrapper4VN, Wrapper4HD
• Component Pool: Harvesting, Mapping, Searching, Browsing, …
• ETANA-DL: Union Services Descriptions, Search Service, Browse Service, Other ETANA-DL Services, Inverted Files Index, Browse DB, Web Interface

Ch. 12 Case Studies: CS -> CSTC
• NSF and ACM Education Committee funded a 2-year project “A Computer Science Teaching Center” - CSTC http://www.cstc.org/
• College of NJ, U. Ill. Springfield, Virginia Tech
• Focus initially on labs, visualization, multimedia
• Multimedia part supported by a 2nd grant to Virginia Tech and The George Washington University (with curricular guidelines)

CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages that become obsolete and are difficult to reuse, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
• ACM support led to Journal of Educational Resources in Computing (JERIC): completed 2 co-EIC terms

Browsing (2)

Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections
www.citidel.org

Overview of CITIDEL architecture

Distributed repository structure

Digital library architecture for local and interoperable CITIDEL services

CITIDEL -> NSDL
• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• (Next slides courtesy Lee Zia, NSF)

Connects:
• Users: students, educators, life-long learners
• Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g., applets); interactive (virtual, remote) laboratories; ...
• Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate; ...

Enables:
• Environments for: Discovery, Communication, Collaboration, Creation, Validation, Evaluation, Recognition, ...
AND
• Stability, Reliability, Reusability, Interoperability, Customizability of Resources

Collections
• Discovery of content
• Classification and cataloguing
• Acquisition and/or linking; referencing
• Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged
• Access to massive real-time or archived datasets
• Software tool suites for analysis, modeling, simulation, or visualization
• Reviewed commentary on learning materials and pedagogy

Services
• Help services, frequently asked questions, etc.
• Synchronous/asynchronous collaborative learning environments using shared resources
• Mechanisms for building personal annotated digital information spaces
• Reliability testing for applets or other digital learning objects
• Audio, image, and video search capability
• Metadata system translation
• Community feedback mechanisms

NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
(Diagram; recoverable labels)
• Portals & Clients; User Interfaces
• NSDL Collections: referenced items & collections, special databases
• Core NSDL “Bus”
• Collection Building Core Services: metadata gathering, protocols, harvesting
• Usage Enhancement Core Services (CI Services): information retrieval, browsing, authentication, personalization, discussion, annotation
• Other NSDL Services

A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: ETD-db, DSpace, Proquest, …
• Collection: local archives, regional collaborations, global union catalog
Project: Networked Digital Library of Theses & Dissertations (NDLTD) www.ndltd.org

Student Gets Committee Signatures and Submits ETD
(Diagram labels: Signed; Grad School)

What are we doing?
• Aiding universities to enhance graduate education, publishing and IPR efforts
• Helping improve the availability and content of theses and dissertations
• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive)

Why ETD? Short Answer
• For Students:
  – Gain knowledge and skills for the Information Age
  – Richer communication (digital information, multimedia, …)
• For Universities:
  – Easy way to enter the digital library field and benefit thereby
• For the World:
  – Global digital library – large, useful, many services
• General:
  – Save time and money
  – Increased visibility for all associated with research results

Metamodels in the 5S Framework
• Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services
• Minimal DL
• Minimal ArchDL
• …

A Minimal DL in the 5S Framework
(Diagram; recoverable labels)
• Top level: Streams, Structures, Spaces, Scenarios, Societies
• Derived concepts: Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, services (indexing, browsing, searching), hypertext
• Higher constructs: Digital Object, Collection, Metadata Catalog, Repository, Minimal DL

A Minimal ArchDL in the 5S Framework
(Diagram; recoverable labels)
• Top level: Streams, Spaces, Scenarios, Societies
• Derived concepts: Structured Stream, Descriptive Metadata specification, SpaTemOrg, StraDia, ArchObj, Arch Descriptive Metadata specification, services (indexing, browsing, searching), hypertext
• Higher constructs: ArchDO, Arch Metadata catalog, ArchColl, ArchDR, Minimal ArchDL

Moving from a minimal DL towards a DL reference model (1/2) Knowledge DL quality

Moving from a minimal DL towards a DL reference model (1/2) Knowledge DL quality management Annotation Multimedia Minimal DL PIM Practical DL systems Domainspecific DLs DL reference model 111

Moving from a minimal DL towards a DL reference model (2/2) • Content-based image retrieval services in a DL • A superimposed-information-supported DL • Practical DL generation 112

Superimposing information • Superimposed layer: new information/structures • Mark: reference to base information element • Base layer: existing information from heterogeneous sources (text, images, audio/video documents) 113
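The layering on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the base layer holds only plain text, and a mark addresses a span by character offsets. The names `Mark`, `Annotation`, and `resolve` are hypothetical and do not come from the talk; marks into images or audio/video would need a different addressing scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """Reference to a span of a base-layer document (assumed offset addressing)."""
    source: str  # identifier of the base document
    start: int   # inclusive character offset
    end: int     # exclusive character offset


@dataclass(frozen=True)
class Annotation:
    """Superimposed-layer item: new information attached to a mark."""
    mark: Mark
    note: str


def resolve(mark, base_layer):
    """Follow a mark back to the base information element it refers to."""
    return base_layer[mark.source][mark.start:mark.end]


# The base layer is left untouched; the superimposed layer only points into it.
base_layer = {"doc1": "Information is a fundamental human need."}
annotation = Annotation(Mark("doc1", 0, 11), "key term")
excerpt = resolve(annotation.mark, base_layer)
print(excerpt, "->", annotation.note)
```

Keeping the new information separate from the base document is the point of the design: annotations can be added or discarded without modifying the sources they refer to.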

Preliminary SI-DL metamodel 114

Minimal CBIR DL Stream Image Stream Space Feature Vector Image Descriptor Composite Descriptor Structure Service Society KNNQ User Info Need Structured Feature Vector Image Content Description Image Object RQ Visualization Operation Image Digital Object Image Descriptor Metadata Catalog Image Collection Content-based Image Searching Service 115
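To make the searching service on this slide more concrete, here is a small Python sketch of the two query types, assuming that KNNQ and RQ stand for k-nearest-neighbor and range queries over image feature vectors. The Euclidean distance, the function names, and the toy two-dimensional vectors are illustrative assumptions; a real image descriptor would pair its own feature extraction with its own distance function.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(query, catalog, k):
    """KNNQ: the k image ids whose feature vectors are closest to the query."""
    ranked = sorted(catalog, key=lambda image_id: distance(query, catalog[image_id]))
    return ranked[:k]


def range_query(query, catalog, radius):
    """RQ: all image ids within `radius` of the query, closest first."""
    hits = [i for i in catalog if distance(query, catalog[i]) <= radius]
    return sorted(hits, key=lambda image_id: distance(query, catalog[image_id]))


# Hypothetical catalog mapping image ids to precomputed feature vectors.
catalog = {
    "img_a": (0.0, 0.0),
    "img_b": (3.0, 4.0),
    "img_c": (1.0, 0.0),
}
query = (0.0, 0.0)
print(knn_query(query, catalog, 2))
print(range_query(query, catalog, 1.0))
```

Both functions scan the whole catalog, which is fine for a sketch; a collection of realistic size would normally use an index structure instead.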

Summary • 5S and Generating DLs – 5S Framework – 5S definitions, services taxonomy, ontology – 5SL – 5SGraph – 5SGen (and DL development) – DL development of union DL – 5SGen into DSpace • 5S Metamodels – Minimal DL – Archaeology DL – Multimedia (CBIR) DL – Union DL – Practical DL, superimposed information, personal DL, … 116

People • Digital librarians • DL system developers • DL system administrators • DL managers • DL collection development staff • DL evaluators • DL users 118

Living In the KnowlEdge Society (LIKES) • Grant: NSF 06-608, CPATH • Proposal: for VT Pathways (themed version of core curric.) • PI: Edward A. Fox 120

Purpose • Graduates from colleges & universities should be prepared to live in and contribute to the Knowledge Society emerging in the 21st century. • Computing/LIS education can be revitalized: • if the LIKES theme spreads in programs (so graduates can help build the Knowledge Society); • if faculty collaborate (both in education and research endeavors) with colleagues globally who are interested in LIKES. 121

Living In the KnowlEdge Society (LIKES): Core surrounded by enabling concepts, problem-providing disciplines 122

Objectives – 1 of 3 • Enhance education in the discipline: – New courses: Living in the Global Knowledge Society, Knowledge Management – Enhanced courses to be more driven by the LIKES theme: Artificial Intelligence, Data Mining, Digital Libraries, Multimedia/Hypertext/Information Access, … 123

Objectives – 2 of 3 • Give special attention, inside the discipline and across disciplines: • to the areas of data, information, and knowledge; • to key concepts and methods, such as: representation/views, inference/decisions, complexity/heuristics, integration/mapping, search/discovery, comparison/matching, analysis/mining, modeling/simulation 124

Objectives – 3 of 3 • Engage researchers and teachers and students in the Knowledge Society’s problems, as motivation, orientation, and to help with solutions, e.g., – Shifting toward digital government, including statutes, rules, regulations, and procedures; – Handling attacks, including spam and viruses; – Ensuring quality even with disinformation, through knowledge sourcing, provenance, and sharing of community expertise; – Ensuring changes through education, that is cross-disciplinary, globally contextualized, based on awareness of human development, learning theory, and cognitive psychology 125

Potential Course Areas/Courses • Personal Knowledge Management – Computer Science and Information Systems, e.g., multi-media, process design and evaluation, and Human-Computer / Human-Information interaction. – Psychology, e.g., knowledge organization principles, human cognitive processes. – Industrial Systems Engineering, e.g., ergonomic factors of knowledge environments. – Ethics, e.g., ethical issues of information disclosure. • Communication and Collaboration – Communications, e.g., communication using digital visualizations, using knowledge access in constructing digital messages. – Information Systems and Computer Science, e.g., computer supported cooperative work and group support systems. – Marketing, e.g., influence of knowledge presentation on on-line customer behavior. • Organization – Information Systems, e.g., service innovation and development, system design and development. – Management Science, e.g., decision support systems concepts, capabilities, techniques, and tools. – Management, Marketing, Accounting, and Finance, e.g., business in the information age. • Society – Sociology, e.g., impact of knowledge differentials across society and countries. – Political Science, e.g., governmental collection and use of knowledge, impact of technology on elections and government. 126

DL Curriculum Project (NSF supporting VT, UNC-CH) • Identify, develop and test educational DL modules, guided by - Experts, international collaborators - Computing Curriculum 2001 - 5S framework - Analysis of DL course syllabi … 127

CC 2001 Information Management Areas • IM 1. Information models and systems* • IM 2. Database systems* • IM 3. Data modeling* • IM 4. Relational DBs • IM 5. Database query languages • IM 6. Relational DB design • IM 7. Transaction processing • IM 8. Distributed DBs • IM 9. Physical DB design • IM 10. Data mining • IM 11. Information storage and retrieval • IM 12. Hypertext and hypermedia • IM 13. Multimedia information & systems • IM 14. Digital libraries 128

Why Modular Design • Flexibility, e.g., for ETD programs: – Self-study by NDLTD trainers – Self-study by ETD authors – Short courses by NDLTD trainers of ETD authors – A course based on a single module – Course sequence (program) from multiple modules – Plug in modules into an existing course (enhancement) • Module 1. Overview + Module 10. DL Education & Research 129

Modules 1. Collection Development 2. Digital objects / Composites / Packages 3. Metadata, Cataloging, Author submission 4. Architecture, Interoperability 5. Data visualization 6. Services 7. Intellectual property rights management, Privacy, Protection 8. Social issues / Future of DLs 9. Archiving and Preservation 130

Ascertaining Priority Topics • We’ve manually classified and analyzed publications using 9 Modules (source: count): • Proceedings, JCDL ’01 – ’05: 354 • Proceedings, ACM DL ’96 – ’00: 189 • Magazine articles, D-Lib ’95 – ’06: 521 • Session titles, JCDL, ACM DL, ECDL: 264 131

Conference papers x modules 132

• Analysis Results: - Total of 543 proceedings papers (354 from JCDL plus 189 from ACM DL): Most popular topics were architecture (module 4) and services (module 6) 133

Distribution of D-Lib Magazine Articles across Module Topics 134

• Analysis Results: - Total of 521 articles: Most popular topics were architecture (module 4), services (module 6) and social issues (module 8) 135

Distribution of Session Titles across Module Topics 136

• Analysis Results: - Total of 264 session titles (JCDL, ECDL, ICADL): Most popular topic was services (module 6) followed by architecture (module 4) 137

Fox & Gonçalves Book Outline • Ch. 1. Introduction (Motivation, Synopsis) • Part 1 – The “Ss” – Ch. 2: Streams – Ch. 3: Structures – Ch. 4: Spaces – Ch. 5: Scenarios – Ch. 6: Societies 138

Textbook Outline (2) • Part 2 – Higher DL Constructs – Ch. 7: Collections – Ch. 8: Catalogs – Ch. 9: Repositories and Archives – Ch. 10: Services – Ch. 11: Systems – Ch. 12: Case Studies 139

Textbook Outline (3) • Part 3 – Advanced Topics – Ch. 13: Quality – Ch. 14: Integration – Ch. 15: How to build a digital library – Ch. 16: Research Challenges, Future Perspectives • Appendix – A: Mathematical preliminaries – B: Formal Definitions: Ss – C: Formal Definitions: DL terms, Minimal DL – D: Formal Definitions: Archeological DL – E: Glossary of terms, mappings 140

Pointers and Summary • http://fox.cs.vt.edu/talks • www.dlib.vt.edu • fox@vt.edu • IR -> DL • Education: CSTC, CITIDEL, NSDL, NDLTD, LIKES, DLcurric 141