Digital libraries: Challenges for evaluation

Tefko Saracevic
Rutgers University
Evaluation: what is it?
- Questions about performance
  - testing, validating, comparing, appraising
- Many approaches & types - making a choice
- In the systems approach:
  - Effectiveness: how well does a system, or a part of it, perform that for which it was designed?
  - Efficiency: at what cost? $$$, time, effort
- Gains insight into behavior & organization
- Always there, willing or not
State of evaluation of digital libraries
- Many projects, some talk & discussion
  - but no evaluation to speak of
- Not high on anybody's agenda
- Related work on metrics proceeding
  - D-Lib Working Group on Digital Library Metrics (an informal, non-funded group)
    - progress to date: a number of internal discussion papers; overall definitions proposed
    - some criteria & scenarios suggested
In research
- Digital Libraries Initiative 1 (1995-1998)
  - six projects
  - evaluation talked about around 1995-96, but only some evaluation performed in projects
  - project results as a whole not evaluated: what did they actually accomplish?
- Digital Libraries Initiative 2 (1999- )
  - 21 projects + 3 in undergraduate education
  - 6 (of 21) mention some evaluation, but with no details at all; evaluation not even a minor component
  - undergraduate projects: one evaluation
Research … lingering questions
- What, if anything, is meant by evaluation in DLI projects? In dlib research in general?
- Is evaluation considered necessary at all?
  - Why is no attention paid to evaluation?
  - Is something that merely computes enough for evaluation? Or anecdotes about reactions?
  - Is this a new kind of science? Or development?
- What of public, overall evaluation?
  - What of refereed publications? Where are they?
In practice
- Many dlibs built and operating
  - not one evaluated, but improvements made
- Publishers built dlibs
  - e.g. Elsevier had use and economic evaluation
- Professional societies have dlibs
  - no evaluation, but improvements made
- Evaluation approaches:
  - internal discussion, observation, experience, copying
  - improvements, redesigns follow
Needed and lacking
- Overall conceptual framework
  - Construct - the objects, elements - to be evaluated
    - What is actually meant by a digital library? What is encompassed? What elements to take? What is critical?
- Evaluation approach
  - Context - the level - of evaluation
    - What is "evaluation" in a dlib context? What approach to use? On what to concentrate?
Needed … more
- Criteria for evaluation
  - What to evaluate in that context? What to reflect? What parameters, metrics to select for evaluation?
- Measures
  - What measures to apply to various criteria? What metrics can be translated into measures?
- Methods
  - How to evaluate? What procedures to use?
Required
- These are the essential requirements for any evaluation:
  - construct, context, criteria, measures, method (see the sketch below)
- No specification of each - no evaluation
- Here we talk about the first three
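One way to make this requirement concrete is to treat an evaluation specification as a record whose five components must all be filled in before the evaluation means anything. This is a minimal sketch, not from the talk; all names and example values are illustrative.

```python
from dataclasses import dataclass

@dataclass
class EvaluationSpec:
    construct: str   # what is evaluated, e.g. a digital collection
    context: str     # level of evaluation, e.g. individual use
    criteria: list   # what counts as performance, e.g. accessibility
    measures: list   # how each criterion is quantified
    method: str      # procedure used to obtain the measures

    def is_specified(self) -> bool:
        # "No specification of each - no evaluation"
        return all([self.construct, self.context,
                    self.criteria, self.measures, self.method])

spec = EvaluationSpec("digital collection", "individual use",
                      ["accessibility"], ["time to locate an item"],
                      "user study")
assert spec.is_specified()
```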
Construct: what is meant by a dlib?
- Two conceptualizations, stressing:
  1. distributed objects in various forms, distributed access, representation, operability (computer science)
  2. institution, collection, services, availability (libraries)
- The first is the research perspective
  - focus on a range of research problems, with little or no operations; "dlib" very broadly interpreted
- The second is the library operational perspective
  - focus on practical problems of transforming library institutions and services, with little or no research; "dlib" very specifically interpreted
Research perspective
"Digital libraries are organized collections of digital information. They combine the structuring and gathering of information, which libraries and archives have always done, with the digital representation that computers have made possible." (Lesk, 1997)
(evaluation constructs or elements were highlighted in bold on the slide)
Library conception
"Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities." (Digital Library Federation, DLF)
Constructs/elements for evaluation
- Digital collection(s), resources
  - selection, gathering
  - distribution, connections
  - organization, structure (physical & intellectual)
  - representation, interpretation
- Access
  - intellectual, physical
  - distribution
  - interfaces
constructs … more
- Services
  - availability
  - dissemination, delivery
- Preservation, persistence
- Security, privacy, policy, legality
- Users, use, communities
- Management, economics
- Integration
Context - general
- Any evaluation is a pairing (a tuple)
  - between a selected element to be evaluated and a selected type of its performance
- Leads to selection of a level of evaluation
  - What to concentrate on? What level of performance?
- Use-centered & system-centered levels
- Dlib performance can be viewed from a number of standpoints or levels
  - What are they?
Context: use-centered levels
- Social:
  - How well does a dlib support the information demands, needs & roles of a society or community?
    - hardest to evaluate
- Institutional:
  - How well does a dlib support an institutional, organizational mission & objectives? How well does it integrate with other resources?
    - tied to the objectives of the institution or organization
    - also hard to evaluate
use levels … more
- Individual:
  - How well does a dlib support the information needs & activities of people?
    - most evaluations of many systems are in this context
    - use of various aspects, contents, features by users
    - task performance
Context: system-centered levels
- Interface
  - How well does a given interface provide access?
- Engineering
  - How well do hardware, networks, configurations perform?
system levels … more
- Processing:
  - How well do procedures, techniques, operations, algorithms … work?
- Content
  - How well is the collection selected, organized, structured, represented?
Levels of evaluation
[Diagram: user-centered evaluation levels (social, institutional, individual) and system-centered levels (engineering, processing, content) meeting at the interface]
Criteria
- For each level, criteria have to be determined
- Traditional library criteria:
  - collection
    - purpose, scope, authority, coverage, currency, audience, cost, format, treatment, preservation …
  - information
    - accuracy, appropriateness, links, representation, uniqueness, comparability, presentation …
  - use
    - accessibility, availability, searchability, usability …
criteria … more
- Traditional human-computer interaction criteria:
  - usability, functionality, effort level
    - screen, terminology & system feedback, learning factors, system capabilities
  - task appropriateness; failure analysis
- Traditional retrieval criteria:
  - relevance: precision & recall measures (defined after this slide)
  - satisfaction, success, overall value
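For reference, the standard definitions of the two relevance measures named above, stated over the set of documents retrieved by the system and the set of documents relevant to the query (the formulas themselves are not on the slide):

```latex
\mathrm{precision} = \frac{|\,\mathrm{relevant} \cap \mathrm{retrieved}\,|}{|\,\mathrm{retrieved}\,|}
\qquad
\mathrm{recall} = \frac{|\,\mathrm{relevant} \cap \mathrm{retrieved}\,|}{|\,\mathrm{relevant}\,|}
```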
criteria … more
- Value study criteria - value-in-use
  - values users assign to dlib use
    - assessment by users of the qualities of interaction with a dlib service, and of the worth or benefits of the results of that interaction, as related to reasons for using it
    - multidimensional - a composite of:
      1. reasons for use
      2. interaction with a dlib service
      3. results or impacts of use
Adaptation
- Traditional criteria have to be adapted to dlibs & expanded
  - to include the unique characteristics of dlibs
- Criteria for evaluating research results have to include some of these, plus:
  - traditional measures of research & design evaluation from the systems approach & computer science
  - and from science in general - peer evaluation
Conclusions
- Investment in dlibs very high & growing
- So far, investment in evaluation very small
  - How do we know what is accomplished?
  - What works, what does not?
  - What mistakes, practices not to repeat?
- Evaluation of dlibs very complex
  - needs its own methodological investigation
  - metrics work very important. Funding?
conclusions … more
- Critical questions, not yet raised:
  - How can dlib efforts proceed without evaluation?
  - What are the consequences?