Information Science: Where does it come from and where is it going?
Tefko Saracevic, Ph.D.
School of Communication, Information and Library Studies
Rutgers University, New Brunswick, New Jersey, USA
http://www.scils.rutgers.edu/~tefko
Gutenberg, 1397-1468
© Tefko Saracevic

Information science: a short definition
“the collection, classification, storage, retrieval, and dissemination of recorded knowledge treated both as a pure and as an applied science” (Merriam-Webster)

Organization of presentation
1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Paradigm split – distancing of areas
7. Relations – librarianship, computer science
8. Digital libraries – whose are they anyhow?
9. Conclusions – big questions for the future

Part 1. The big picture
Problems addressed
- Bit of history: Vannevar Bush (1890-1974), writing in 1945:
  - Defined the problem as “... the massive task of making more accessible our bewildering store of knowledge.”
  - Problem still with us & growing

… solution
- Bush suggested a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
- Technological fix to problem
- Still with us: technological determinant

At the base of information science: Problem
Trying to control content in
- Information explosion
  - exponential growth of information artifacts, if not of information itself
PLUS today
- Communication explosion
  - exponential growth of means and ways by which information is communicated, transmitted, accessed, used

technological solution, BUT …
applying technology to solving problems of effective use of information
BUT: from a HUMAN & SOCIAL and not only TECHNOLOGICAL perspective

or a symbolic model
People, Information, Technology

Problems & solutions: SOCIAL CONTEXT
- Professional practice AND scientific inquiry related to:
  effective communication of knowledge records - ‘literature’ - among humans in the context of social, organizational, & individual need for and use of information
- Taking advantage of modern information technology

or as White & McCain (1998) put it:
“modeling the world of publications with a practical goal of being able to deliver their content to inquirers [users] on demand.”

General characteristics
- Interdisciplinarity - relations with a number of fields, some more or less predominant
- Technological imperative - driving force, as in many modern fields
- Information society - social context and role in evolution - shared with many fields

Part 2. Structure
Composition of the field
- Like many fields, information science has different areas of concentration & specialization
- They change & evolve over time
  - grow closer, grow apart
  - ignore each other, less or more
  - sometimes fight

most importantly different areas…
- receive more or less in funding & emphasis
  - producing great imbalances in work & progress
  - attracting different audiences & fields
- this includes
  - vastly different levels of support for research and
  - huge commercial investments & applications

How to view structure?
by decomposing areas & efforts in research & practice emphasizing
Technology or Information or People

Part 3. Technology
- Identified with information retrieval (IR)
  - by far biggest effort and investment
  - international & global
  - commercial interest large & growing

Information Retrieval – definition & objective
“IR: ... intellectual aspects of description of information, ... search, ... & systems, machines ...” Calvin Mooers (1919-1994), 1951
- How to provide users with relevant information effectively?
For that objective:
1. How to organize information intellectually?
2. How to specify the search & interaction intellectually?
3. What techniques & systems to use effectively?

Streams in IR Res. & Dev.
1. Information science:
  - Services, users, use
  - Human-computer interaction
  - Cognitive aspects
2. Computer science:
  - Algorithms, techniques
  - Systems aspects; evaluation
3. Information industry:
  - Products, services, Web
  - search engines – BIG!
  - Market aspects
Problem:
  - relative isolation – discussed later

IR research
- Started in the US through government support & in information science
- Now mostly done within computer science
  - e.g. Special Interest Group on IR, Association for Computing Machinery (SIGIR, ACM)
Gerard Salton, 1927-1995

Contemporary IR research
- Spread globally
  - e.g. major IR research communities emerged in China, Korea, Singapore
- Branched outside of information science: “everybody does information retrieval”
  - search engines, data mining, natural language processing, artificial intelligence, computer graphics …

Testing in IR
- Testing is a major component of IR; it made the field strong & affected innovation
- Long history – started with the Cranfield tests in the late 1950s
- Measures – precision & recall, based on relevance
Cyril Cleverdon, 1914-1997

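The two measures named on this slide can be illustrated with a minimal sketch. It assumes the usual set-based definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the function name and the document identifiers are hypothetical and are not taken from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 5 judged relevant, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Both measures depend on relevance judgments, which is why test collections in the Cranfield tradition supply those judgments along with documents and queries.
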
Text REtrieval Conference (TREC)
- Major research, laboratory effort
- Started in 1992
  - “support research within the IR community by providing the infrastructure necessary for large-scale evaluation”
- Methods
  - provides large test beds, queries, relevance judgments, comparative analyses
  - essentially using the Cranfield methodology of the 1960s
  - organized around tracks
    - various topics – changing over years

TREC impact
- International – big impact on creating research communities
- Annual conferences
  - reports, exchange results, foster cooperation
- Results
  - mostly in reports, available at http://trec.nist.gov/pubs.html
  - overviews provided as well
  - but only a fraction published in journals
  - Book (2005):
    - TREC: Experiment and Evaluation in Information Retrieval, edited by Ellen M. Voorhees and Donna K. Harman

TREC tracks
116 groups from 20 countries
- Genomics
- Spam
- Blog
- Question answering
- Enterprise
- Million query (new)
- Legal
- Previous tracks:
  - ad-hoc (1992-1999)
  - routing (92-97)
  - interactive (94-02)
  - filtering (95-02)
  - cross language (97-02)
  - speech (97-00)
  - Spanish (94-96)
  - video (00-01)
  - Chinese (96-97)
  - query (98-00)
  - and a few more run for two years only

Broadening of IR – sample
ever changing, ever new areas added
- Cross language IR (CLIR)
- Natural language processing (NLP IR)
- Music IR (MIR)
- Image, video, multimedia retrieval
- Spoken language retrieval
- IR for bioinformatics and genomics
- Summarization; text extraction
- Question answering
- Many human-computer interactions
- XML IR
- Web IR; Web search engines
- IR in context – big area for major search engines & newer research

Commercial IR
- Search engines based on IR
- But added many elaborations & significant innovations
  - dealing with HUGE numbers of pages fast
  - countering spamming & page rank games – adversarial IR - combat of algorithms
  - adding context for searching
- Spread & impact worldwide
  - about 2000 engines in over 160 countries
  - English was dominant, but not any more

Commercial IR: brave new world
- Large investments & economic sector
  - hope for big profits, as yet questionable
- Leading to proprietary, secret IR
  - also aggressive hiring of best talent
  - new commercial research centers in different countries (e.g. MS in China)
- Academic research funding is changing
  - brain drain from academe
- Commercial search engines facing many challenges
  - hiring the best talent
  - and causing a brain drain from academia

IR successfully effected:
- Emergence & growth of the INFORMATION INDUSTRY
- Evolution of IS as a PROFESSION & SCIENCE
- Many APPLICATIONS in many fields
  - including on the Web – search engines
- Improvements in HUMAN - COMPUTER INTERACTION
- Evolution of INTERDISCIPLINARITY
IR has a long, proud history

Part 4. Information
- Several areas of investigation:
  - information as a basic phenomenon – not much progress
    - measures such as Shannon's not successful
    - concentrated on manifestations and effects
    - no recent progress in this basic research
  - information representation
    - large area connected with IR, librarianship
    - metadata
  - bibliometrics
    - structures of literature

What is information?
Intuitively well understood, but formally not well stated
  - Several viewpoints, models emerged
- Shannon: source-channel-destination
  - signals not content – not really applicable, despite many tries
- Cognitive: changes in cognitive structures
  - content processing & effects
- Social: context, situation
  - information seeking, tasks

Information in information science: Three senses
(from narrowest to broadest)
1. Information in terms of decision involving little or no cognitive processing
  - signals, bits, straightforward data - e.g. inf. theory (Shannon), economics
2. Information involving cognitive processing & understanding
  - understanding, matching texts, Brookes
3. Information also as related to context, situation, problem-at-hand
  - USERS, USE, TASK
For information science (including information retrieval): third, broadest interpretation necessary

Bibliometrics
“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” Fairthorne, 1969
- Many quantitative studies & some laws
  - Bradford’s law, Lotka’s law – regularities
    - quantity/yield distributions of journals, authors
- also related areas:
  - Scientometrics
    - covering science in general, not just publications
  - Informetrics
    - all information objects
  - Webmetrics or cybermetrics
    - using bibliometric techniques to study the web

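As an illustration of the kind of regularity these laws describe, the following is a minimal sketch of Lotka's law in its classic inverse-square form: the number of authors with n publications is roughly the number of single-paper authors divided by n squared. The exponent of 2 is the textbook approximation rather than something stated on the slide, and the function name and the starting figure of 100 authors are hypothetical.

```python
def lotka_expected_authors(single_paper_authors, n, exponent=2.0):
    """Expected number of authors with exactly n publications.

    Classic Lotka form: authors(n) = authors(1) / n**exponent.
    The exponent is an assumption; empirical values vary by field.
    """
    return single_paper_authors / (n ** exponent)


# Hypothetical literature in which 100 authors published exactly one paper.
for n in range(1, 6):
    expected = lotka_expected_authors(100, n)
    print(f"{n} paper(s): about {expected:.1f} authors")
```

The steep drop-off is the point: a small number of authors account for a large share of the publications, a skew that Bradford's law describes analogously for journals.
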
Part 5. People
- Professional services
  - in organizations – moving toward knowledge management, competitive intelligence
  - in industry – vendors, aggregators, Internet
- Research
  - user & use studies
  - interaction studies
  - broadening to information seeking studies, social context, collaboration
  - relevance studies
  - social informatics

User & use studies
- Oldest area
  - covers many topics, methods, orientations
  - many studies related to IR
    - e.g. searching, multitasking, browsing, navigation
  - theoretical & experimental studies on relevance
- Branching into Web use studies
  - quantitative & qualitative studies
  - emergence of webmetrics

Interaction
- Traditional IR model concentrates on matching, not on the user side & interaction
- Several interaction models suggested
  - Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified model
  - hard to get experiments & confirmation
- Considered key to providing
  - basis for better design
  - understanding of use of systems
- Web interactions: a major new area

Information seeking
- Concentrates on broader context, not only IR or interaction: people as they move in life & work
- Number of models provided
  - e.g. Kuhlthau’s information search process, Järvelin’s information seeking
- Includes studies of ‘life in the round,’ making sense, information encountering, work life, information discovery
- Based on concept of social construction of information

Part 6. Paradigm split in technology - people
- Split from the early 1980s to date into:
System-centered
  - algorithms, TREC, search engines
  - continue traditional IR model
Human-(user)-centered
  - cognitive, situational, user studies
  - interaction models, some started in TREC
  - relevance studies

Human vs. system
- Human (user) side:
  - often highly critical, even one-sided
  - mantra of implications for design
  - but does not deliver concretely
- System side:
  - mostly ignores user side & studies
  - ‘tell us what to do & we will’
- Issue NOT H or S approach
  - even less H vs. S
  - but how can H AND S work together
  - major challenge for the future

Great separation
- IR in computer science
  - completely technology oriented
  - VERY international
  - not aware at all of the other side
- SIGIR growing a lot:
  - 2010: subm. 520, accept. 87, 16.5%
  - 2007: subm. 490, accept. 85, 17%
  - 2006: subm. 399, accept. 74, 19%
  - 1999: subm. 135, accept. 33, 24%
- IR, user studies, services in information science
  - mostly people oriented
  - aware, but participating less with other side
  - only a few LIS people come to SIGIR, even fewer SIGIR to ASIST, none to ALA

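The acceptance rates quoted for SIGIR follow from the submission and acceptance counts on the slide. The sketch below simply redoes that arithmetic; the counts are the slide's own, while the dictionary and function names are hypothetical. Recomputed percentages can differ slightly from the rounded figures shown on the slide.

```python
# (submitted, accepted) per year, as listed on the slide.
SIGIR_COUNTS = {
    2010: (520, 87),
    2007: (490, 85),
    2006: (399, 74),
    1999: (135, 33),
}


def acceptance_rate(submitted, accepted):
    """Acceptance rate as a percentage of submissions."""
    return 100.0 * accepted / submitted


for year in sorted(SIGIR_COUNTS):
    submitted, accepted = SIGIR_COUNTS[year]
    rate = acceptance_rate(submitted, accepted)
    print(f"{year}: {accepted}/{submitted} accepted, {rate:.1f}%")
```

Submissions roughly quadrupled between 1999 and 2010 while the acceptance rate fell, which is the growth the slide points to.
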
Calls vs support
- Many calls for user-centered or human-centered design, approaches & evaluation
- Number of works discussing it, but few proposing concrete solutions
- But: most support for system work
  - in the digital age support is for digital
- Recent attempt at combining the two views:
  Book: Ingwersen, P. and Järvelin, K. (2005). The Turn: Integration of information seeking and retrieval in context. Springer.

Part 7. Relations, alliances, competition
- With a number of fields ...
- Strongest:
  1. Librarianship
  2. Computer science

Common grounds
IS & librarianship share:
- Social role in information society
- Concern with effective utilization of graphic & other types of records
- Research problems related to a number of topics
- Transfer to & from information retrieval

Differences
IS & librarianship differ in:
- Selection & definition of many problems addressed
- Theoretical questions & framework
- Nature & degree of experimentation
- Tools and approaches used
- Nature & strength of interdisciplinary relations

One field or two?
- Point of many debates
- Suggest: TWO fields in strong interdisciplinary relations
- Not a matter of “better” or “worse” - matters little
  - common arguments between many fields
- Differences matter in:
  - problem selection & definition
  - agenda, paradigms
  - theory, methodology
  - practical solutions, systems
- Best example: IR & library automation

Which?
- Librarianship. Information science
- Library and information science
- Libraryandinformationscience
  - Michael Buckland’s suggestion
- Information sciences
- Information
  - like in the “Information School”

IS & computer science
- CS primarily about algorithms
- IS primarily about information and its users and use
- Not in competition, but complementary
- Growing number of computer scientists active in IS – particularly in IR and digital libraries
- Concentrating on
  - advanced IR algorithms & techniques
  - digital library infrastructure & various domains
  - human computer interaction

Interaction and IS
- Two streams:
  - computer-human interaction
  - human-computer interaction
- Many studies on:
  - machine aspects of interaction
  - human variables in interaction
    - Problems: little feedback between the two
    - very hard to evaluate
- Web interactions: a major area
- Another interdisciplinary area
  - computer sc., cognitive sc., ergonomics

Part 8. Digital libraries
- LARGE & growing area
- “Hot” area in R&D
  - a number of large grants & projects in the US, European Union, & other countries
  - but “DIGITAL” big & “libraries” small
- “Hot” area in practice
  - building digital collections, hybrid libraries
  - many projects throughout the world
    - but in the US funding has dried up

Technical problems
- Substantial - larger & more complex than anticipated:
  - representing, storing & retrieving library objects
    - particularly if originally designed to be printed & then digitized
  - operationally managing large collections - issues of scale
  - dealing with diverse & distributed collections
    - interoperability; federated searching
  - assuring preservation & persistence
  - incorporating rights management

Research issues
- understanding objects in DL
  - representing in many formats
- metadata, cataloging, indexing
- conversion, digitization
- organizing large collections
- managing collections, scaling
- preservation, archiving
- interoperability, standardization
- accessing, using, searching
  - federated searching of distributed collections
- evaluation of digital libraries

DL projects in practice
- Heavily oriented toward institutions & their missions
  - in libraries, but also others
    - museums, societies, government, commercial
    - come in many varieties
- Spread globally
  - including digitization
- U California, Berkeley’s Libweb “lists over 8000 pages from libraries in over 146 countries”
- Spending increasing significantly
  - often a trade-off for other resources

Connection?
- DL research & DL practice presently are conducted
  - mostly independently of each other
  - minimally informing each other
  - and having slight, or no, connection
- Parallel universes with little connection & interaction, at present
  - not good for either research or practice

Part 9. Conclusions
IS contributions
- IS affected handling of information in society
- Developed an organized body of knowledge & professional competencies
- Applied interdisciplinarity
- IR reached a mature stage
  - penetrated many fields & human activities
- Stressed HUMAN in human-computer interaction

Challenges
- Adjust to the growing & changing social & organizational role of information & related infrastructure
- Play a positive role in globalization of information
- Respond to technological imperative in human terms
- Respond to changes from information to communication explosion, bringing own experiences to resolutions, particularly to the web
- Join competition with quality
- Join DIGITAL with LIBRARIES

Juncture
- IS is at a critical juncture in its evolution
- Many fields, groups ... moving into information
  - big competition
  - entrance of powerful players
  - fight for stakes
- To be a major player IS needs to progress in its:
  - research & development
  - professional competencies
  - educational efforts
  - interdisciplinary relations
- Reexamination necessary

Thank you Miró! Thank you Picasso!

Thank you, Tatjana, & thank you for the invitation!

Bibliography
Bates, M. J. (1999). Invisible Substrate of Information Science. Journal of the American Society for Information Science, 50, 1043-1050.
Bush, V. (1945). As We May Think. Atlantic Monthly, 176, (11), 101-108. Available: http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
Hjørland, B. (2000). Library and Information Science: Practice, Theory, and Philosophical Basis. Information Processing & Management, 36 (3), 501-531.
Pettigrew, K. E. & McKechnie, L. E. F. (2000). The use of theory in information science research. Journal of the American Society for Information Science and Technology, 52 (1), 62-73.
Saracevic, T. (1999). Information Science. Journal of the American Society for Information Science, 50 (9), 1051-1063. Available: http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf
Saracevic, T. (2005). How were digital libraries evaluated? Presentation at the course and conference Libraries in the Digital Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available: http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf
Webber, S. (2003). Information Science in 2003: A Critique. Journal of Information Science, 29 (4), 311-330.
White, H. and McCain, K. (1998). Visualizing a Discipline: An Author Co-citation Analysis of Information Science 1972-1995. Journal of the American Society for Information Science, 49 (4), 327-355.