BIBLIOMETRICS
Tefko Saracevic
Rutgers University
http://www.scils.rutgers.edu/~tefko
© Tefko Saracevic

What is it?
- “… all studies which seek to quantify processes of written communication.” (Pritchard)
- “… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” (Fairthorne)
- Recorded communication ('literature') -> quantitative methods

Alan Pritchard 1969
- Coined the term "bibliometrics": "the application of mathematics and statistical methods to books and other media of communication"
- Journal of Documentation (1969) 25(4): 348-349

and other related metrics …
- Also used to study objects broader than books, articles …
  - Scientometrics
    - covering science in general, not just publications
  - Infometrics
    - all information objects
  - Webmetrics or cybermetrics
    - web connections, manifestations
    - using bibliometric techniques to study the relationships or properties of different sites on the web

Concepts
Basic (primitive) concepts:
1. Subject
2. Recorded communication -> document, information object
3. Subject literature
- Bibliometrics related to:
  - science of science
  - sociology of science - numerical methods

Literature studies
- Qualitative
  - often in humanities, librarianship
- Quantitative
  - bibliometrics
- Mixed

Reasons for quantitative studies of literature
- Analysis of structure and dynamics
  - search for regularities - predictions possible
- Understanding of patterns
  - “order out of documentary chaos”
  - verification of models, assumptions
- Rationale for policies & design

Why quantitative studies?
- Qualitative methods often depend on assertions, 'authoritative' statements, anecdotal evidence
- Science searches for regularities
- Success of statistical methods in social sciences
- Need for justification & basis for decisions
- Something can be counted - irresistible

Application in …
- History of science
- Sociology of science
- Science policy; resource allocation
- Library selection, weeding, policies
- Information organization
- Information management
  - utilization

Historical note
- Bibliometrics long precedes information science
- But found intellectual home in information science
  - study of a basic phenomenon - literature
- It is not 'hot' lately, but still produces very interesting results
- Branched out into web studies (web is a “literature” as well)

What is studied?
- Governed by data available in documents or information resources in general: that which can be counted
  - author(s)
  - origin
    - organization, country, language
  - source
    - journal, publisher, patent …

what … more
  - contents
    - text, parts of text, subject, classes
  - representation
  - citations
    - to a document, in a document, co-citation
  - utilization
    - circulation, various uses
  - links
  - any other quantifiable attribute

Tools
- Science Citation Index
- Compilation of variables from journals in a subject
- Use data
- Publication counts from indexes, or other data bases
- Web structures, links

Variable: authors
- number in a subject, field, institution, country
- growth
- correlation with indicators like GNP, energy etc.
- productivity, e.g. Lotka's law
- collaboration - co-authorship, associated networks
- dynamics - productive life, transience, epidemics
- papers/author in a subject
- mapping

Variable: origin
- Rates of production, size, growth by
  - country, institution, language, subject
- Comparison between these
- Correlation with economic & other indicators

Variable: sources
- Concentration most often on journals
- Growth, dynamics, numbers
  - information explosion - exponential laws
  - time movements, life cycles
- Scatter - quantity/yield distribution
  - Bradford's law
- Various distributions
  - by subject, language, country

Variable: contents
- Analysis of texts
  - distribution of words - Zipf's law
  - words, phrases in various parts
  - subject analysis, classification
  - co-word analysis

Variable: representation
- frequency of use of index terms, classes
- distribution laws - key terms where?
- thesaurus structure

Variable: citations
- Studied a lot; many pragmatic results
  - base for citation indexes, Web of Science, impact factors, co-citation studies etc.
- Derived:
  - number of references in articles
  - number of citations to articles
    - research front; citation classics
  - bibliographic coupling

citations … more
  - co-citations
    - author connections, subject structure, networks, maps
  - centrality
    - of authors, papers
  - validation with qualitative methods
  - impact

Variable: utilization
- frequency
- distribution of requests for sources, titles
  - e.g. 20/80 law
- relevance judgement distributions
- circulation patterns
- use patterns

Variable: links
- Development of link-based metrics
  - in-links, out-links
- Web structure
- Web page depth; update
- PageRank vs quality

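The simplest link-based metrics named on this slide, in-links and out-links, are just counts over a list of links. The sketch below assumes a hypothetical list of (source site, target site) pairs; the site names and the function name `link_counts` are made up for illustration.

```python
from collections import Counter

def link_counts(links):
    """Return (in_links, out_links) counters for a list of (source, target) pairs."""
    out_links = Counter(source for source, _ in links)
    in_links = Counter(target for _, target in links)
    return in_links, out_links

# Hypothetical web links between three sites
links = [
    ("a.example", "b.example"),
    ("c.example", "b.example"),
    ("b.example", "a.example"),
]

in_links, out_links = link_counts(links)
for site in sorted(set(in_links) | set(out_links)):
    print(site, "in-links:", in_links[site], "out-links:", out_links[site])
```

A site that nobody links to simply gets an in-link count of zero, since a `Counter` returns 0 for missing keys.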
Examples from classic studies
- Comparative publications over centuries
- Number of journals founded over time
- Number of abstracts published over time
- National share of abstracts in chemistry
- National scientific size vs. economy size
- Bibliographic coupling and co-citation
- Web structures, links

Examples of laws & methods
- Lotka's law
- Bradford's law
- Zipf's law
- Impact factor
- Citation structures
- Co-citation structures

Alfred J. Lotka 1926
- Statistics: the frequency distribution of scientific productivity
- Purpose: to "determine, if possible, the part which men of different calibre contribute to the progress of science"
  - Looked at Chemical Abstracts Index, then Geschichtstafeln der Physik
    - J. Washington Acad. Sci. 16: 317-325

Lotka's law: x^n • y = C
The number of authors y in a given subject, each producing x publications, is inversely proportional to some power n of x.
- Where:
  - x = number of publications
  - y = no. of authors credited with x publications
  - n = constant (equals 2 for scientific subjects)
  - C = constant
- inverse square law of scientific productivity

Lotka's Law - scientific publications
[Figure: plot of no. of authors against no. of publications, following x^n • y = C]

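As a small worked illustration of the inverse square form (n = 2), the sketch below computes how many authors would be expected to have x publications, given the number of authors with exactly one. The starting figure of 100 single-paper authors and the function name `lotka_expected_authors` are assumptions made only for this example.

```python
def lotka_expected_authors(single_paper_authors, max_papers, n=2.0):
    """Expected number of authors with x papers under x^n * y = C.

    C is fixed by the x = 1 case: 1^n * y = C, so C equals the number
    of authors credited with exactly one paper.
    """
    c = float(single_paper_authors)
    return {x: c / (x ** n) for x in range(1, max_papers + 1)}

# Hypothetical subject with 100 authors who each published one paper
expected = lotka_expected_authors(100, 5)
for x, y in expected.items():
    print(f"{x} paper(s): about {y:.1f} authors")
```

With n = 2, the number of authors with two papers is a quarter of those with one, and the number with three papers is a ninth.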
Samuel Clement Bradford 1934, 1948
- Distribution of quantity vs yield of sources of information on specific subjects
  - he studied journals as sources, but applicable to others
  - What journals produce how many articles in a subject and how are they distributed? or
  - How are articles in a subject scattered across journals?
- Purpose: to develop a method for identification of the most productive journals in a subject & deal with what he called “documentary chaos”
- First published in: Engineering (1934) 137: 85-86, then in his book Documentation (1948)

Bradford's law
"If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as 1 : n : n^2 …"

Bradford's Law of Scattering: an idealized example

Zone      No. of source   No. of articles   Total no. of
          journals        per journal       articles
Nucleus    1              60                 60
           2              35                 70
           3 in zone                        130
Zone 2     1              30                 30
           2              25                 50
           2               9                 18
           4               8                 32
           9 in zone                        130
Zone 3    10               6                 60
           7               5                 35
           5               4                 20
           5               3                 15
          27 in zone                        130

Bradford's Law of Scattering: zones
- nucleus: 3 sources, 130 articles
- next zone: 9 sources, 130 articles
- outer zone: 27 sources, 130 articles
- Garfield hypothesis

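The zones in the idealized example can be recovered mechanically: rank the journals by productivity, then cut the ranked list each time a zone has collected its equal share of the articles. The sketch below is one simple greedy way to do that, not Bradford's own procedure; the function name `bradford_zones` is hypothetical, and the data are the per-journal article counts of the idealized example (1 journal with 60 articles, 2 with 35, 1 with 30, 2 with 25, 2 with 9, 4 with 8, 10 with 6, 7 with 5, 5 with 4, 5 with 3).

```python
def bradford_zones(article_counts, zones=3):
    """Split journals, ranked by productivity, into zones of roughly equal yield.

    Returns a list of (journals_in_zone, articles_in_zone) tuples.
    """
    counts = sorted(article_counts, reverse=True)
    target = sum(counts) / zones  # articles each zone should hold
    result = []
    journals = articles = 0
    for count in counts:
        journals += 1
        articles += count
        # Close a zone once it reaches its share; the last zone takes the rest.
        if articles >= target and len(result) < zones - 1:
            result.append((journals, articles))
            journals = articles = 0
    result.append((journals, articles))
    return result

# Articles per journal in the idealized example
example = ([60] + [35] * 2 + [30] + [25] * 2 + [9] * 2 + [8] * 4
           + [6] * 10 + [5] * 7 + [4] * 5 + [3] * 5)

zone_sizes = bradford_zones(example)
print(zone_sizes)
# Ratio of journal counts between successive zones (the multiplier n)
print([later[0] / earlier[0] for earlier, later in zip(zone_sizes, zone_sizes[1:])])
```

For this data the journal counts per zone are 3, 9 and 27, so each zone needs three times as many journals as the one before to yield the same 130 articles. Real literatures rarely divide this cleanly.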
George Kingsley Zipf 1935, 1949
- The psycho-biology of language: an introduction to dynamic philology (1935)
- Human behavior and the principle of least effort: An introduction to human ecology (1949)
- Looked, among others, at frequency distributions of words in given texts
  - counted distribution in James Joyce's Ulysses
- Provided an explanation as to why the found distributions happen: Principle of least effort

Zipf's law: r • f = c
- Where:
  - r = rank (in terms of frequency)
  - f = frequency (no. of times the given word is used in the text)
  - c = constant for the given text
- For a given text the rank of a word multiplied by the frequency is a constant
- Works well for high frequency words, not so well for low; thus a number of modifications

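The rank and frequency columns behind r • f = c are easy to compute for any text. The sketch below assumes a very simple tokenizer (lowercase letters and apostrophes) and a made-up sample sentence; the function name `rank_frequency` is hypothetical. A sentence this short only shows the mechanics, since the product settles toward a constant only in long texts.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    # Words with equal frequency receive consecutive ranks in first-seen order.
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, word, freq, product in table:
    print(rank, word, freq, product)
```

Applied to a full-length text, the last column is the value to inspect: it should stay roughly level for the high-frequency words and drift for the rare ones, as the slide notes.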
Charles F. Gosnell 1944
Obsolescence
- He studied obsolescence of books in academic libraries via their use
  - College Res. Libr. (1944) 5: 115-125
- But this was extended to study of articles via citations, and other sources
- Age of citations in articles in a subject:
  - half-life: half of the citations are x years old etc.
    - different subjects have very different half-lives

Curve of obsolescence
[Figure: curve of number of users against age at time of use]

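The citation half-life described for obsolescence studies can be read as the median age of the cited items: half of the citations are that old or younger. The sketch below uses that reading; the publication years are invented for illustration and the function name `citation_half_life` is hypothetical.

```python
from statistics import median

def citation_half_life(citing_year, cited_years):
    """Median age, in years, of the items cited by articles from citing_year."""
    ages = [citing_year - year for year in cited_years]
    return median(ages)

# Hypothetical reference years found in articles published in 2005
cited_years = [2004, 2003, 2003, 2000, 1995, 1990, 1980]

half_life = citation_half_life(2005, cited_years)
print("half-life:", half_life, "years")
```

Running the same calculation over the reference lists of two subjects is one way to see the very different half-lives the slide mentions.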
Eugene Garfield 1955
- Focused on scientific & scholarly communication based on citations
  - Science (1955) 122: 108-111
- Founded Institute for Scientific Information (ISI)
  - major product now ISI Web of Knowledge
- Impact factor for journals, based on how much a journal is cited
- Mapping of a literature in a subject
- Citation indexes/Web of Knowledge
  - MAJOR resources in bibliometric studies

Citation matrix
[Figure: diagram of several citing articles pointing to a cited article]

Science Citation Index
Association-of-ideas index
[Figure: diagram of several citing articles pointing to a cited article]

Co-citation analysis
- Articles that cite the same article are likely both to be of interest to the reader of the cited article
[Figure: diagram of citing articles, annotated "These two articles are likely to be related"]

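Both kinds of relatedness discussed in these slides can be counted from reference lists. In the usual terminology, two articles are bibliographically coupled when they cite the same article, and two articles are co-cited when a later article cites both. The sketch below computes both from a hypothetical mapping of citing articles to their references; the article labels and function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(references):
    """Count how often each pair of cited articles appears in the same reference list."""
    pairs = Counter()
    for cited in references.values():
        for a, b in combinations(sorted(set(cited)), 2):
            pairs[(a, b)] += 1
    return pairs

def coupling_strength(references, first, second):
    """Number of references shared by two citing articles (bibliographic coupling)."""
    return len(set(references[first]) & set(references[second]))

# Hypothetical citing articles P1..P3 and the articles they cite
references = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}

cocited = cocitation_counts(references)
print(cocited.most_common())
print("P1-P2 coupling:", coupling_strength(references, "P1", "P2"))
```

Higher counts in either measure suggest that two articles are related, which is the basis for the maps and networks mentioned under the citations variable.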
Impact factor (IF)
IF = (number of citations received in the current year by papers published in the journal in the previous two years) / (number of papers published in the journal in the previous two years)
- IF has become over time a crucial indicator of journal quality and
  - given ISI a monopoly position in the evaluation of journal quality
- Reported in Journal Citation Reports (1976-)

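The impact factor definition above is a single division. The sketch below spells it out with invented figures for a hypothetical journal; the function and parameter names are assumptions made for this example.

```python
def impact_factor(citations_this_year, papers_previous_two_years):
    """Citations received this year to papers from the previous two years,
    divided by the number of papers published in those two years."""
    if papers_previous_two_years == 0:
        raise ValueError("no papers published in the previous two years")
    return citations_this_year / papers_previous_two_years

# Hypothetical journal: 150 papers over the previous two years,
# cited 300 times in the current year
journal_if = impact_factor(300, 150)
print("impact factor:", journal_if)
```

Because the denominator is a count of papers, a journal that publishes few but heavily cited papers can score higher than a larger journal with more citations in total.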
Garfield's HistCite
- “Bibliographic Analysis and Visualization Software”
- Provides citation statistics & graphs for people, journals, institutions …
  - various citation scores, no. of cited references in articles … various graphs with connections
- Example: articles and authors for JASIST (and predecessor names) for 1956-2004
  - includes citations to authors

Conclusion
- Bibliometrics, & related scientometrics, infometrics, webmetrics provide insight into a number of properties of information objects
  - some general, predictive “laws” formulated
  - structures have been exposed, graphed
  - myriad data collected & analyzed
- A good area for research!

Sources used in making this presentation, among others
- Ruth Palmquist: Bibliometrics
- Donna Bair-Mundy: Boolean, bibliometrics, and beyond
- Short set of bibliometric exercises by J. Downie: http://people.lis.uiuc.edu/~jdownie/biblio/