Searching Scholarly Literature: A Google Scholar Perspective
Anurag Acharya

Overview
§ Goals & key ideas
§ Support for libraries
§ Coverage & usage
§ Reflections

Goal: Best possible scholarly search
§ Single place to find scholarly material
  – All areas, all sources, all languages, all time
  – Relevance-based ordering (“Google-like”)
§ Easy to use
  – Common queries should just work
  – Researchers, like everyone else, just want answers

Idea: Index all forms of articles
§ Preferred form: fulltext
  – Go beyond author-identified features
  – Facilitate serendipity
§ Fulltext online for only a small fraction
  – Influential/seminal papers still offline
§ Index whatever form is available
  – Abstract or even just the citation

Idea: Be inclusive
§ Provide worldwide visibility to all research
  – Should be able to find research done anywhere
  – Who knows what triggers discovery
§ Our goal is to find all scholarly work
  – Journals, conferences, preprints, reports
  – All countries, all languages, all sources
§ Make decisions on a per-article basis
  – Good work can come from anywhere!

Idea: Universal discovery
§ Free to all users everywhere
  – Should be able to find relevant research no matter where you live
  – Don’t know where the next magic will come from
§ Access will depend on a variety of factors
  – Impact of discovery is larger than people think

Idea: Rank as researchers do
§ Ideal: The Stuff I Need To Know
§ Approximation: Relevant stuff that is likely to be good
§ How to estimate “likely to be good”?
  – Who wrote it, where it was published, how many people cite it, where citations are from
§ Plus usual information retrieval techniques
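
One way to picture the approximation on this slide is a score that multiplies ordinary text relevance by an estimate of "likely to be good" built from the four signals listed (author, venue, citation count, citation sources). The sketch below is illustrative only: the `Article` fields, the weights, and the log damping are assumptions made for the example, not the formula Google Scholar uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    text_relevance: float        # 0..1, from usual information retrieval scoring
    author_reputation: float     # 0..1, hypothetical signal for "who wrote it"
    venue_reputation: float      # 0..1, hypothetical signal for "where it was published"
    citation_count: int          # "how many people cite it"
    citing_venue_quality: float  # 0..1, hypothetical signal for "where citations are from"


def quality_estimate(a: Article) -> float:
    """Rough 'likely to be good' estimate in [0, 1]; weights are illustrative."""
    # Dampen raw counts so a handful of very highly cited papers do not dominate.
    citation_signal = min(1.0, math.log1p(a.citation_count) / math.log1p(1000))
    return (0.2 * a.author_reputation
            + 0.2 * a.venue_reputation
            + 0.4 * citation_signal
            + 0.2 * a.citing_venue_quality)


def score(a: Article) -> float:
    """Relevance scaled by estimated quality; an irrelevant article scores 0."""
    return a.text_relevance * (0.5 + 0.5 * quality_estimate(a))


def rank(articles):
    """Return articles ordered from highest to lowest score."""
    return sorted(articles, key=score, reverse=True)
```

Multiplying by relevance, instead of adding to it, keeps a famous but off-topic paper from outranking a relevant one; that design choice is also an assumption of this sketch.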

Idea: Automate citation extraction
§ Necessary to be able to scale
§ Much variance in citation styles
  – Widely different conventions
§ Citations error-prone
  – Desire to compress (unusual abbreviations)
  – Author sloppiness + error propagation
§ Need to normalize citations
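
To make "normalize citations" concrete, here is a minimal sketch of the kind of cleanup involved: folding case and accents, dropping punctuation, expanding compressed venue abbreviations, and pulling out a publication year. The abbreviation table and all function names are hypothetical and far smaller than anything a real system would need; the slides do not describe the actual method.

```python
import re
import unicodedata

# Hypothetical abbreviation table, for illustration only.
VENUE_ABBREVIATIONS = {
    "j": "journal",
    "proc": "proceedings",
    "trans": "transactions",
    "intl": "international",
    "conf": "conference",
}


def _ascii_lower(text):
    """Lowercase the text and strip combining accents (e.g. 'Über' -> 'uber')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def normalize_title(title):
    """Reduce a title to lowercase alphanumeric words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", _ascii_lower(title)))


def normalize_venue(venue):
    """Normalize a venue name and expand the abbreviations found in the table."""
    words = re.findall(r"[a-z0-9]+", _ascii_lower(venue))
    return " ".join(VENUE_ABBREVIATIONS.get(w, w) for w in words)


def extract_year(citation):
    """Return the first plausible four-digit year (1500-2099) in the string, or None."""
    match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", citation)
    return int(match.group(1)) if match else None


raw = "Bush, V. (1945). As We May Think. The Atlantic Monthly."
print(extract_year(raw), normalize_title("As We May Think."))
```

Two citations written in different styles can then be compared on their normalized fields instead of on the raw strings.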

Idea: Rank works, not instances
§ Single work may have many forms/versions
  – Preprint, report, conference paper, journal article
§ Each may be cited independently
  – Need to collect citations for true import of work
§ Grouping versions facilitates ranking/presentation
  – Collect citations for all versions – improve ranking
  – Present a single work as a unit – easier to scan
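
The grouping step can be sketched as follows: records for each version are bucketed under a shared key, and their citation counts are summed so the work is ranked and displayed as one unit. The key used here (normalized title plus first author's surname) and the record fields are assumptions for illustration; matching versions reliably in practice is considerably harder.

```python
import re
from collections import defaultdict


def work_key(title, first_author_surname):
    """Hypothetical grouping key: normalized title plus lowercased surname."""
    title_key = " ".join(re.findall(r"[a-z0-9]+", title.lower()))
    return (title_key, first_author_surname.lower())


def group_versions(records):
    """Group version records into works and total their citations.

    Each record is a dict with 'title', 'first_author', 'form', 'citations'.
    """
    works = defaultdict(lambda: {"forms": [], "citations": 0})
    for record in records:
        work = works[work_key(record["title"], record["first_author"])]
        work["forms"].append(record["form"])
        work["citations"] += record["citations"]
    return dict(works)


# Made-up records: two versions of one work and one unrelated report.
records = [
    {"title": "A Hypothetical Paper", "first_author": "Doe",
     "form": "preprint", "citations": 12},
    {"title": "A hypothetical paper.", "first_author": "doe",
     "form": "journal article", "citations": 30},
    {"title": "Another Paper", "first_author": "Roe",
     "form": "report", "citations": 3},
]
works = group_versions(records)
```

In this made-up data the preprint and the journal article collapse into one work whose combined citation count is what a ranking function would see.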

Idea: Links to offline content
§ Only a small fraction of articles online
§ Libraries hold huge repositories
  – Books, journals, articles, and much more
§ Link to library resources
  – Help users find the wealth in their libraries

Support for libraries
§ Library Links
  – Links to resources in a given library
  – For libraries that use link resolvers/OpenURLs
  – About 325 participating libraries, growing rapidly
§ Library Search
  – For libraries participating in OCLC’s Open WorldCat
  – Find nearby libraries that have the book
  – Looking to work with other union catalogs!
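
For readers unfamiliar with link resolvers, the sketch below shows roughly what a Library Link amounts to: article metadata encoded as OpenURL 1.0 (Z39.88-2004) key/value pairs and appended to a library's resolver address. The resolver URL is a placeholder and `build_openurl` is a hypothetical helper; how Google Scholar actually constructs its links is not described in these slides.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, article):
    """Build an OpenURL 1.0 (KEV) query for a journal article.

    `article` is a dict that may contain 'atitle', 'jtitle', 'aulast',
    'date', 'volume', 'spage'; missing fields are simply left out.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for field in ("atitle", "jtitle", "aulast", "date", "volume", "spage"):
        value = article.get(field)
        if value:
            params["rft." + field] = value
    return resolver_base + "?" + urlencode(params)


# Placeholder resolver address; each participating library would have its own.
url = build_openurl(
    "https://resolver.example.edu/openurl",
    {
        "atitle": "As We May Think",
        "jtitle": "The Atlantic Monthly",
        "aulast": "Bush",
        "date": "1945",
    },
)
print(url)
```

The library's resolver, not the search engine, decides what the link leads to (online fulltext, a print holding, or an interlibrary loan form), which is why the same metadata can serve every participating library.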

Library links - example

Library search - example

Library search – example

Google Scholar Coverage
§ Commercial publishers & scholarly societies
  – Fulltext from all major except Elsevier and ACS
  – Includes popular papers from all publishers as citations/abstracts
§ Hosting services – many publishers, societies
  – HighWire, Allen Press, MetaPress, Atypon, Ingenta, MUSE, others
§ Public A&Is – PubMed, ADS
  – Fairly complete, no matter what you read in some reviews…
§ Open web and institutional repositories
  – arXiv.org, RePEc, PubMed Central, others
§ Open access journals – all we can find (including SciELO)

Coverage by category

Worldwide usage
§ Countries with the most queries:
  – US, UK, Australia, Germany, Mexico, Brazil
  – Canada, China, Netherlands, India, France
  – Japan, Israel, Italy, Taiwan, Spain
  – Switzerland, Colombia, Nigeria, Philippines
  – S. Africa, S. Korea, Malaysia, Egypt, Turkey

Reflections
§ Audience will expand beyond scholars
  – Esp. for health/medical research, maybe others
  – Educated laypeople, patients, care-givers
§ The service is useful today for many users
  – US as well as internationally
  – Much more still to do to reach goals

Finally…
“Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential.”
  – As We May Think (Vannevar Bush), July 1945
§ Hope: loss of Mendel’s laws never repeated