Data and text mining: facilitating our researchers’ needs in the 21st century
A case study
Hui Hua Chua
ALCTS CRS College & Research Libraries IG
ALA Annual Conference · Chicago, IL · June 25, 2017

• What is Text Assembler?
• Project context
• Implementation process
• Lessons learned

WHAT IS TEXT ASSEMBLER?
Diagram: TA user interface | Text Assembler | WSK API | LexisNexis news corpus
• User interface: authenticated users query and view sample search results, queue queries for processing, and download results
• TA: passes the query to the API; retrieves and stores results for delivery; processes results to create a plain-text file; manages multiple queries and query processing based on WSK parameters
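The flow described on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Text Assembler code: `fake_search` stands in for the WSK API (whose actual calls are not shown in these slides), and `Query`, `QueryQueue`, `max_results_per_run` and the document fields are hypothetical names chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Query:
    user: str
    terms: str
    results: list = field(default_factory=list)


def fake_search(terms, start, count):
    """Hypothetical stand-in for the news API: returns up to `count` documents."""
    corpus = [
        {"headline": f"Story {i}", "body": f"Text about {terms} #{i}"}
        for i in range(5)
    ]
    return corpus[start:start + count]


def to_plain_text(documents):
    """Flatten retrieved documents into one plain-text string."""
    return "\n\n".join(f"{d['headline']}\n{d['body']}" for d in documents)


class QueryQueue:
    """Processes queued queries in order, under an assumed per-run download limit."""

    def __init__(self, search, max_results_per_run):
        self.search = search
        self.max_results_per_run = max_results_per_run
        self.pending = deque()

    def enqueue(self, query):
        self.pending.append(query)

    def run_once(self):
        """Fetch one batch for the query at the head of the queue.

        Returns the finished query's plain text, or None if the query
        needs more runs or the queue is empty.
        """
        if not self.pending:
            return None
        query = self.pending[0]
        batch = self.search(query.terms, len(query.results), self.max_results_per_run)
        query.results.extend(batch)
        if len(batch) < self.max_results_per_run:  # nothing further to fetch
            self.pending.popleft()
            return to_plain_text(query.results)
        return None
```

Calling `run_once` repeatedly (for example from a scheduled job) retrieves and stores results batch by batch, and returns the plain-text output once a query is complete.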

MICHIGAN STATE UNIVERSITY
• Fall 2016: 50,344 students enrolled
• Undergraduate (39,090 or 77.6%) vs graduate (11,254 or 22.4%)
• Top five colleges by enrollment: Business (7,686), Social Science (6,562), Natural Science (6,192), Engineering (6,075), Agriculture & Natural Resources (4,588)

TDM LANDSCAPE IN 2014
• In flux and developing
• Libraries and publishers receiving researcher requests, but often no established business, technical or access models
• Ad-hoc access to data requested on a case-by-case basis from the publisher
• License or purchase corpus and host in-house

MSU LIBRARIES’ ROLE IN TDM
• Facilitate data access (license negotiation, purchase, data hosting, work with APIs)
• Consult or provide assistance with tools, methods or data
Almost always driven by a specific researcher request

LEXISNEXIS WEB SERVICES KIT (WSK)
• Product: API plus access to content; enables larger downloads of data than LexisNexis Academic
• Content: current news with updates
• Business model: subscription
• No specific user request or demand
“Let a hundred flowers bloom”

IMPLEMENTATION TEAM
• Digital Scholarship Librarian: Thomas Padilla
• Digital Library Programmer: Devin Higgins
• Programmer: Megan Schanz
• Liaison to School of Journalism: Hui Hua Chua

IMPLEMENTATION TIMELINE & PROCESS
7/2014: WSK acquired
8/2014 - 12/2014: Fact-finding & testing
• In-house
• LN sales and technical staff
• Other universities that had licensed WSK
• Potential MSU users and use cases
Decision made to develop an unmediated web-based user interface. Why? Technical considerations and a large potential user base

IMPLEMENTATION TIMELINE & PROCESS
11/2014 - 1/2015: Direct API use to answer 2 specific research questions
12/2014: Project request submitted to MSUL Systems department
2/2015: Programmer assigned. System developed (1.5 months)
8/2015: Usability testing of UI
9/2015: Public launch of Text Assembler
12/2016: Permission received from MSU Technologies to share code with acknowledgement. Source code: https://gitlab.msu.edu/msu-libraries/text-assembler

LESSONS LEARNT: BEFORE IMPLEMENTATION
• Better coordination of acquisition and implementation processes. Fact-finding could have been completed before purchase, shortening the time to public launch.

LESSONS LEARNT: POST IMPLEMENTATION
• How to balance system constraints with user behavior or preferences
  • System update (1 week development time)
  • Work with users to formulate more targeted queries
  • Hourly query rotation
• More guidance needed with query formulation
  • More explicit search instructions, specifically for selecting appropriate data sources and field searching
  • Work with users individually
• Takes a while for a new system to be adopted and used
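The slides do not say how the hourly query rotation works. One possible reading is a round-robin scheme in which each queued query gets an hour of downloading before yielding to the next. The sketch below simulates that reading only; `rotate_queries`, `per_hour` and the query names are hypothetical and are not taken from Text Assembler.

```python
from collections import deque


def rotate_queries(remaining, per_hour):
    """Simulate round-robin processing of queued queries.

    `remaining` maps a query name to the number of results still to download.
    Each hour the query at the front downloads up to `per_hour` results,
    then moves to the back of the line if it is not finished.
    Returns a list of (hour, query, downloaded) steps.
    """
    if per_hour <= 0:
        raise ValueError("per_hour must be positive")
    queue = deque(remaining.items())
    schedule = []
    hour = 0
    while queue:
        name, left = queue.popleft()
        downloaded = min(left, per_hour)
        schedule.append((hour, name, downloaded))
        if left > downloaded:
            queue.append((name, left - downloaded))
        hour += 1
    return schedule


if __name__ == "__main__":
    for step in rotate_queries({"big": 250, "small": 50}, per_hour=100):
        print(step)
```

Under this scheme a small query queued behind a very large one finishes in its first turn instead of waiting for the large query to complete, which is one way to reconcile a fixed download allowance with users who submit broad searches.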

USAGE
Queued searches since system went live: 121
Number of different users in the last 3 months: 5
Average number of results per search: 24,855
Largest search: 299,146

USAGE CONTINUED
Based on review of specific queries and user contacts
• Primarily used as a research tool; not the UG teaching tool we had hoped for
• Heaviest use by/in the Social Sciences
Library relationship with TA users is arm's-length

FURTHER QUESTIONS
How is success defined?
How do libraries define, assess and demonstrate value in TDM services?
Potential assessment of
• contribution to research outputs
• use in teaching

THANK YOU!
Hui Hua Chua
Collections & User Support Librarian
Michigan State University Libraries
chua@msu.edu
TA source code: https://gitlab.msu.edu/msu-libraries/text-assembler