INSPIRE Log File Analysis
Cole Adams
Background
- Publication database
- Over 50,000 users
- 25,000 documents a year
- Improved research collaboration
Background
- SPIRES: aging technology, dating from 1974
- INSPIRE: replacement for SPIRES
In Transition
- Usage is thought to be divided 66% SPIRES, 33% INSPIRE, based on search counts
- Questions about this division:
  - Are search counts the right thing to measure?
  - What distinguishes the two sides of the division?
Analysis Methods: Log Files
- User activity is recorded in log files
  - Data from May 1 to July 31
  - Over 5 GB of log file data
- Analysis is based on search queries
- Process
  - Raw log files: all HTTP requests to the server
  - Stages: cache, parser, processor
  - Outputs: lists of search queries; statistical data (sessions, search terms, etc.)
  - Analysis: graphs and frustration detection
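The parser stage described above, which reduces raw HTTP request lines to search queries, might look roughly like the sketch below. It is an illustration only: the slides do not give the log format or the search URL, so the Apache-style line layout, the `/search` path, and the `p` query parameter are all assumptions, and `parse_search` is a hypothetical helper name.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed Apache "combined"-style layout; the real server's format may differ.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3})'
)

def parse_search(line, path="/search", param="p"):
    """Return (ip, timestamp, query) for a search request, else None.

    `path` and `param` are placeholders for wherever the search
    endpoint and its query string actually live.
    """
    m = LOG_LINE.match(line)
    if m is None:
        return None  # not a recognisable request line
    parts = urlsplit(m.group("url"))
    if parts.path != path:
        return None  # some other request (image, stylesheet, ...)
    values = parse_qs(parts.query).get(param)
    if not values:
        return None  # search page loaded without a query
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), when, values[0]

# Made-up sample line, for illustration only.
sample = ('192.0.2.1 - - [01/May/2010:10:00:00 -0700] '
          '"GET /search?p=find+a+ellis HTTP/1.1" 200 512')
result = parse_search(sample)
```

Returning `None` for everything that is not a search keeps the later stages simple: they only ever see `(ip, timestamp, query)` tuples.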
Session Data
- Definition of a session: a series of searches performed by a unique IP address with less than 5 minutes between consecutive searches
- Single-search sessions are discounted
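The session definition above can be expressed as a short grouping routine. This is a minimal sketch, assuming the searches are already available as `(ip, timestamp, query)` tuples; `build_sessions` and its parameter names are hypothetical, not taken from the analysis scripts.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def build_sessions(searches, max_gap=timedelta(minutes=5)):
    """Group (ip, timestamp, query) tuples into per-IP sessions.

    A new session starts whenever the gap to the previous search from
    the same IP is max_gap or longer. Single-search sessions are dropped.
    """
    by_ip = defaultdict(list)
    for ip, when, query in searches:
        by_ip[ip].append((when, query))

    sessions = []
    for ip, events in by_ip.items():
        events.sort(key=lambda e: e[0])  # order each IP's searches by time
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] < max_gap:
                current.append(event)  # still within the same session
            else:
                sessions.append((ip, current))
                current = [event]
        sessions.append((ip, current))

    # Discount sessions that contain only one search.
    return [(ip, s) for ip, s in sessions if len(s) > 1]

# Made-up example data: IP "a" searches twice in quick succession, then
# again after a long pause; IP "b" searches only once.
t0 = datetime(2010, 5, 1, 10, 0)
searches = [
    ("a", t0, "q1"),
    ("a", t0 + timedelta(minutes=2), "q2"),
    ("a", t0 + timedelta(minutes=10), "q3"),
    ("b", t0, "q4"),
]
sessions = build_sessions(searches)
```

Only the two closely spaced searches from "a" survive as a session here; the later lone search and the single search from "b" are discounted.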
Sessions as Usage
- The count of multi-search sessions is a better usage metric than raw search counts
Breakdown of Usage by Country
Frustration Analysis
- INSPIRE/SPIRES differ from Google as search engines, e.g. “find author ellis and title higgs”
- Incorrect syntax usage can create problems:
    find j phys. rev. d*
    find j phys. rev. , d*
    find j phys. rev. d 54*
    find j phys. rev. , d 54
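A run of near-identical queries like the one above suggests a user retrying the same search with small syntax changes. The slides do not say how frustration was actually detected, so the following is only one possible heuristic: count consecutive queries in a session that are similar but not identical. The function name `reformulation_count` and the 0.8 similarity threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def reformulation_count(queries, threshold=0.8):
    """Count consecutive query pairs that look like retries.

    A pair counts when the two queries differ but their similarity
    ratio is at least `threshold` (an arbitrary illustrative cutoff).
    """
    count = 0
    for prev, cur in zip(queries, queries[1:]):
        a, b = prev.strip().lower(), cur.strip().lower()
        if a != b and SequenceMatcher(None, a, b).ratio() >= threshold:
            count += 1
    return count

# The query sequence from the slide.
retries = reformulation_count([
    "find j phys. rev. d*",
    "find j phys. rev. , d*",
    "find j phys. rev. d 54*",
    "find j phys. rev. , d 54",
])
```

A session with a high count relative to its length could then be flagged for closer inspection; where to set that cutoff would need to be tuned against real sessions.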
Frustration Analysis: Results
Future Analysis and Extensions
- More advanced analysis of session data
- Classify sessions into types of usage
- Better filtering of scripts and bots
- Integration of the analysis scripts with INSPIRE's online statistics tools
Acknowledgements
- Department of Energy and the Science Undergraduate Laboratory Internship program
- SLAC National Accelerator Laboratory
- Travis Brooks, Joseph Blaylock, Valkyrie Savage, and Mike Sullivan

Slides: 12