Statistics in practice: Measuring and managing
Sampling in-library use
Sebastian Mundt
sebastian.mundt@unibw-hamburg.de
• Framework • Selection • Examples • Conclusions
Sampling in library statistics
In libraries, sampling has traditionally been used:
- for catalogue evaluation
- in user surveys
- in performance measurement (e.g. correct shelving).
“Official” library statistics have so far allowed only the full count: “Data referring to a period should cover the specified period in question, not the interval between two successive surveys.” (ISO 2789:1991)
Sampling in library statistics
A full count of some measures would be:
- too time-consuming (costly),
- practically impossible, or
- too monotonous.
Consequence: important use activities have so far not been reported in most countries.

Category                 Number of datasets
Libraries                35
Collections              41
Library use (lending)    19
Library use (other)      4
Expenditure              7
Library staff            4
(ISO 2789:1991)

The revised International Standard ISO 2789:2001 now recommends sampling methods for:
- information requests
- in-house use
- visits (gate count).
Sampling means selecting a subset of the population in question. A sample can be drawn randomly or non-randomly. The accuracy of a random sample can be expressed in terms of the sampling error and the confidence level; it depends on the sample size and on the variance of the measured variable. Sampling can be selective with regard to:
- time (reporting period)
- location (branch, service point)
- objects (media)
- persons (satisfaction, user behaviour)
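The dependence of accuracy on sample size and variance can be made concrete with the usual normal-approximation formulas for a simple random sample. The Python sketch below assumes a simple random sample of n units (hours, days or weeks) out of N, a standard deviation `sd` estimated from earlier data, and applies the finite population correction. The function names and example figures are hypothetical, not taken from any standard.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided standard normal quantile, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(sd: float, n: int, N: int, confidence: float) -> float:
    """Half-width of the confidence interval for the mean per unit, for a
    simple random sample of n out of N units (finite population correction)."""
    fpc = sqrt((N - n) / (N - 1))
    return z_value(confidence) * sd / sqrt(n) * fpc


def required_sample_size(sd: float, error: float, N: int, confidence: float) -> int:
    """Smallest n whose margin of error does not exceed `error` (same units as sd)."""
    n0 = (z_value(confidence) * sd / error) ** 2  # size for an infinite population
    return ceil(n0 / (1 + (n0 - 1) / N))          # corrected for a finite population of N


if __name__ == "__main__":
    # Hypothetical figures: sd of 10 questions per hour, target error of +/-2
    # questions per hour, 4,103 service hours, 90% confidence.
    n = required_sample_size(sd=10, error=2, N=4103, confidence=0.90)
    print(n, round(margin_of_error(10, n, 4103, 0.90), 3))
```

Doubling the standard deviation roughly quadruples the required sample size, which is why variable measures need larger samples.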
Selection procedure
ISO/FDIS 2789:2001: “The annual total is to be established from a sample count. The sample should be taken in one or more normal weeks and grossed up.”
NISO Z39.7-2002 Draft Standard for Trial Use, Data Dictionary Version 2002a: “A ‘typical week’ is a time that is neither unusually busy nor unusually slow. Avoid holidays, vacation periods, days when unusual events are taking place in the community or in the library. Choose a week in which the library is open its regular hours.”
Purposive (judgement) sampling:
- requires minimal statistical knowledge
- is highly dependent on staff experience
“Typical week”
Visits per weekday, relative to Monday (Münster UL):
Mon     Tue     Wed     Thu     Fri     Sat
100%    99.5%   96.8%   91.8%   85.1%   24.8%
Cluster: a week comprises days of different activity levels.
Administration: a weekly count is easier to organize.
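Why the week works well as a cluster can be seen from the weekday profile above. The Python sketch below uses the Münster percentages from the slide and computes the relative error that would result if a single day were counted and simply multiplied by the number of opening days. It assumes the percentages are relative to Monday and that the library opens Monday to Saturday; the function name is hypothetical. A full week already contains every activity level, so it needs no such correction.

```python
# Visits per weekday relative to Monday (Münster UL, values from the slide).
# Assumption: the library is open Monday to Saturday.
PROFILE = {"mon": 100.0, "tue": 99.5, "wed": 96.8, "thu": 91.8, "fri": 85.1, "sat": 24.8}


def single_day_bias(day: str) -> float:
    """Relative error of (count on `day`) x (number of opening days)
    used as an estimate of the weekly total."""
    week_total = sum(PROFILE.values())
    return len(PROFILE) * PROFILE[day] / week_total - 1


for day in PROFILE:
    print(f"{day}: {single_day_bias(day):+.1%}")
```

With these figures, a Monday count overstates the week by about 20% and a Saturday count understates it by about 70%.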
“Typical week”
[Figure: % deviation of weekly visits from the annual mean (Münster UL), with the periods of average activity as estimated by reference staff]
“Typical” weeks can hardly be anticipated, even from data collected over several years.
“Typical week”
[Figure: minimum/maximum % deviation of weekly visits from the annual mean for all weeks and for the weeks selected by staff, 1998, 1999 and 2000 (Münster UL); the maxima range from +12.4% to +22.9%, the minima from -11.6% to -23.2%]
Data collected by purposive (judgement) sampling are a weak foundation for comparisons.
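The deviations and extremes shown on these two slides are percentage differences between each week's visits and the mean week of the year. A minimal Python sketch, assuming one visit total per week; the `tolerance` used to flag “average” weeks is a hypothetical parameter, not a value from the slides.

```python
from statistics import mean
from typing import Dict, List


def weekly_deviations(weekly_visits: List[float]) -> List[float]:
    """Percentage deviation of each week's visits from the mean week."""
    avg = mean(weekly_visits)
    return [100 * (v - avg) / avg for v in weekly_visits]


def summarize(weekly_visits: List[float], tolerance: float = 5.0) -> Dict[str, object]:
    """Extremes of the deviation curve and the (1-based) weeks within +/- tolerance %."""
    dev = weekly_deviations(weekly_visits)
    return {
        "max": max(dev),
        "min": min(dev),
        "average_weeks": [i + 1 for i, d in enumerate(dev) if abs(d) <= tolerance],
    }
```

Running `summarize` once over all weeks and once over the staff-selected weeks gives the two sets of extremes compared on the min/max slide.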
Selection method: case 1
Louisiana State University Libraries (reference statistics)
- Randomly and individually selected hours of the year (simple random sample).
- A sample size of 52 hours (out of 4,103 service hours a year) was calculated for a confidence level of 90% and an error of ±11.23%.
- Total estimated by linear extrapolation.
- An hourly count is difficult to administer.
Maxstadt, J. M. (1988): A new approach to reference statistics, C&RL (Feb. 1988), pp. 85-88.
Similar (daily sampling): Bauer, K. (2000): Gathering ARL reference data, http://info.med.yale.edu/assessment/methods.html
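The LSU procedure (randomly drawn hours, then linear extrapolation) can be sketched in Python as follows. Service hours are assumed to be numbered 0 to N-1 in the order they occur during the year; that numbering, its mapping back to calendar dates and the function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Optional


def draw_hours(service_hours: int, n: int, seed: Optional[int] = None) -> List[int]:
    """Simple random sample (without replacement) of n service hours."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(service_hours), n))


def extrapolate_total(sample_counts: List[int], service_hours: int) -> float:
    """Linear extrapolation: mean count per sampled hour x all service hours."""
    return service_hours * mean(sample_counts)


# 52 of 4,103 service hours, as in the LSU example.
hours = draw_hours(4103, 52, seed=2002)
```

Once staff have recorded the questions asked in each drawn hour, `extrapolate_total(counts, 4103)` gives the annual estimate.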
Selection method: case 2
New York University / Bobst Library (reference statistics)
- Based on the previous year's reference data, weeks were classified as high, medium and low usage (stratified random sample).
- A sample size of 15 weeks was calculated for a confidence level of 95% and an error of ±400 [10%].
- Linear extrapolation of the weighted class (stratum) means.
- Additional information (past data) is used to improve the sample.
- Separating high-usage from medium-usage weeks is difficult.
Kesselman, M.; Watstein, S. B.: The measurement of reference and information services, JAL (1987, 1), pp. 24-30.
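Extrapolating weighted class means corresponds to the standard stratified estimator: each stratum's mean per sampled week is multiplied by the number of weeks in that stratum, and the products are summed. The Python sketch below also returns the usual standard error; the stratum sizes and counts in the example are invented for illustration and are not NYU's data.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Tuple


def stratified_total(strata: Dict[str, Tuple[int, List[float]]]) -> Tuple[float, float]:
    """strata maps a stratum name to (weeks in stratum, counts of the sampled weeks).

    Returns the estimated annual total and its standard error. A stratum with a
    single sampled week adds nothing to the variance, so the error is then understated.
    """
    total = 0.0
    var = 0.0
    for weeks, counts in strata.values():
        n_h = len(counts)
        total += weeks * mean(counts)
        if n_h > 1:
            # Finite population correction (1 - n_h / weeks) per stratum.
            var += weeks ** 2 * (1 - n_h / weeks) * variance(counts) / n_h
    return total, sqrt(var)


# Hypothetical example: 52 weeks split into high/medium/low usage strata.
estimate, se = stratified_total({
    "high": (10, [500, 520]),
    "medium": (30, [300, 320, 310]),
    "low": (12, [100, 120]),
})
```

The better the strata separate busy from quiet weeks, the smaller the within-stratum variances and hence the standard error, which is why a poor separation of high and medium weeks matters.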
Selection method: case 3
University of South Carolina / Thomas Cooper Library (reference statistics)
- Found an extremely high correlation (.957) between reference activity and gate count.
- Extrapolation relative to the boundary distribution (gate count).
- Deals with “missing” days.
- Allows a small random sample of a few weeks once the high correlation has been confirmed.
Lochstet, G.; Lehman, D. H.: A correlation method for collecting reference statistics, C&RL (Jan. 1999), pp. 45-53.
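One common way to extrapolate relative to an auxiliary count is a ratio estimator: the ratio of reference questions to gate count in the sampled weeks is applied to the full year's gate count. The Python sketch below assumes that this is what extrapolation relative to the gate count amounts to; Lochstet and Lehman's exact procedure may differ. It also computes Pearson's r, so the correlation can be checked before the ratio is relied on. All figures in the example are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ratio_estimate(sample_ref: Sequence[float], sample_gate: Sequence[float],
                   annual_gate: float) -> float:
    """Annual reference total = (reference / gate count in sample) x annual gate count."""
    return sum(sample_ref) / sum(sample_gate) * annual_gate


# Hypothetical sample: reference questions and gate count in three sampled weeks.
ref = [410, 385, 452]
gate = [9800, 9100, 10900]
print(round(pearson_r(ref, gate), 3), round(ratio_estimate(ref, gate, annual_gate=480_000)))
```

Because only the sampled periods enter the ratio, days without a reference count simply drop out, provided the gate count covers the whole year.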
Selection method: case 4
Münster University Library
Which data from the library system can be used as a boundary distribution?

                visits   reference  reserv.   reserv.   account   renewals  short
                                    inside    remote    info                loans
visits          1.000
reference       .876**   1.000
reserv. inside  .802**   .751**     1.000
reserv. remote  .437**   .347       .269**    1.000
account info    .800**   .765**     .796**    .220**    1.000
renewals        .523**   .512*      .568**    .156**    .759**    1.000
short loans     .473**   .383       .558**    .117      .312**    .140*     1.000
normal loans    .506**   .057       .656**    -.019     .508**    .283**    .483**

In branch libraries the same datasets are collected. These can be used to extrapolate the sample count for visits and information requests.
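Choosing which system-generated series to use as the boundary distribution amounts to ranking candidate series by their correlation with the sampled counts. A minimal Python sketch; the series names and daily figures are hypothetical, and picking the single series with the highest positive correlation is an illustrative choice rather than the method used in Münster.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def best_auxiliary(target: Sequence[float],
                   candidates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the candidate series most strongly (positively) correlated with target."""
    scores = {name: pearson_r(target, series) for name, series in candidates.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical daily figures: sampled visit counts and two system-generated series.
visits = [820, 910, 760, 990, 300]
system_data = {
    "account_info": [410, 450, 380, 500, 160],
    "normal_loans": [600, 560, 640, 580, 400],
}
print(best_auxiliary(visits, system_data))
```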
Sampling locations
Münster University Library
Does reference activity in different branches correlate significantly?

            main     reading  Branch A
main        1.000
reading     .437*    1.000
Branch A    .593*    .559*    1.000

University of the FAF / University Library, Hamburg
Does reference activity in different branches correlate significantly?
- 4 branch libraries (3 interconnected) with separate service points and entrances in one building
- Over the first half of 2002, no relationship between the branches was found:

            Branch 1  Branch 2  Branch 3  Branch 4
Branch 1    1.000
Branch 2    -.064     1.000
Branch 3    .022      -.041     1.000
Branch 4    .122      .058      .031      1.000
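Whether a branch correlation is significant depends on the number of paired observations, which the slides do not state. The Python sketch below computes an approximate two-sided p-value for a correlation coefficient using Fisher's z transformation; this large-sample test is swapped in for illustration, since the slides do not say which test was used, and the n of 120 opening days in the example is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0, using Fisher's z = atanh(r).

    Requires |r| < 1 and more than 3 paired observations.
    """
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hamburg branch correlations from the slide, with a hypothetical n of 120 days.
for pair, r in {"1-2": -0.064, "1-3": 0.022, "1-4": 0.122,
                "2-3": -0.041, "2-4": 0.058, "3-4": 0.031}.items():
    print(pair, round(correlation_p_value(r, 120), 3))
```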
Conclusions
- From the point of view of data collection management, it seems useful to choose a week as the sampling unit.
- “Normal” weeks can hardly be anticipated, even from data collected over several years.
- It is, however, likely that certain usage data correlate significantly and provide useful information for estimating totals.
- If data from automated systems are used for correlation, the sampling workload can be reduced.
- In-library use activities correlate with in-library use of automated systems.
- Significant remote use (e.g. frequent e-mail reference) should be correlated separately.
- Sampling locations might reduce the data collection workload further; the results, however, are ambivalent.