Data Analytics and Patron Privacy in Libraries: A Balancing Act
Becky Yoose (@yo_bj)
Library Applications and Systems Manager, The Seattle Public Library
#c4l18
Housekeeping
● Bibliography: https://www.zotero.org/groups/c4l18yoose
● Slides: https://osf.io/xb4mf/
Not comprehensive in all de-identification methods
YMMV
Patron Data and Types of Personally Identifiable Information [PII]
PII One - Data about a patron
● Name
● Physical/email address
● Phone number
● Date of birth
● Patron record number
PII Two - Activity that can be tied back to a patron
● Search & circulation histories
● Computer/wifi sessions
● Reference questions
● Electronic resource access
We have a lot of PII data.
● Integrated Library Systems
● Database backups
● Print management systems
● Server logs
● Reference logs
● Public computer/wireless traffic logs
● Security camera footage
● Card reader logs
● Library programs
  ○ Attendance logs
  ○ Feedback responses
● PAPER FORMS
● STAFF EMAIL
● VENDOR DATABASES, BACKUPS, SERVER LOGS
DL Number???
How do you balance patron privacy with operational needs?
Case Study - The Seattle Public Library’s Data Warehouse
Data Sources → Staging Area → Database → [TPS] Reports
Case Study - Extract, Transform, Load
● Pull ALL the data sources, whether databases or Excel reports from external vendors
● SQL Server Integration Services (SSIS) jobs for automation
● Transformations done either in system memory or in staging database tables, as an added privacy protection for PII data
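The in-memory transformation step can be sketched in a few lines. This is a minimal illustration, not the library's SSIS job: it uses Python's standard library in place of SSIS, and the export data, column names, table name, and salt are all hypothetical. The point it shows is that raw identifiers are transformed in memory, so only de-identified columns are ever written to the database.

```python
import csv
import hashlib
import io
import sqlite3

# Hypothetical vendor export; the columns and values are invented for illustration.
RAW_EXPORT = """patron_barcode,patron_name,item_type
1000001,Pat Example,DVD
1000002,Sam Sample,Book
1000001,Pat Example,Book
"""

# Assumption: a real job would read the salt from a secrets store, not source code.
SALT = b"example-salt"


def extract(text):
    """Read the raw export into a list of dicts held only in memory."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Replace the barcode with a salted SHA-256 digest and drop the name."""
    out = []
    for row in rows:
        barcode = row["patron_barcode"].encode("utf-8")
        digest = hashlib.sha256(SALT + barcode).hexdigest()
        out.append((digest, row["item_type"]))
    return out


def load(conn, rows):
    """Write only the de-identified columns to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkouts (patron_hash TEXT, item_type TEXT)"
    )
    conn.executemany("INSERT INTO checkouts VALUES (?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the warehouse here.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_EXPORT)))
```

Because the name column is dropped before the load step, it never reaches a table, a backup, or a database log.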
Case Study - De-Identification Practices
● Hashed unique identifiers (SHA-2, salt)
● Truncating raw data (ex. NX180.I57 M275 2015 to NX180)
● Obfuscation (Birth date 2/15/78 vs Age 40)
● Aggregation of data throughout multiple tables
● Limit shared fields between tables
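The first three practices above can be sketched as small functions. This is an illustrative sketch under stated assumptions, not the library's actual implementation: SHA-256 is chosen as one member of the SHA-2 family, the salt is simply prepended to the identifier, and the function names and salt value are hypothetical.

```python
import hashlib
from datetime import date

# Hypothetical salt; in practice it would be kept secret and stored outside the code.
SALT = b"example-salt-not-for-production"


def hash_identifier(patron_id: str, salt: bytes = SALT) -> str:
    """Return a salted SHA-256 hex digest of a patron identifier."""
    return hashlib.sha256(salt + patron_id.encode("utf-8")).hexdigest()


def truncate_call_number(call_number: str) -> str:
    """Keep only the class portion of a call number (text before the first period)."""
    return call_number.split(".", 1)[0].strip()


def age_on(birth_date: date, today: date) -> int:
    """Return age in whole years, so the exact birth date need not be stored."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years
```

For example, `truncate_call_number("NX180.I57 M275 2015")` returns `"NX180"`, and `age_on(date(1978, 2, 15), date(2018, 3, 1))` returns `40`. The hash is deterministic for a given salt, so the same patron still links across tables, which is why limiting shared fields between tables matters as a separate practice.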
Re-identification through...
Search patterns
4417749’s Search Queries:
● “numb fingers”
● “60 single men”
● “dog that urinates on everything”
● “Arnold”
● “landscapers in Lilburn, Ga”
Fuzzy matching
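To make the fuzzy matching risk concrete, here is a minimal sketch of how a record that was only lightly altered can be linked back to an outside dataset. It uses `difflib.SequenceMatcher` from the Python standard library; the records, the `best_match` helper, and the 0.8 threshold are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher


def best_match(record, candidates, threshold=0.8):
    """Return the candidate most similar to record, or None if none reach threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        # ratio() is a similarity score between 0.0 and 1.0
        score = SequenceMatcher(None, record.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hypothetical "outside" dataset an attacker might hold.
public_records = ["john smith 98101", "jane doe 98105"]

# A slightly altered record still links to the same person.
linked = best_match("jon smith 98101", public_records)
```

Here `linked` is `"john smith 98101"`: dropping a letter or reformatting a field does not prevent linkage when enough of the remaining text still lines up.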
Case Study - The Non-Technical Parts
Wrangling Data Through Inventory Audits And Detective Work
Privacy Policies
Data Governance
Contingency plans for vendors...
Library Marketing Vendor-Hosted Customer Relations Management System Library Databases and Warehouse
What have we learned today?
It takes a library to handle data responsibly.
Doing things in house does not necessarily mean protected data. Giving your data to vendors doesn’t necessarily protect data either.
Advocate.
Thank you
Becky Yoose
b.yoose@gmail.com
@yo_bj