Creating a Library Learning Analytics Database
Michael Doran, Systems Librarian, University of Texas at Arlington (doran@uta.edu)
To be covered…
• What is a library learning analytics database?
• Why is it needed?
• A look under the hood
• Security & privacy issues
• Library vs. campus systems
LITA Forum - Michael Doran - Nov 19, 2016
Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. [1]
[1] From “1st International Conference on Learning Analytics and Knowledge 2011” via the Wikipedia article on “Learning analytics”
Creating a library learning analytics database… what problem(s) does that solve?
Obligatory graphic of silos (“silos” photo by Doc Searls, CC BY 2.0)
Problems
• Library use data resides in separate systems
• Library systems typically don’t contain the student demographic information (e.g. major, academic program, GPA, student classification, etc.) needed to do learning analytics [a GOOD thing]
• Library data sets may use different unique identifiers (e.g. Institutional ID number vs. NetID), preventing linking them together
[Diagram: use data from various library systems and demographic data from a campus system flow into the Library Learning Analytics Database, a centralized database]
LIBrary Learning ANalytics Database: “LIBLAND”
[Diagram: use data from various library systems and demographic data from a campus system flow into LIBLAND, the centralized database]
Examples of systems with library use data
Entrance/exit gate turnstiles: Users swipe their “Mav Express” ID card to both enter and exit. [Image labels: Exit, Entry]
Interlibrary Loan Requests (e.g. ILLiad)
Group Study Room Reservations (e.g. OpenRoom)
Off-campus Access to E-resources (e.g. EZproxy)
ILS Catalog (e.g. Voyager)
Examples of systems with demographic data
Campus LDAP Directory (“CEDAR” at UT Arlington): “provide a consolidated standards-based directory which can provide consistent and complete information on students, faculty, staff, courses, organizations, and other electronically-describable entities and relationships” http://www.uta.edu/oit/eos/cedar/statement-of-direction.php
Blackboard Analytics: “The center [for Distance Education] aspires to [...] be a resource to faculty and program administrators driving the use of learning analytics, including student learning outcomes; [...]” From UT Arlington’s Center for Distance Education, http://www.uta.edu/distance/annual-report/index.html
Your Campus? Access to, or data dumps from:
• LDAP directory
• PeopleSoft
• Banner
• LMS (Learning Management System)
• …
Demographic data sources
• CEDAR/LDAP: 137,000 records; 16 attributes (past, present, and future students)
• Blackboard Analytics (Bb A): 43,000 records; 16 attributes (current semester’s students)
Attributes (from demographic data sources)
LDAP (CEDAR):
• UTA ID
• Enrollment term
• Major
• Grade points
• Hours complete
• GPA (calculated)
• Student status (i.e. enrolled now?)
• Ethnicity
• Gender
• Permanent address zip code
• Student address zip code
• Enrollment code
• Enrollment session
• Expected graduation date
• Library student employee?
Blackboard Analytics:
• Student classification
• Academic program
• Gender
• Ethnicity
• Age
• Tuition residency
• Student type
• College/school
• Department
• Academic plan
• Is academic partner?
• Is online student?
• Instruction mode
• Academic load
• Academic standing
Attributes (from demographic data sources), continued
Note some attributes we are NOT retrieving (privacy):
• ID number… but not name
• Age… but not date of birth
• Zip code… but not street address
3 other important data tables
Other Data Table #1: Knowing the affiliation of users who are not students helps fill in the gaps when linking library use data.
CEDAR/LDAP: 698,000 records; 1 attribute
• All records that have a UTA ID
• Only attribute is Primary Affiliation: student, faculty, employee, staff, affiliate
[Shown alongside the demographic tables: CEDAR/LDAP, 137,000 records, 16 attributes; Bb A, 43,000 records, 16 attributes]
Other Data Table #2: Cross-reference table for different unique identifiers
All the UTA IDs (and associated NetIDs)
Problem: Library data sets may use different unique identifiers (e.g. Institutional ID number vs. NetID), preventing linking them together.
• Demographic data only has UTA ID as an identifier
• Much of the use data (e.g. ILLiad, EZproxy, OpenRoom) only has the users’ NetID as an identifier
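To make the role of the cross-reference table concrete, here is a minimal sketch of a join that links NetID-keyed use data to UTA ID-keyed demographic data. It uses Python's built-in sqlite3 module purely for convenience; the table names, column names, and sample values are all hypothetical and are not taken from the LIBLAND schema.

```python
import sqlite3

# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE id_xref (uta_id TEXT PRIMARY KEY, net_id TEXT UNIQUE);
    CREATE TABLE demographics (uta_id TEXT PRIMARY KEY, major TEXT);
    CREATE TABLE illiad_requests (net_id TEXT, request_date TEXT);

    INSERT INTO id_xref VALUES ('1000000001', 'abc1234'), ('1000000002', 'xyz9876');
    INSERT INTO demographics VALUES ('1000000001', 'Biology'), ('1000000002', 'History');
    INSERT INTO illiad_requests VALUES
        ('abc1234', '2016-09-01'), ('abc1234', '2016-09-15'), ('xyz9876', '2016-10-02');
""")

# Use data (keyed by NetID) reaches demographic data (keyed by UTA ID)
# only by passing through the cross-reference table.
requests_by_major = conn.execute("""
    SELECT d.major, COUNT(*) AS requests
    FROM illiad_requests AS r
    JOIN id_xref AS x ON x.net_id = r.net_id
    JOIN demographics AS d ON d.uta_id = x.uta_id
    GROUP BY d.major
    ORDER BY d.major
""").fetchall()

print(requests_by_major)
```

Without the id_xref table there is no column the two data sets share, so the join could not be written at all.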
Other Data Table #3: Cross-reference table for cryptographic hash values
Making a (cryptographic) hash of it: A one-way hash function is an algorithm that takes a string (in this case, a UTA ID number) and returns a fixed-length alphanumeric string (the “hash value”). [Example script shown: foo.pl]
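The foo.pl script named on the slide is not reproduced in this text. As a rough stand-in, the following Python sketch shows a one-way hash applied to an ID string, assuming SHA-256 as the algorithm; the function name and the ID value are made up.

```python
import hashlib

def hash_id(uta_id: str) -> str:
    """Return the SHA-256 digest of an ID string as 64 hexadecimal characters."""
    return hashlib.sha256(uta_id.encode("utf-8")).hexdigest()

digest = hash_id("1000000001")  # made-up ID, not a real UTA ID
print(len(digest), digest)
```

Whatever the length of the input, the hexadecimal digest is always 64 characters long.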
Making a (cryptographic) hash of it
• Slightly different strings get vastly different hash values
• The same string always gets the same hash value*
*The same string always gets the same hash value, which is problematic, since UTA IDs are known to be 10-digit numbers. It wouldn’t be difficult to generate hash values for all the 10-digit numbers in the ranges used for UTA IDs and build a table of 10-digit numbers and their hash values, essentially reversing the process.
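The reversal described above can be sketched in a few lines. This is an illustration only: the ID range is hypothetical and deliberately tiny so the example runs instantly, and SHA-256 is assumed as the hash algorithm.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Unsalted SHA-256 of a string, as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical ID range (10,000 candidates); a real attack would cover the full range.
RANGE_START, RANGE_END = 1_000_000_000, 1_000_010_000

# Precompute hash value -> ID for every candidate number in the range.
lookup = {sha256_hex(str(n)): str(n) for n in range(RANGE_START, RANGE_END)}

# An "anonymized" value as it might appear in a distributed data set.
leaked_hash = sha256_hex("1000004242")

# A dictionary lookup recovers the original ID.
recovered_id = lookup[leaked_hash]
print(recovered_id)
```

The cost of building the table grows only linearly with the size of the ID range, which is why an unsalted hash of a short numeric identifier offers little protection.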
Cryptographic salt: A cryptographic salt is random data that is used as an additional input to a one-way hash. [Example script shown: bar.pl]
Cryptographic salt: The same input string (UTA ID) gets a different hash value each time, because it’s being combined with a different random salt each time the SHA-256 algorithm is applied.
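The bar.pl script is not reproduced in this text, so the following Python sketch is only a guess at the general shape. It assumes a 16-byte random salt that is discarded after use, and a stored mapping from ID to hash value so that each ID keeps one stable hash; the function names and the dictionary standing in for a cross-reference table are hypothetical.

```python
import hashlib
import secrets

def salted_hash(uta_id: str) -> str:
    """Hash an ID together with a fresh random salt (assumption: the salt is not kept)."""
    salt = secrets.token_bytes(16)
    return hashlib.sha256(salt + uta_id.encode("utf-8")).hexdigest()

# Stand-in for a cross-reference table: one stored hash value per ID.
hash_xref = {}

def anonymized(uta_id: str) -> str:
    """Return the stored hash for an ID, generating it on first use."""
    if uta_id not in hash_xref:
        hash_xref[uta_id] = salted_hash(uta_id)
    return hash_xref[uta_id]

first = salted_hash("1000000001")   # made-up ID
second = salted_hash("1000000001")
print(first != second)              # different salt, different hash value
print(anonymized("1000000001") == anonymized("1000000001"))  # stored value is stable
```

Because a salted hash cannot be recomputed on demand, the stored mapping is what keeps a given user's records linkable across data sets.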
This will allow us to do data anonymization [To be continued…]
Quick Review of What’s in LIBLAND: Other Data, Use Data, Demographic Data
[Diagram: use data from various library systems and demographic data from a campus system flow into LIBLAND, the centralized database]
Yikes!
Start small(er)
To get started on a “LIBLAND” project all you need are:
• One library use data source
• One demographic data source
[Diagram: library use data (System A database) and demographic data (LDAP directory) → script → SQL load files → LIBLAND database]
Requirements
Expertise in:
• Database design
• SQL
• Programming (a scripting language such as Perl, Python, or PHP)
Access to:
• (A separate, secure) database server
• Library systems containing use data
• Campus systems with demographic data
Scripts
• Recommend a scripting language like Perl, PHP, or Python
• Script needs to:
  • Connect to system
  • Execute a query
  • Parse data
  • Output data (as SQL load file)
You will need a “connector” library/module for the system you are connecting to: e.g. for Perl, the DBI/DBD::Oracle modules for connecting to an Oracle database, or Net::LDAP for connecting to an LDAP directory.
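The four steps above (connect, query, parse, output) might look roughly like this in Python. This is a sketch under stated assumptions: an in-memory SQLite database stands in for the source system, and the table name, column names, parsing rules, and sample rows are all hypothetical. A real script would use the connector module for the actual system.

```python
import sqlite3

def fetch_use_rows(conn):
    """Execute a query against the source system and yield parsed rows."""
    cursor = conn.execute("SELECT username, transaction_date FROM requests")
    for username, transaction_date in cursor:
        # Parse: normalize the identifier and keep only the date portion.
        yield username.strip().lower(), transaction_date[:10]

def sql_quote(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"

# Connect: an in-memory database standing in for "System A".
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE requests (username TEXT, transaction_date TEXT)")
source.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [(" ABC1234 ", "2016-09-01 14:03:22"), ("xyz9876", "2016-10-02 09:15:00")],
)

# Output: one INSERT statement per parsed row.
rows = list(fetch_use_rows(source))
for net_id, day in rows:
    print(f"INSERT INTO illiad (net_id, request_date) VALUES ({sql_quote(net_id)}, {sql_quote(day)});")
```

Keeping the query and parsing in one function and the output formatting in another makes it easier to reuse the same output code across several source systems.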
SQL load files
Scripts can/should output data as SQL “INSERT” statements. For output consisting of many rows of data…
• Start the SQL load file with: SET autocommit=0;
• End with: COMMIT;
• Start a new INSERT statement every 10,000 rows
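A load file following those three rules could be generated along these lines. This is a minimal sketch: the function names, table name, columns, and sample rows are hypothetical, and the quoting helper assumes MySQL's default handling of backslashes and single quotes in string literals.

```python
def sql_quote(value):
    """Quote a string as a SQL literal (escapes backslashes, doubles single quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

def build_load_file(rows, table, columns, batch_size=10_000):
    """Return load-file text: autocommit off, multi-row INSERTs, then COMMIT."""
    column_list = ", ".join(columns)
    lines = ["SET autocommit=0;"]
    # Start a new INSERT statement every batch_size rows.
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ",\n".join(
            "(" + ", ".join(sql_quote(v) for v in row) + ")" for row in batch
        )
        lines.append(f"INSERT INTO {table} ({column_list}) VALUES\n{values};")
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"

rows = [("abc1234", "2016-09-01"), ("o'brien1", "2016-09-02"), ("xyz9876", "2016-10-02")]
# batch_size=2 is used here only to show the statement break on a tiny data set.
load_file = build_load_file(rows, "illiad", ["net_id", "request_date"], batch_size=2)
print(load_file)
```

Wrapping the whole file in a single transaction and batching rows into multi-row INSERT statements generally loads faster than committing one row at a time.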
SQL Load File: Command to load file: mysql -u libland -p libland < illiad.sql
Granularity of Use Data Retrieved (privacy): We’re not pulling citation data…
Granularity of Use Data Retrieved (privacy): We extract the “destination host” but not the full URL (w/ query string) that identifies the exact resource. Note: By default, EZproxy logs do not retain a username (“session ID” is default); capturing that data requires a configuration change.
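Reducing a logged URL to its destination host can be done with Python's standard library, as in the sketch below. The URL is invented, and the function name is hypothetical; how the URL is pulled out of an EZproxy log line is not shown because the log format in use is not given here.

```python
from urllib.parse import urlsplit

def destination_host(url):
    """Return only the host name of a URL, dropping port, path, and query string."""
    return urlsplit(url).hostname or ""

# Invented URL for illustration.
full_url = "https://www.example.com:443/article/view?id=12345&query=some+topic"
host = destination_host(full_url)
print(host)
```

Only the host is stored, so the database records which vendor platform was visited but not which article or search was requested.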
Distributing LIBLAND Data
LIBLAND Tables: “Other” Data, Use Data, Demographic Data
Tables... and Views: Views are virtual tables that get created on the fly via an SQL SELECT statement. Each view in LIBLAND contains the same data as in the table EXCEPT the UTA ID (or NetID) is replaced with the one-way hash value. The views are what get exported from the LIBLAND server and imported into an MS Access database for distribution to staff.
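A view of this kind could be defined roughly as follows. The sketch uses Python's built-in sqlite3 module so that it runs without a server; the table names, column names, and placeholder hash value are hypothetical and do not reflect the actual LIBLAND schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: a hash cross-reference table and one use-data table.
    CREATE TABLE hash_xref (uta_id TEXT PRIMARY KEY, hash_value TEXT NOT NULL);
    CREATE TABLE gate_swipes (uta_id TEXT, swipe_time TEXT);

    INSERT INTO hash_xref VALUES ('1000000001', 'placeholder-hash-aaa');
    INSERT INTO gate_swipes VALUES ('1000000001', '2016-09-01 08:00:00');

    -- The view exposes the hash value in place of the identifier.
    CREATE VIEW gate_swipes_view AS
        SELECT h.hash_value AS user_hash, g.swipe_time
        FROM gate_swipes AS g
        JOIN hash_xref AS h ON h.uta_id = g.uta_id;
""")

cursor = conn.execute("SELECT * FROM gate_swipes_view")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Because the view never selects the uta_id column, an export of the view contains no direct identifier, while rows for the same user still share a hash value.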
[Diagram labels: on secure server; distributed to staff]
No Identifier (only a SHA-256 cryptographic hash) (privacy)
Why go to that trouble?
Data privacy is reason #1, #2, & #3
Bonus reason: If there is intent to publish or present the results of the analysis, you typically have to get institutional review board (IRB) approval. In advance. However…
IRB Review Exemption (YMMV, always discuss with your IRB)
University Analytics: “We are the Borg. Your data will be added to our own. Resistance is futile.” [Image label: ANALYTICS]
Questions? Please feel free to contact: Michael Doran, Systems Librarian, University of Texas at Arlington, doran@uta.edu