An XML Log Standard and Tool for Digital Library Logging Analysis

Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox
Virginia Tech

Outline
  • Motivation
  • Related Work
    • Problems with existing DL logs
  • The Digital Library Standardized Log Format
    • DL log standard design
    • DL log format structure
  • DL log tool and its implementation
  • Conclusions and future work

Motivation
  • Log analysis
    • Source of information about:
      • How patrons really use DL services
      • How systems behave while supporting user information-seeking activities
      • Examples: patterns
    • Used to:
      • Evaluate and enhance services
      • Help design user interfaces
      • Better allocate resources
  • Common practice in the web setting
    • Supported by web servers, proxy caching

Motivation (cont.)
  • DLs differ from the web
    • DL collections are explicitly organized, described, managed, and preserved
    • Users with more specific tasks and needs
    • Digital objects and collections more structured
    ⇒ DL logging should offer much richer information and opportunities
    ⇒ Tradeoff: user privacy
  • Current DL logs
    • Differences in formats and recorded information
    • Problems:
      • Lack of interoperability
      • No reuse of analysis tools
      • Comparability of log analysis results

Related Work
  • Web servers (Common Log Format)
    • Focused on browsing, stateless

    bbn-cache-3.cisco.com - - [22/Oct/1998:00:21 -0400] "GET /~harley/courses.html HTTP/1.0" 200 1734
    bbn-cache-3.cisco.com - - [22/Oct/1998:00:22 -0400] "GET /~harley/clip_art/word_icon.gif HTTP/1.0" 200 1050
    www4.e-softinc.com - - [22/Oct/1998:00:27 -0400] "HEAD / HTTP/1.0" 200 0
    user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/capehatteras.html HTTP/1.0" 200 328
    user-38ldbam.dialup.mindspring.com - - [22/Oct/1998:00:20:48 -0400] "GET /~lhuang/junior/PB2panforringed.mirror.gif HTTP/1.0" 200 20222
    eger-dl01.agria.hu - - [22/Oct/1998:00:20:51 -0400] "GET /~tjohnson/pinouts/ HTTP/1.0" 200 26994
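
To make the limits of this format concrete, the following sketch parses one Common Log Format line into its fields. It is an illustration only: the class name `ClfLine` and its field names are hypothetical and not part of the presented tool. Note that nothing in a line ties it to a session or to a higher-level action such as a search, which is the "stateless" limitation noted above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the presented tool.
class ClfLine {
    // host ident authuser [timestamp] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)$");

    final String host;
    final String timestamp;
    final String request;
    final int status;
    final long bytes;

    private ClfLine(String host, String timestamp, String request, int status, long bytes) {
        this.host = host;
        this.timestamp = timestamp;
        this.request = request;
        this.status = status;
        this.bytes = bytes;
    }

    // Parses a single CLF line; throws if the line does not match the format.
    static ClfLine parse(String line) {
        Matcher m = CLF.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a CLF line: " + line);
        }
        // CLF writes "-" when no bytes were sent.
        long bytes = m.group(7).equals("-") ? 0L : Long.parseLong(m.group(7));
        return new ClfLine(m.group(1), m.group(4), m.group(5),
                           Integer.parseInt(m.group(6)), bytes);
    }

    public static void main(String[] args) {
        ClfLine entry = parse("eger-dl01.agria.hu - - [22/Oct/1998:00:20:51 -0400] "
            + "\"GET /~tjohnson/pinouts/ HTTP/1.0\" 200 26994");
        System.out.println(entry.host + " sent \"" + entry.request + "\", status " + entry.status);
    }
}
```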

Related Work (cont.)
  • DL: Greenstone

    /fast-cgi-bin/niupepalibrary
    (a) its-www1.massey.ac.nz
    (b) [Thu Dec 07 23:47:00 NZDT 2000]
    (c) (a=p, b=0, bcp=, beu=, c=niupepa, cc=, ccp=0, ccs=0, cl=, cm=, cq2=, d=, er=, f=0, fc=1, gc=0, gg=text, gt=0, h=, h2=, hl=1, hp=, il=l, j=, j2=, k=1, ky=, l=en, m=50, n=, n2=, o=20, p=home, pw=, q2=, r=1, s=0, sp=frameset, t=1, ua=, uan=, ug=, uma=listusers, umc=, umnpw1=, umnpw2=, umpw=, umug=, umun=, umus=, un=, us=invalid, v=0, w=w, x=0, z=130.123.128.4950647871)
    (d) "Mozilla/4.08 [en] (Win95; I ; Nav)"

Related Work (cont.)
  • Search engine: OpenText

    Mon Sep 28 17:48:42 1998 ----- Starting Search -----
    Mon Sep 28 17:48:42 1998 {Transaction Begin}
    Mon Sep 28 17:48:42 1998 {RankMode Relevance1}
    Mon Sep 28 17:48:42 1998 "Bacillus thuringiensis "
    Mon Sep 28 17:48:42 1998 P0 = "Bacillus thuringiensis "
    Mon Sep 28 17:48:42 1998 R = (*D including (*P0))
    Mon Sep 28 17:48:42 1998 R = (((*R rankedby *P0)))
    Mon Sep 28 17:48:42 1998 S = (subset.1.10 (*R))
    Mon Sep 28 17:48:42 1998 SL0 = (region "OTSummary" within.1 (*S))
    Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.1.1 *S ))
    Mon Sep 28 17:48:42 1998 (*SL0 within.1 ( subset.2.1 *S ))
    Mon Sep 28 17:48:42 1998 {Transaction End}

Related Work (cont.)
  • Problems with existing DL logs
    • Incompatibility
    • Incompleteness
    • Complexity of analysis
    • Lack of organization
    • Ambiguity
    • Inflexibility
    • Verboseness

The Digital Library Standardized Log Format
  • Comprehensive
  • Reflective of the actual DL system behavior
  • Easily readable
  • Precise
  • Flexible enough to accommodate varying systems
  • Succinct enough to be implemented
  • Concern: user privacy

The Digital Library Standardized Log Format - Design (cont.)
  • Capture high-level user and system behaviors
    • Hierarchical organization
    • Encapsulated in transactions
      • Interactions between the users and the system or among the system components
      • Log format designed to record a number of different kinds of transactions
      • Examples:
        1. Login to the system
        2. Submission of search query
        3. Browsing a result list
        4. Recording of a user failure

The Digital Library Standardized Log Format - Design (cont.)
  • Design
    • Reflective of DL behavior
    • Based on the 5S formal theory
      • Unifying, mathematical theory to formally describe the semantics of DL components
      • Guidance for how to organize the log structure

The Digital Library Standardized Log Format - Design (cont.)

  5S | Definition | Use in Log Design
  Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
  Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
  Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
  Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
  Societies | Sets of communities and relationships among them | User information

The Digital Library Standardized Log Format (cont.)
  • Specification
    • Collection of an extensive, flat set of attributes:
      update, catalog, event, session, help, query, collection, transaction, timestamp, response, result cutoff, search, machine information, registering, sorting rule, error, browse, action

The Digital Library Standardized Log Format - Specification
  • Organization in a structured, logical way
  • XML and XML Schema
    • Standard syntax
    • Guarantee quality, correctness
    • Rich set of basic types helps standardization
    • Abundance of XML parsers helps construction of analysis tools
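
As a rough illustration of how XML Schema can guarantee correctness, the sketch below validates a log entry against a schema using the standard `javax.xml.validation` API. The inline schema is a hypothetical, heavily reduced stand-in written for this example, not the schema of the proposed standard; it only shows how built-in types such as `xs:dateTime` reject malformed values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class LogSchemaCheck {
    // Hypothetical, reduced schema for illustration; NOT the published DL log schema.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Transaction'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='SessionId' type='xs:string'/>"
      + "<xs:element name='TimeStamp' type='xs:dateTime'/>"
      + "</xs:sequence>"
      + "<xs:attribute name='ID' type='xs:positiveInteger' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Returns true when the XML text conforms to the reduced schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<Transaction ID='3452'><SessionId>987654usr3</SessionId>"
            + "<TimeStamp>2002-05-31T20:10:55.000-05:00</TimeStamp></Transaction>";
        String bad = "<Transaction ID='3452'><SessionId>987654usr3</SessionId>"
            + "<TimeStamp>yesterday</TimeStamp></Transaction>";
        System.out.println("well-typed entry valid: " + isValid(good));
        System.out.println("bad timestamp valid: " + isValid(bad));
    }
}
```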

The Digital Library Standardized Log Format - Structure
  • Top-level hierarchy

    Log
      ...
      Log Entry (Transaction)
        Statement
        SessionId
        TimeStamp
        MachineInfo
      ...

The Digital Library Standardized Log Format - Structure (cont.)
  • Decomposition of statement into different types

    Statement
      ErrorInfo
      SessionInfo
      HelpInfo
      RegisterInfo
      Event
      AdmInfo

The Digital Library Standardized Log Format - Structure (cont.)
  • Decomposition of event

    Statement
      ErrorInfo
      SessionInfo
      Event
        Action
          Search
          Browse
          Update
          StoreSysInfo
        StatusInfo
      HelpInfo
      AdmInfo
      RegisterInfo

The Digital Library Standardized Log Format - Structure (cont.)
  • Search attributes

    Search
      TimeFrame
      Collection
      Catalog
      SearchBy
      QueryString
      PresentationInfo
        Format
        SortBy
        NumberOfResults
        CutOff

DL Log Tool and Implementation
  • Java classes
    • XMLLogData: stores data
    • XMLLogManager: methods to read and write log information according to the format
      • Synchronized reads and writes: avoid conflicts and inconsistencies
  • Middleware to plug the DL log tool into the target system
    • Events based on target system architecture and implementation
    • Implemented in the MARIAN DL system
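
The division of labor between the two classes, and the role of synchronization, might look roughly like the sketch below. Only the class names and the method names `writeLogEntry` and `getLogData` are taken from the slides (`storeLogData` is modeled on them); every field, signature, and method body is an assumption made for illustration, and `LogManagerDemo` is a hypothetical driver. A real implementation would also escape the values it embeds in XML.

```java
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.List;

// Sketch only: names are modeled on the slides; fields, signatures,
// and bodies are assumptions, not the actual tool's code.
class XMLLogData {
    private final List<String> entries = new ArrayList<>();

    void storeLogData(String xmlEntry) { entries.add(xmlEntry); }

    // Returns a copy so callers cannot modify the stored log.
    List<String> getLogData() { return new ArrayList<>(entries); }
}

class XMLLogManager {
    private final XMLLogData data = new XMLLogData();
    private long nextId = 1;

    // synchronized: only one thread at a time may build and append an entry,
    // so transaction IDs stay unique and entries are never interleaved.
    synchronized void writeLogEntry(String sessionId, String statementXml) {
        String entry = "<Transaction ID=\"" + (nextId++) + "\">"
            + "<SessionId>" + sessionId + "</SessionId>"
            + "<Statement>" + statementXml + "</Statement>"
            + "<TimeStamp>" + OffsetDateTime.now() + "</TimeStamp>"
            + "</Transaction>";
        data.storeLogData(entry);
    }

    synchronized List<String> getLogData() { return data.getLogData(); }
}

// Hypothetical driver: several system components log concurrently.
class LogManagerDemo {
    static int writeConcurrently(int threads, int entriesPerThread) {
        XMLLogManager manager = new XMLLogManager();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            String sessionId = "session" + t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < entriesPerThread; i++) {
                    manager.writeLogEntry(sessionId, "<Event/>");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return manager.getLogData().size();
    }

    public static void main(String[] args) {
        System.out.println(writeConcurrently(4, 50) + " entries written");
    }
}
```

Without the `synchronized` keyword, concurrent calls could lose entries or reuse a transaction ID, which is the kind of conflict the slide refers to.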

DL Log Tool and Implementation (cont.): the MARIAN DL system

[Figure: MARIAN DL system architecture. Labels: Distributed client communication; Webgate; Structured logging; Semantic network management API; Customization and personalization; Query history; User Interaction Layer; Searcher community; Fusion modules; Multilingual support; Database Layer; Generalized inverted index interfaces; Tailored DL infrastructure generation; Database management API; Data Analysis, Collection Builders & Loading Tools; Semantic networks persistent storage; DL information networks characterization, indexing and loading.]

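DL Log Tool and Implementation (cont.)

[Figure: log middleware. User events from the DL patron (via the MARIAN User Layer) and system events from components c1 and c2 reach the log middleware, which calls writeLogEntry(parameters) on XMLLogManager; XMLLogManager calls storelogData(parameters) on XMLLogData. A DL analyst sends an analysis request to the analysis tool, which calls getLogData(parameters), receives logData, and returns the result.]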

DL Log Tool and Implementation (cont.)
  • Example 1: Login to the system

    <Transaction ID="3452">
      <SessionId>987654usr3</SessionId>
      <SessionInfo>
        <SessionStart>Start</SessionStart>
        <LoginInfo>
          <UserId>mhabib</UserId>
        </LoginInfo>
      </SessionInfo>
      <TimeStamp>2002-05-31T20:10:55.000-05:00</TimeStamp>
      <MachineInfo>
        <IPAddress>128.173.244.56</IPAddress>
        <Port>8000</Port>
      </MachineInfo>
    </Transaction>
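
Because the entry is plain XML, an analysis tool can read it with any standard parser. The sketch below uses the JDK's built-in DOM parser on a well-formed copy of the login transaction to pull out the user ID and the timestamp. `LoginEntryReader` and its methods are hypothetical names chosen for this example, not part of the presented tool.

```java
import java.io.StringReader;
import java.time.OffsetDateTime;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical reader for illustration; not part of the presented tool.
class LoginEntryReader {
    // Well-formed copy of the login transaction from Example 1.
    static final String SAMPLE =
        "<Transaction ID=\"3452\">"
      + "<SessionId>987654usr3</SessionId>"
      + "<SessionInfo><SessionStart>Start</SessionStart>"
      + "<LoginInfo><UserId>mhabib</UserId></LoginInfo></SessionInfo>"
      + "<TimeStamp>2002-05-31T20:10:55.000-05:00</TimeStamp>"
      + "<MachineInfo><IPAddress>128.173.244.56</IPAddress><Port>8000</Port></MachineInfo>"
      + "</Transaction>";

    // Returns the trimmed text of the first element with the given tag name.
    static String text(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read <" + tag + ">", e);
        }
    }

    // The timestamp uses ISO 8601 with an offset, so java.time parses it directly.
    static OffsetDateTime timestamp(String xml) {
        return OffsetDateTime.parse(text(xml, "TimeStamp"));
    }

    public static void main(String[] args) {
        System.out.println("user: " + text(SAMPLE, "UserId"));
        System.out.println("logged in at: " + timestamp(SAMPLE));
    }
}
```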

DL Log Tool and Implementation
  • Example 2: query all Dirline records about “low back pain”

    ...
    <Event>
      <Action>
        <Search>
          <Collection>Dirline</Collection>
          <ObjectType>CommunityRecord</ObjectType>
          <SearchBy>SearchByAnyParts</SearchBy>
          <SearchType>NonPersistant</SearchType>
          <QueryString>low back pain</QueryString>
          <TimeFrame>
            <StartTime>2002-05-31T20:11:07.000-05:00</StartTime>
            <EndTime>2002-05-31T20:11:09.000-05:00</EndTime>
          </TimeFrame>
          <PresentationInfo>
            <Format>List</Format>
            <SortBy>ByRank</SortBy>
            <NumberOfResults>217</NumberOfResults>
            <Cutoff>20</Cutoff>
          </PresentationInfo>
          ...

DL Log Tool and Implementation
  • Example 3: Browse an item of the ranked list returned as an answer for the previous search

    <Transaction ID="3456">
      <SessionId>987654usr3</SessionId>
      ...
      <Statement>
        <Event>
          <Action>
            <Browse>
              <DocID>5114</DocID>
              <DocName>University of Washington School of Medicine Multidisciplinary Pain Center (UWPC)</DocName>
              ...

In conclusion
  • Analysis of current DL log formats
    • Need for standardization, common practices, interoperable tools
  • Designed an XML-based log format standard for DL logging analysis
    • Captures a rich, detailed set of system and user behaviors
  • Implemented format in a log component tool
    • Connected to the MARIAN DL system

Future Work
  • Build suite of components for evaluation
  • Use log format and tools to evaluate several projects
    • Networked Digital Library of Theses and Dissertations (NDLTD)
    • CITIDEL
    • Broadening the scope of use to other NSDL projects
  • Extend and use log tool with other DL systems and architectures
  • Consider user privacy issues
  • Explore info for personalization

Future work
  • Crosswalks to other standards (e.g., CLF)
    • “Not yet other standard”
  • More challenges
    • Distributed logs
    • Large settings
  • Investigate compression issues to deal with XML verboseness
  • Promote discussions:
    • Listserv: dl-log-l@listserv.vt.edu
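
The compression item above lends itself to a quick experiment. The sketch below gzips a synthetic log with the JDK's `java.util.zip.GZIPOutputStream` and compares sizes. The log content is generated for the example and is far more repetitive than real usage data, so the ratio it prints is only indicative; `LogCompressionDemo` and `syntheticLog` are hypothetical names, not part of the presented tool.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical experiment for illustration; not part of the presented tool.
class LogCompressionDemo {
    // Compresses the text with gzip and returns the compressed bytes.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Builds a made-up log of browse transactions; real logs are less uniform.
    static String syntheticLog(int entries) {
        StringBuilder log = new StringBuilder("<Log>");
        for (int i = 0; i < entries; i++) {
            log.append("<Transaction ID=\"").append(i).append("\">")
               .append("<SessionId>session").append(i % 10).append("</SessionId>")
               .append("<Statement><Event><Action><Browse><DocID>").append(i)
               .append("</DocID></Browse></Action></Event></Statement>")
               .append("</Transaction>");
        }
        return log.append("</Log>").toString();
    }

    public static void main(String[] args) {
        String log = syntheticLog(1000);
        int rawBytes = log.getBytes(StandardCharsets.UTF_8).length;
        int compressedBytes = gzip(log).length;
        System.out.println("raw: " + rawBytes + " bytes, gzip: " + compressedBytes + " bytes");
    }
}
```

Repeated tag names are exactly what dictionary-based compressors exploit, so much of XML's verbosity disappears on disk, at the cost of decompressing before analysis.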