
IADIS International Conference e-Society 2005

PERSONALIZED ACCESS TO MULTI-VERSION DOCUMENTS FOR E-GOVERNMENT APPLICATIONS

Fabio Grandi, Maria Rita Scalas
Alma Mater Studiorum - Università degli Studi di Bologna

Federica Mandreoli, Riccardo Martoglia
Università degli Studi di Modena e Reggio Emilia

Overview
§ Our research activities concern the implementation of Web information systems for e-Government applications
§ Development of e-Government initiatives: more and more on-line resources and services are being made available by Public Administrations (PAs)
§ We make use of temporal database and Semantic Web techniques to provide personalized access to such resources and services
§ In particular, we consider multi-version norm texts and documents stored in Web repositories in XML format

IADIS e-Society 2005 - Grandi, Scalas, Mandreoli, Martoglia - Personalized Access to Multi-version Documents for E-Government Applications

Importance of versioning
§ Temporal concerns are ubiquitous in the law domain
  [Figure: timeline showing the original norm text (1) followed by new versions 2 and 3 over time]
§ A norm text changes over time due to subsequent modifications, but keeps its identity
§ The ability to model temporal dimensions is essential for the management of evolving norms
  § it is crucial to reconstruct the consolidated version of a norm
  § past versions also remain important

Importance of versioning
§ Applicability (semantic) versioning also plays an important role
  § some norms, or some of their parts, have or acquire a limited applicability
  § personalized version of the norm: a version containing only the articles that are applicable to a citizen's personal case (e.g. a self-employed citizen)

Objectives
§ Development of an effective and efficient Web information system where:
  § norms are represented as XML documents
  § the dynamics of norms over time are captured
  § the limited applicability of norms is captured
  § selective access and reconstruction of versions are supported by a query engine
§ Aimed at:
  § enabling citizens to access personalized versions of multi-version resources
  § improving and optimizing the involvement of citizens in the e-Governance process

Approach
§ Definition of a temporal XML model including:
  § a temporal multi-version XML schema
  § temporal manipulation operations
  § applicability extensions (semantic versioning)
§ Design, implementation and evaluation of system prototypes supporting the model
  § First system, based on a “stratum” approach on top of a commercial DBMS
  § Ongoing research: second system, “native” approach
    § includes semantic annotations in multi-versioning

The temporal XML data model
§ Based on XML Schema
§ Follows the hierarchical organization of norm texts
  § contents-section-article-paragraph
§ At each level of the hierarchy, the history of changes is represented by the versions produced:
  § The temporal pertinence is represented by timestamps, i.e. temporal elements encoded as multiple 3-dimensional intervals (TA)
  § A reference to the modifying (active) norm is added (an_ref)
§ Supports ancestor-descendant inheritance
  § Timestamps of a node are inherited by its descendants
  § Along the hierarchy, redefinitions can only involve a restriction of the temporal pertinence
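The inheritance rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles a single temporal dimension instead of the 3-dimensional intervals of the model, it treats intervals as half-open, and the names `Interval`, `within` and `effective` are hypothetical, not part of the described system.

```python
from datetime import date
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Half-open interval [start, end); end=None means open-ended (assumption)."""
    start: date
    end: Optional[date] = None


def within(inner: Interval, outer: Interval) -> bool:
    """True if inner lies entirely inside outer."""
    if inner.start < outer.start:
        return False
    if outer.end is None:
        return True
    return inner.end is not None and inner.end <= outer.end


def effective(parent: Interval, local: Optional[Interval] = None) -> Interval:
    """Pertinence of a node, given the pertinence of its parent."""
    if local is None:
        return parent  # no redefinition: the parent's timestamp is inherited
    if not within(local, parent):
        raise ValueError("a redefinition may only restrict the inherited pertinence")
    return local


article = Interval(date(2001, 1, 1))   # open-ended pertinence
paragraph = effective(article)         # inherited unchanged
restricted = effective(article, Interval(date(2001, 1, 1), date(2001, 6, 10)))
```

Rejecting a redefinition that exceeds the parent's interval is one possible reading of the restriction rule; an implementation could equally clip it to the parent's interval.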

The temporal XML schema
[Figure: tree diagram of the temporal XML schema. Elements: Law, Title, Contents, Section, Heading, Article, Paragraph, Ver, TA. Attribute labels as shown in the figure: Num – R, Type – R, Publication – R, An_ref – O; each TA carries Vt_Start – R, Vt_End – O, Tt_Start – R, Tt_End – O, Et_Start – R, Et_End – O]
4 temporal dimensions:
§ Publication time: time of publication in the Official Journal
§ Validity time: time the norm is in force
§ Efficacy time: time the norm can be applied
§ Transaction time: time the norm is stored in the system

An example document

<norm num="2624/1999" type="Law">
  <title>Cereals Importation</title>
  <contents publication="2001-01-01" vt_start="2001-01-01" tt_start="2001-01-10" et_start="2001-01-01">
    …
    <article num="1">
      <ver num="1">
        <ta vt_start="2001-01-01" tt_start="2001-01-10" tt_end="2001-06-01" et_start="2001-01-01"/>
        <ta vt_start="2001-01-01" et_end="2001-06-10" … />
        <ta vt_start="2001-01-01" vt_end="2001-06-10" et_start="2001-06-10" … />
        <paragraph num="1"><ver num="1">…Art. 1 before modification…</ver></paragraph>
        …
      </ver>
      <ver num="2" an_ref="LD 135/2000">
        <ta vt_start="2001-06-10" tt_start="2001-06-01" et_start="2001-06-10"/>
        <paragraph num="1"><ver num="1">…Art. 1 after modification…</ver></paragraph>
        <paragraph num="2"><ver num="1">…Art. 1 after modification…</ver></paragraph>
      </ver>
    </article>
    …

The “Stratum” approach
Based on two components:
§ XML document management facilities offered by Oracle 9i
  § document-size granularity
  § structural and textual constraints
§ software stratum built on top
  § temporal aspects
  § reconstruction
Extensive experimental results on the system behavior show:
§ good performance
§ ability to manage large collections of XML multi-version documents

Query and modification operators
§ Full search and reconstruction functionalities

    FOR $a IN path
    WHERE constraints on $a
    RETURN const-tree(document($a), temporal specs)

  § constraints can contain keyword-based text selections
  § const-tree: operator for the reconstruction of a temporally consistent norm version (consolidated act; involves temporal selections)
  § temporal specs may involve a temporal predicate for each of the supported dimensions
§ Two basic operators for the management of norm modifications:
  § to change the textual contents of a norm portion
    § deletion, insertion, replacement of (a part of) the norm
  § to modify the temporal pertinence of a given version
    § time extension or suspension of (a part of) the norm
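As a rough illustration of what a reconstruction operator such as const-tree has to do, the Python sketch below prunes every `ver` element whose timestamps do not satisfy the temporal specs. It is not the authors' implementation: the function names, the half-open reading of the intervals, the rule that a `ver` without a local `ta` inherits its pertinence, and the simplified sample document are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Simplified, well-formed sample (assumption; not the full document of the slides).
DOC = """<article num="1">
  <ver num="1">
    <ta vt_start="2001-01-01" vt_end="2001-06-10"/>
    <paragraph num="1"><ver num="1">Art. 1 before modification</ver></paragraph>
  </ver>
  <ver num="2" an_ref="LD 135/2000">
    <ta vt_start="2001-06-10"/>
    <paragraph num="1"><ver num="1">Art. 1 after modification</ver></paragraph>
  </ver>
</article>"""


def covers(ta, dim, day):
    """True if the half-open interval [dim_start, dim_end) of a <ta> contains day."""
    start, end = ta.get(dim + "_start"), ta.get(dim + "_end")
    if start is not None and day < date.fromisoformat(start):
        return False
    return end is None or day < date.fromisoformat(end)


def pertinent(ver, specs):
    """A version is kept if one of its <ta> children satisfies every predicate."""
    tas = ver.findall("ta")
    if not tas:
        return True  # no local timestamp: pertinence is inherited (assumption)
    return any(all(covers(ta, dim, day) for dim, day in specs.items()) for ta in tas)


def const_tree(node, specs):
    """Copy node, dropping every <ver> subtree that is not pertinent to specs."""
    out = ET.Element(node.tag, dict(node.attrib))
    out.text = node.text
    for child in node:
        if child.tag == "ver" and not pertinent(child, specs):
            continue
        out.append(const_tree(child, specs))
    return out


# Version of the article valid on 1 July 2001.
current = const_tree(ET.fromstring(DOC), {"vt": date(2001, 7, 1)})
```

Here `specs` maps a dimension prefix (`vt`, `tt`, `et`) to a single day; a fuller sketch would accept one predicate per dimension, as the slide describes.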

Example of reconstruction (current version)

<norm num="2624/1999" type="Law">
  <title>Cereals Importation</title>
  <contents publication="2001-01-01" vt_start="2001-01-01" tt_start="2001-01-10" et_start="2001-01-01">
    …
    <article num="1">
      <ver num="1">
        <ta vt_start="2001-01-01" tt_start="2001-01-10" tt_end="2001-06-01" et_start="2001-01-01"/>
        <ta … />
        <paragraph num="1"><ver num="1">…Art. 1 before modification…</ver></paragraph>
        …
      </ver>
      <ver num="2" an_ref="LD 135/2000">
        <ta vt_start="2001-06-10" tt_start="2001-06-01" et_start="2001-06-10"/>
        <paragraph num="1"><ver num="1">…Art. 1 after modification…</ver></paragraph>
        <paragraph num="2"><ver num="1">…Art. 1 after modification…</ver></paragraph>
      </ver>
    </article>
    …

[Slide annotation: ( NOT INCLUDED ), marking the part left out of the reconstructed current version]

The “Native” approach
Based on a Temporal XML Query Processor:
§ provides all the temporal, structural, textual and applicability query facilities in a single component
§ exploits ad-hoc data structures and algorithms
  § finer granularity (“tuple”)
  § embedded “light” DBMS libraries
  § structural join algorithms
§ allows users to store and reconstruct on-the-fly XML norm texts satisfying the four types of constraints

Semantic versioning
§ Extension of the multi-version model based on temporal dimensions to include a semantic versioning dimension
§ Aimed at providing personalized access to norms with respect to applicability
§ Civic ontology: a classification of citizens based on the distinctions introduced by successive norms (founding acts) that imply some limitations in their applicability

Semantic versioning
§ At this stage of the project, we manage “tree-like” ontologies
  § class taxonomies induced by the IS-A relationship
  § we exploit the pre-order and post-order properties of trees
§ New versioning dimension
  § Applicability of different parts of a norm text to the relevant classes of the civic ontology
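The pre-order/post-order property can be shown with a small Python sketch: once every class carries its (pre, post) visit numbers, an IS-A test needs only two integer comparisons. The taxonomy below is invented for the example, and `encode` and `is_a` are hypothetical names, not the system's API.

```python
from itertools import count


def encode(children, root):
    """Assign (pre, post) visit numbers to every class of a tree taxonomy."""
    pre, post, codes = count(1), count(1), {}

    def visit(node):
        p = next(pre)                      # numbered when first reached
        for child in children.get(node, ()):
            visit(child)
        codes[node] = (p, next(post))      # numbered again when left

    visit(root)
    return codes


def is_a(codes, cls, ancestor):
    """cls IS-A ancestor iff ancestor is entered no later and left no earlier."""
    (pre_c, post_c), (pre_a, post_a) = codes[cls], codes[ancestor]
    return pre_a <= pre_c and post_c <= post_a


# Hypothetical civic ontology (assumption, for illustration only).
taxonomy = {
    "Citizen": ["Employee", "Self-employed"],
    "Employee": ["Public employee", "Private employee"],
}
codes = encode(taxonomy, "Citizen")
```

With this encoding, a norm part annotated as applicable to `Employee` applies to `Public employee` as well, while `Self-employed` is excluded, without traversing the tree at query time.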

Semantic versioning
§ Applicability is inherited by descendant nodes unless locally redefined
§ By means of redefinitions we can also introduce, for each part of a document, complex applicability properties
  § Extensions with respect to ancestors
  § Restrictions with respect to ancestors
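A minimal Python sketch of applicability inheritance follows. It assumes that each node either carries a local applicability annotation (a set of civic class identifiers) or inherits the nearest one found among its ancestors; the node names, the class numbers and the function `applicable_classes` are hypothetical.

```python
def applicable_classes(node, parent_of, local_aa):
    """Nearest applicability annotation on the path from node up to the root."""
    while node is not None:
        if node in local_aa:
            return local_aa[node]      # local (re)definition wins
        node = parent_of.get(node)     # otherwise look at the parent
    return frozenset()                 # nothing declared on the whole path


# Hypothetical document structure: child -> parent.
parent_of = {"art1": "norm", "art1/par1": "art1", "art1/par2": "art1"}

# The norm applies to class 3; paragraph 2 extends this to classes 3 and 8.
local_aa = {"norm": frozenset({3}), "art1/par2": frozenset({3, 8})}

inherited = applicable_classes("art1/par1", parent_of, local_aa)
redefined = applicable_classes("art1/par2", parent_of, local_aa)
```

A restriction would be written the same way, with a local set smaller than the inherited one.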

Example of full search
§ John Smith is a self-employed citizen.
§ He is interested in the text of all the norms...
§ ...which contain paragraphs (structural constraint) dealing with health care (textual constraint),...
§ ...which were valid and in effect between 2002 and 2004 (temporal constraint),...
§ ...and which are applicable to his case, civic class 7 (applicability constraint).
4 orthogonal constraints: structural, textual, temporal, applicability

Example of full search

    FOR $a IN norm
    WHERE textConstr($a//paragraph//text(), 'health AND care')
      AND tempConstr('vTime OVERLAPS PERIOD('2002-01-01', '2004-12-31')')
      AND tempConstr('eTime OVERLAPS PERIOD('2002-01-01', '2004-12-31')')
      AND applConstr('class 7')
    RETURN $a

4 orthogonal constraints: structural ($a//paragraph), textual ('health AND care'), temporal (vTime, eTime), applicability ('class 7')

Finer storage granularity
Each document is split into ad-hoc structures (tuples), providing a finer access granularity to optimize time and space requirements

    Tuple ( id, <structural attributes>, <temporal attributes>, <text>, <applicability attributes> )

Each constraint is verified at query time on the corresponding attributes
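The tuple layout and the per-attribute checks can be sketched in Python as follows. The field names, the use of an element path as the structural attribute, the half-open periods and the sample data are assumptions for illustration; the actual physical layout of the tuples is not given on the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional, Tuple

Period = Tuple[date, Optional[date]]  # half-open [start, end); None = open-ended


@dataclass(frozen=True)
class NormTuple:
    id: int
    path: str                 # structural attribute (hypothetical: element path)
    valid: Period             # temporal attributes
    efficacy: Period
    text: str
    classes: FrozenSet[int]   # applicability attributes (civic class ids)


def overlaps(p: Period, q: Period) -> bool:
    """True if two half-open periods share at least one instant."""
    (s1, e1), (s2, e2) = p, q
    return (e2 is None or s1 < e2) and (e1 is None or s2 < e1)


def matches(t, path_suffix, keywords, period, civic_class):
    """Check each of the four constraints on the corresponding attributes."""
    return (
        t.path.endswith(path_suffix)                               # structural
        and all(k.lower() in t.text.lower() for k in keywords)     # textual
        and overlaps(t.valid, period)                              # temporal
        and overlaps(t.efficacy, period)
        and civic_class in t.classes                               # applicability
    )


tuples = [
    NormTuple(1, "norm/article/paragraph", (date(1980, 1, 1), None),
              (date(1980, 1, 1), None), "Health care ...", frozenset({3, 7})),
    NormTuple(2, "norm/article/paragraph", (date(1980, 1, 1), None),
              (date(1980, 1, 1), None), "Health care ...", frozenset({4})),
]
query_period = (date(2002, 1, 1), date(2005, 1, 1))  # years 2002 to 2004
hits = [t.id for t in tuples
        if matches(t, "paragraph", ["health", "care"], query_period, 7)]
```

Because every constraint reads only its own attribute group, a tuple can be discarded as soon as one check fails, without touching the rest of the document.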

Finer storage granularity
[Figure: example tuple laid out by attribute group: ID; structural attributes (startPos, level, …; values 4, 4); temporal attributes (vtStart, vtEnd, etStart, etEnd, ttStart, ttEnd, pt; values shown include 01/01/1980, 20/12/1979, 15/12/1979, F and UC); text (“Health care …”); applicability attributes (AA = 3)]

Example of full search
[Figure: the query constraints (norm//paragraph//text(), 'class 7') are evaluated against the civic ontology and the normative DB. The sample norm contains Article 1 and Article 2, with versions (Ver) and paragraphs (Par) annotated with TA and AA values (AA=3; AA=4; AA=3, 8) and the texts “Health care… text X”, “Health care… text Y” and “Health care… text Z”]

“Native” approach benefits
§ The native approach is able to access and retrieve only the strictly necessary data
  § selection relies on ad-hoc and temporally-enhanced structures
  § uses a finer granularity of managed data than standard XML engines
§ Only the parts which satisfy the temporal and applicability constraints are used for the reconstruction of the retrieved documents
§ There is no need to retrieve whole XML documents and build space-consuming structures such as DOM trees to manipulate them, as required in the stratum approach
Result: enhanced query processing efficiency, reduced memory requirements

Evaluation benchmark
§ Three XML document sets
  § 5000 documents (120 MB)
  § 10000 documents (240 MB)
  § 20000 documents (480 MB)
§ Variable document size
  § min = 2 KB
  § avg = 24 KB
  § max = 125 KB
§ Five different query types
  § Queries on keywords (structural + textual constraints)
    § Q1 – keywords in contents
    § Q2 – keywords in type and contents
  § Temporal queries (structural + temporal constraints)
    § Q3 – conditions on publication, validity and transaction time
  § Mixed queries (structural + textual + temporal constraints)
    § Q4, Q5 – with keywords and temporal conditions
§ Five variants with personalized access support
  § Qx-A – with additional applicability constraints

Performance evaluation
§ The selectivity of the query predicates strongly influences the performance of the stratum approach
  § Q2, Q3: large numbers of documents containing some (typically small) relevant portions have to be retrieved
§ The native approach proves faster and more reliable in all cases
  § Performance is more uniform
  § Retrieval of useless document parts is avoided

Performance evaluation
§ Very high efficiency in solving personalization queries
§ The system manages applicability-based personalized access by means of simple comparisons involving pre/post encodings
  § personalized queries are 0.5-1% slower than the original versions
  § 3-4% storage space overhead required

Performance evaluation
§ Scalability tests
  § The answer time grows sublinearly with the number of documents
  § Good scalability of the system in every type of query context
[Figure: answer time vs. collection size: 1046 msec (5000 docs), 1366 msec (10000 docs), 1741 msec (20000 docs)]
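The sublinear trend can be checked directly from the three figures reported on the slide. The short Python calculation below uses only those numbers; the variable names are arbitrary.

```python
docs = [5000, 10000, 20000]   # collection sizes reported on the slide
msec = [1046, 1366, 1741]     # corresponding answer times

# For each doubling of the collection, compare size growth with time growth.
steps = [
    (docs[i + 1] / docs[i], msec[i + 1] / msec[i])
    for i in range(len(docs) - 1)
]

# Sublinear growth: the time ratio stays below the size ratio at every step.
sublinear = all(time_ratio < size_ratio for size_ratio, time_ratio in steps)

# Average answer time per stored document, which should shrink as the set grows.
per_doc = [t / d for t, d in zip(msec, docs)]
```

Each doubling of the collection multiplies the answer time by about 1.31 and 1.27 respectively, well below 2, so the time spent per stored document decreases as the collection grows.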

Conclusions
§ We presented our research work concerning the design and implementation of efficient Web-based information systems for e-Government applications
§ We developed a first platform (“stratum” approach) for temporal management of multi-version norm texts on top of a commercial DBMS
§ We migrated this system towards a more efficient platform (“native” approach), for which a specialized Temporal XML Query Processor has been designed and implemented
§ The new prototype provides advanced functionalities
  § personalized access to documents on the basis of the digital identity of citizens, relying on semantic versioning
§ Our approach proved very efficient in a large set of experimental situations and showed excellent scale-up figures with varying load configurations

Future Work
§ Extensions of the current framework
  § more advanced application requirements may include a more sophisticated ontology definition
§ Development of a complete technological infrastructure usable in a large Web-based e-Government scenario
  § identification, classification and reconstruction services
§ Assessment of our developed systems in a concrete working environment
  § with real users
  § with a large repository of real legal documents