Digital Libraries on the Grid with gLibrary
Jorge Sevilla Cedillo (jorge.sevilla@ct.infn.it)
Istituto Nazionale di Fisica Nucleare – Catania

Outline
Introduction
Grid architecture
Digital libraries
gLibrary features
gLibrary architecture
Implementation details
Future plans

Introduction
Share information around the world
  ◦ Grid infrastructures have a huge capacity to store data.
  ◦ Metadata catalogs provide metadata describing the data.
  ◦ File catalogs map logical names to replicas.
Keep fragile documents safe
  ◦ Old manuscripts.
Easy and fast interface
  ◦ PHP classes for the business logic on the server side.
  ◦ AJAX/SmartClient web interface on the client side.

Data Grid
gLite is deployed on a geographically distributed computational data grid.
  ◦ Storage Elements (SEs): provide uniform access to data storage resources.
      Single disks
      Large disk arrays
      Tape-based Mass Storage Systems
  ◦ AMGA Metadata Catalog: stores metadata describing the contents of Grid files, so that users can search for entries.
  ◦ LCG File Catalog (LFC): maps logical file names to the physical locations of a file's replicas, stored on one or more SEs.

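The mapping from one logical file name to several replicas can be illustrated with a minimal in-memory sketch. It is a toy stand-in, not the LFC API: the `FileCatalog` class, its methods, the logical path and the `example.org` storage URLs are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class FileCatalog:
    """Toy in-memory stand-in for a file catalog (NOT the real LFC API)."""

    def __init__(self):
        # logical file name -> list of replica storage URLs
        self._replicas = defaultdict(list)

    def register(self, lfn, surl):
        """Record that a replica of `lfn` is stored at `surl`."""
        if surl not in self._replicas[lfn]:
            self._replicas[lfn].append(surl)

    def list_replicas(self, lfn):
        """Return every known replica location for a logical file name."""
        if lfn not in self._replicas:
            raise KeyError(f"unknown logical file name: {lfn}")
        return list(self._replicas[lfn])


# Hypothetical logical name and storage URLs, for illustration only.
catalog = FileCatalog()
catalog.register("/grid/demo/scan_0001.tiff",
                 "srm://se1.example.org/data/scan_0001.tiff")
catalog.register("/grid/demo/scan_0001.tiff",
                 "srm://se2.example.org/data/scan_0001.tiff")
replicas = catalog.list_replicas("/grid/demo/scan_0001.tiff")
```

Because the user only ever handles the logical name, a replica can be added or moved without changing anything on the user's side.
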
Digital Libraries
Collections are stored in digital formats
Information retrieval system
Available on the internet or in local storage
Fast and easy access

Digital Libraries - Advantages
No physical boundaries
Round-the-clock availability
Multiple/concurrent user access
Information retrieval
Preservation and conservation
Saving physical space in libraries
Added value: digital repair

Digital Libraries on the Grid
Grid technologies allow users to run jobs and also to manage data.
Music, images, videos and other files can be saved to and retrieved from the Grid.
  ◦ Replicas provide:
      Security
      Availability
      Speed

gLibrary – Features
Extensible, robust, secure and easy-to-use system for storing digital assets.
  ◦ By digital asset, we mean any kind of content and/or media represented as a computer file: images, videos, e-books…
Easy web interface to search, organize and retrieve files on the Grid.
Fast asset updating.
Fine-grained authorization mechanism.

gLibrary like iTunes
iTunes is an easy interface for handling multimedia assets
  ◦ iPod software manager.
  ◦ Plays and organizes digital music and video files.

gLibrary like iTunes
gLibrary implements an iTunes-like interface
  ◦ AMGA stores the metadata used to organize the files stored on the Grid.
  ◦ gLibrary allows users to store, organize, search and retrieve digital assets in a Grid environment.
Assets are grouped by type: a list of specific attributes describing each kind of asset managed by the system. Types are
  ◦ hierarchical (a child type shares and extends its parent's attributes)
  ◦ queried during searches

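The hierarchical types described on this slide can be sketched as follows. This is only an illustration of attribute inheritance, not how gLibrary or AMGA store types: the `AssetType` class and the type and attribute names are assumptions.

```python
class AssetType:
    """Hypothetical asset type: a named list of attributes with an optional parent."""

    def __init__(self, name, attributes, parent=None):
        self.name = name
        self.own_attributes = list(attributes)
        self.parent = parent

    def all_attributes(self):
        """Return inherited attributes first, then the ones this type adds."""
        inherited = self.parent.all_attributes() if self.parent else []
        added = [a for a in self.own_attributes if a not in inherited]
        return inherited + added


# Illustrative hierarchy: every name below is made up for the example.
generic = AssetType("Asset", ["FileName", "FileSize", "Description"])
image = AssetType("Image", ["ImageWidth", "ImageHeight"], parent=generic)
scan = AssetType("Scan", ["XResolution"], parent=image)

scan_attributes = scan.all_attributes()
```

A search on a child type can therefore query both the generic attributes and the specific ones without redefining the former for each kind of asset.
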
gLibrary – Download
Retrieving data from the Grid is easy for non-expert users.
gLibrary lets users retrieve data in a few clicks
  ◦ Select a replica link from a list.
  ◦ Transfers are handled over GSIFTP (new: HTTPS too!) with X.509 Grid proxy/certificate AuthN/AuthZ.

gLibrary – Metadata Editing
Users can change metadata through a simple form
  ◦ Edit generic attributes
  ◦ Edit specific attributes

Use Case: De Roberto
De Roberto, an Italian writer of the 19th and 20th centuries who was born in Naples but spent his life in Catania, left numerous works to the humanities community.
These consist of valuable and hard-to-manage pieces: manuscripts, typescripts, drafts with handwritten corrections, magazines, clippings, sketches, photos, etc.

De Roberto
Digitization of manuscripts, typescripts and printed works
  ◦ TIFF files, one per page, 600 dpi, about 100 MB for A3
      High-resolution scans for in-depth examination
  ◦ Multipage PDF, one per work, 300 dpi, varying file sizes
      Overall examination of works
  ◦ 8000 scans, 3 terabytes of disk space
  ◦ Different physical formats: A3, A4, custom size
Embedded metadata
  ◦ TIFF with embedded metadata describing the physical features of the scans and their content
      ImageWidth, ImageHeight, XResolution, FileSize, CreationDate, ModifyDate, Description, Keywords, CaptionWriter, Title, Author, Copyright Status, Copyright Notice

De Roberto
Make these works accessible to the humanities research community
  ◦ Always online: 24 x 365.
  ◦ Available from everywhere.
  ◦ Simple and easy to use: the desired document can be found immediately.
  ◦ Documents organized according to their physical and semantic metadata.
Long-term preservation (digital preservation)
  ◦ Reliable storage systems and replica redundancy to achieve secure preservation.

gLibrary Architecture
gLibrary is built on top of the gLite Data Management Services:
  ◦ SRM Storage Elements.
  ◦ LFC File Catalogue.
  ◦ AMGA Metadata Services.
LFC File Catalogue interactions:
  ◦ A PHP wrapper around the LFC APIs is used to browse the LFC virtual file system and to retrieve replicas (SURLs) from Logical File Names (LFNs).

gLibrary Architecture
AMGA Metadata Services
  ◦ Used to archive and organize asset metadata and to answer user queries built on the fly by the browsing interface.
  ◦ PHP APIs are used to interact with AMGA.
  ◦ Authentication uses the user proxy created on the client.
  ◦ AMGA groups and ACLs restrict access to metadata, allowing fine-grained authorization.

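As a rough illustration of group-based ACLs, the sketch below checks whether any of a user's groups grants a permission on a collection. It does not use the AMGA API or its ACL syntax: the `ACLS` table, the `is_allowed` function, and the collection paths, group names and permission names are all hypothetical.

```python
# Hypothetical ACL table: collection path -> group -> set of permissions.
ACLS = {
    "/deroberto/manuscripts": {
        "editors": {"read", "write"},
        "readers": {"read"},
    },
    "/deroberto/drafts": {
        "editors": {"read", "write"},
    },
}


def is_allowed(user_groups, collection, permission):
    """Return True if at least one of the user's groups holds the permission."""
    acl = ACLS.get(collection, {})
    return any(permission in acl.get(group, set()) for group in user_groups)


reader_can_read = is_allowed(["readers"], "/deroberto/manuscripts", "read")
reader_can_write = is_allowed(["readers"], "/deroberto/manuscripts", "write")
reader_sees_drafts = is_allowed(["readers"], "/deroberto/drafts", "read")
```

Attaching permissions to collections rather than to the whole catalogue is what makes the authorization fine-grained: the same user can read one collection and be denied another.
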
gLibrary Architecture
[Architecture diagram. Components: login applet, VOMS Server, AMGA Metadata Catalogue, LFC File Catalogue, Storage Elements (SE). Numbered steps:]
1. Local proxy creation
2. Proxy transfer over HTTPS
3. Get role
4. Find the right asset
5. Proxy retrieved over HTTPS
6. Direct transfer from SE

What is SmartClient?
SmartClient is an AJAX framework. It provides:
  ◦ A zero-install DHTML/Ajax client engine.
  ◦ Rich user interface components and services.
  ◦ Client-server data binding systems.
Rich client application
  ◦ High-productivity interfaces for end users.
Thin client application
  ◦ Runs in the standard web browsers available on every computer.

SmartClient - Features
Client-side Ajax
  ◦ Minimizes server contact. Retrieves data from the server asynchronously in the background.
Multi-platform
  ◦ Integrates with any server platform through standards-based approaches such as REST and WSDL web services.
Incremental upgrade
  ◦ Components can easily be embedded in existing applications.
  ◦ Grids, forms and other components are added without architectural changes.
Object-oriented
  ◦ Provides JavaScript APIs with a true class system.
  ◦ Used to extend, customize and create new SmartClient components.

SmartClient - Features
Ajax MVC
  ◦ Provides a standard model for server contact, which makes it easier for developers to learn each other's code.
  ◦ Loading and saving of data:
      Caches and reuses loaded data.
      Handles load on demand for high-data-volume applications.
      Performs operations within the browser (sort, filter).
Metadata-driven
  ◦ Allows the use of standard sources of metadata: JB, XML or JSON.
Offline, desktop, mobile capable
  ◦ Applications can reach the mobile world with no change in code.

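The idea of loading on demand and reusing cached data can be sketched as below. This is a generic illustration and not SmartClient's API: `CachedResultSet`, `fake_server` and the page size are assumptions, and the "server" is simulated locally.

```python
class CachedResultSet:
    """Fetches rows one page at a time and reuses pages already loaded."""

    def __init__(self, fetch_page, page_size=50):
        self._fetch_page = fetch_page  # callable(page_no, page_size) -> list of rows
        self._page_size = page_size
        self._pages = {}               # page number -> cached rows
        self.server_calls = 0          # counts simulated server round trips

    def get_row(self, index):
        page_no = index // self._page_size
        if page_no not in self._pages:
            # Load on demand: contact the server only for a page not yet cached.
            self.server_calls += 1
            self._pages[page_no] = self._fetch_page(page_no, self._page_size)
        return self._pages[page_no][index % self._page_size]


def fake_server(page_no, page_size):
    """Simulated server returning one page of rows."""
    start = page_no * page_size
    return [{"id": i} for i in range(start, start + page_size)]


results = CachedResultSet(fake_server, page_size=50)
first = results.get_row(3)    # loads page 0
second = results.get_row(7)   # served from the cached page 0
third = results.get_row(60)   # loads page 1
```

Three row requests cost two server calls here; sorting or filtering rows that are already cached needs no server contact at all.
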
SmartClient Architecture
We use only the client services.
  ◦ The first layer connects to the server over HTTPS and fetches the data.
  ◦ The next layer handles data binding: the application takes the data from the server and operates on it.
  ◦ At the top is the graphical user interface, which users interact with.

Front-end
An intuitive web interface
  ◦ Like the iTunes browser, it lets users find the right asset with just a few mouse clicks.
Implemented as a web application with SmartClient
  ◦ Can be used on any platform.
Uses the gLibrary class, developed in PHP
  ◦ Implemented at INFN Catania.

Front-end
The front-end consists of the gLibrary class, the web interface, and the connection between the gLibrary class and SmartClient.
  ◦ gLibrary class: uses the mdclient.php API. Written in PHP.
      SSL connection through mdclient.php.
      Gets data from the catalog server: methods to get collections, get entry attributes, ...
      Sets data on the catalog server, so that modified entries can be saved: methods to set attribute values.
  ◦ Glibrary_connection: written in PHP.
      Takes RPC request data from SmartClient and calls the required gLibrary methods.
      The web interface calls this code via RPC, and it returns the data from glibrary.class in JSON format, for example:
      {"data": [{"name": "Nome", "type": "text"}, {"name": "Size", "type": "int"}, …]}

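The role of the connection layer (receive an RPC request, call the matching library method, return JSON) can be sketched as follows. The original is written in PHP; this Python version is only an illustration under stated assumptions: `FakeLibrary`, `handle_rpc`, the request fields `method` and `params`, and the error envelope are hypothetical, and the attribute list reuses the sample values from the slide.

```python
import json


class FakeLibrary:
    """Hypothetical stand-in for the library class; returns canned data."""

    def get_attributes(self, collection):
        return [{"name": "Nome", "type": "text"},
                {"name": "Size", "type": "int"}]


# Only methods listed here may be called through RPC.
ALLOWED_METHODS = {"get_attributes"}


def handle_rpc(library, request_body):
    """Decode a JSON request, dispatch to the library, and encode the reply."""
    request = json.loads(request_body)
    method = request.get("method")
    if method not in ALLOWED_METHODS:
        return json.dumps({"error": f"unknown method: {method}"})
    result = getattr(library, method)(*request.get("params", []))
    return json.dumps({"data": result})


request_body = json.dumps({"method": "get_attributes", "params": ["/demo"]})
response = json.loads(handle_rpc(FakeLibrary(), request_body))
rejected = json.loads(handle_rpc(FakeLibrary(), json.dumps({"method": "drop_all"})))
```

Dispatching through an explicit allow-list keeps the web interface from invoking arbitrary methods on the server-side class.
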
Front-end
  ◦ Web interface: implementation based on SmartClient code.
      Uses data binding: it loads information from the server on demand and keeps it in a databound data type.
      Provides a UI to show the databound data, save changes to the server data, and filter the required data.
      Widgets used:
        TreeGrid to show the hierarchical collections.
        ListGrid to show the attributes of entries.
        ComboBox and ListGrid for filtering.
        DetailView to show all the details of an entry.
        DynamicForm to change and save data.

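Filtering already-loaded entries by attribute value, as the filtering widgets do on the client, can be sketched as below. It is a generic illustration, not SmartClient code: `filter_entries` and the sample entries and attribute names are assumptions.

```python
def filter_entries(entries, criteria):
    """Keep the entries whose attributes match every key/value in `criteria`."""
    return [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical entries, for illustration only.
entries = [
    {"Title": "Draft A", "Type": "manuscript", "Format": "A3"},
    {"Title": "Draft B", "Type": "typescript", "Format": "A4"},
    {"Title": "Draft C", "Type": "manuscript", "Format": "A4"},
]

manuscripts = filter_entries(entries, {"Type": "manuscript"})
a4_manuscripts = filter_entries(entries, {"Type": "manuscript", "Format": "A4"})
```

Each additional criterion narrows the list further, which mirrors choosing one more value in a browsing column.
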
Future plans
Meeting with the CETA-CIEMAT team (Spain)
  ◦ Discuss gLibrary as a service.
Design and implement an administration front-end for gLibrary
  ◦ Lets repository administrators create and define new libraries.

Demo

References
Contact: jorge.sevilla@ct.infn.it, antonio.calanducci@ct.infn.it, lamusa@unict.it
Prototype of the De Roberto Digital Repository:
  ◦ https://glibrary.ct.infn.it/glibrary_new/arbol_class.html
gLibrary project homepage (currently under maintenance):
  ◦ https://glibrary.ct.infn.it/
Papers:
  ◦ A. Calanducci, C. Cherubino, L. N. Ciuffo, D. Scardaci, “A Digital Library Management System for the Grid”, Fourth International Workshop on Emerging Technologies for Next-generation GRID (ETNGRID 2007) at 16th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE-2007), GET/INT Paris, France, June 18-20, 2007 (http://etngrid.diit.unict.it/2007/index.html).
  ◦ A. Calanducci, C. Cherubino, L. N. Ciuffo, D. Scardaci, “gLibrary: Digital Asset Management System for the Grid”, IEEE Hypermedia and Grid Systems Conference at 30th Jubilee International Convention MIPRO, Opatija, Croatia, May 21-25, 2007 (http://www.mipro.hr/).

Questions
Thank you for your attention