Fedora Distributed data management (SI 1) Mohamed Rafi DART – UQ
Outline of Work Package
§ To enable Fedora to natively handle large datasets.
§ Explore SRB integration at the storage level of the repository software.
§ Facilitate distributed data management using Fedora.
  § Data access integration
  § Data model integration
  § Archival management
  § Metadata compatibility
§ Deliverables: working integration.
§ Timeframe: Sept 2006
Fedora
Current Storage Systems
Fedora Storage: Issues
§ Currently there is only a simple file-based implementation for storing content.
§ Fedora's definition of distributed data is a distributed repository (not implemented yet).
§ Problem: if a single digital object is to be made up of datastreams from multiple data stores, the available options are:
  § Use external references.
  § Ingest all datastreams into one Fedora file system.
  § Maintain multiple repositories (not implemented) and extend Fedora's 'External' method to fetch data from another repository.
DART Architecture
Storage Resource Broker (SRB)
§ SRB is a distributed data manager that brokers data from heterogeneous data stores.
§ Of particular interest to the DART project are the following features:
  § Application programming interfaces (APIs) exposed by SRB for client applications accessing the SRB server. APIs are currently available in Java and C.
  § Metadata managed by the MCAT catalogue.
  § The global namespace mechanism and the 'collection' view applied to heterogeneous data sources.
  § Authentication schemes, in particular the 'Ticket' abstraction.
  § The logical resource concept, which allows sharing and replication of data among multiple physical resources.
  § Support for large datasets, multiple storage devices and high-speed parallel I/O.
SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB architecture diagram. Applications (C, C++, Linux I/O, Unix shell, Java, NT browsers, Prolog, Web) connect to the SRB, which works with the MCAT (Dublin Core, resource, user-defined and application metadata) and brokers access to archives (HPSS, ADSM, HRM, UniTree, DMF), file systems (Unix, NT, Mac OS X) and databases (DB2, Oracle, Sybase). Other labels in the diagram: third-party copy, remote proxies, DataCutter.]
Distributed Datastreams
§ Use the new Fedora-SRB module for accessing SRB. Store bases are limited to one collection.
§ Modify Fedora code to generate different storage paths for different DART-specific MIME types:
  § text/raw, data/curate, image/protein, etc.
§ Define an SRB collection as a logical grouping of data stores suitable for storing the different MIME types.
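The idea of deriving a storage path from a datastream's MIME type could be sketched roughly as follows. This is not Fedora or SRB code: the class `MimeTypeStorePaths`, its methods, the `/Datastream_store/...` paths and the `Other` fallback are all assumptions made for illustration; only the example MIME types come from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks an SRB sub-collection from a datastream's MIME type. */
final class MimeTypeStorePaths {
    // Assumed layout: one parent collection with a sub-collection per data family.
    private static final String ROOT = "/Datastream_store";
    private static final Map<String, String> BY_PREFIX = new LinkedHashMap<>();
    static {
        BY_PREFIX.put("text/", "Text");    // e.g. text/raw
        BY_PREFIX.put("image/", "Images"); // e.g. image/protein
        BY_PREFIX.put("data/", "Data");    // e.g. data/curate
    }

    private MimeTypeStorePaths() {}

    /** Returns the collection path for a MIME type, or an assumed fallback collection. */
    static String collectionFor(String mimeType) {
        String normalized = mimeType == null ? "" : mimeType.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : BY_PREFIX.entrySet()) {
            if (normalized.startsWith(entry.getKey())) {
                return ROOT + "/" + entry.getValue();
            }
        }
        return ROOT + "/Other"; // assumption: unknown types go to a catch-all collection
    }

    /** Builds the full storage path for one datastream. */
    static String pathFor(String mimeType, String datastreamId) {
        return collectionFor(mimeType) + "/" + datastreamId;
    }
}
```

With this mapping, `MimeTypeStorePaths.pathFor("image/protein", "DS2")` yields `/Datastream_store/Images/DS2`. A prefix table keeps the rule set easy to extend if more DART-specific types are added.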
Example
§ Digital object: Protein
  § Element 1: Amino acid sequence (text file)
  § Element 2: Crystal/X-ray images (large binary file)
  § Element 3: 3-D image (special image file, probably copyrighted)
  § Element 4: Simulation results (large data file)
  § Element 5: Related research publications (links to external sites)
§ Collection hierarchy
  § Datastream_store (has the following sub-collections)
    § Text: simple file system
    § Images: HPSS
    § Data: some other storage system
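One way to picture the protein example is as a small data structure that records, for each element, which sub-collection it would live in. The sketch below is illustrative only: the class and method names are hypothetical, and the assignment of each element to a sub-collection is an assumption inferred from the element descriptions, not something the slide states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the protein digital object and its sub-collections. */
final class ProteinObjectExample {
    /** One element of the digital object; subCollection is null for external links. */
    static final class Element {
        final String label;
        final String subCollection;

        Element(String label, String subCollection) {
            this.label = label;
            this.subCollection = subCollection;
        }

        boolean isExternal() {
            return subCollection == null;
        }
    }

    private ProteinObjectExample() {}

    /** The five elements from the slide, with assumed sub-collection assignments. */
    static List<Element> proteinElements() {
        List<Element> elements = new ArrayList<>();
        elements.add(new Element("Amino acid sequence", "Text"));
        elements.add(new Element("Crystal/X-ray images", "Images"));
        elements.add(new Element("3-D image", "Images"));
        elements.add(new Element("Simulation results", "Data"));
        elements.add(new Element("Related research publications", null)); // external links
        return elements;
    }

    /** Groups element labels by sub-collection; external links go under "External". */
    static Map<String, List<String>> groupBySubCollection(List<Element> elements) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Element element : elements) {
            String key = element.isExternal() ? "External" : element.subCollection;
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(element.label);
        }
        return grouped;
    }
}
```

Grouping the elements this way makes it visible that a single digital object spans several storage systems, while the external links never enter the collection hierarchy at all.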
Metadata
§ Load SRB metadata dynamically into the Fedora object model.
§ Every time the Fedora FOXML object model is accessed:
  § go through each of its DataStream objects;
  § fetch the corresponding DataStream's SRB data path;
  § query SRB for the metadata associated with the dataset;
  § add the returned list to the object's FOXML (i.e., extpropertyType).
§ To do this, modify implementations of the Digital Object Reader interface to import SRB metadata as soon as the digital object stream is created (by deserializing the content model).
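The metadata-loading steps could be sketched as the loop below. Everything here is a simplified stand-in: `SrbMetadataSource`, `DatastreamRef` and `SrbMetadataImporter` are hypothetical names, not types from Fedora or from the SRB Java API, and the `datastreamId:name` key format is an assumption chosen only to keep properties from different datastreams apart.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an SRB metadata lookup; not a real SRB interface. */
interface SrbMetadataSource {
    Map<String, String> metadataFor(String srbPath);
}

/** Hypothetical, simplified view of one datastream inside a digital object. */
final class DatastreamRef {
    final String id;
    final String srbPath; // null when the datastream is not stored in SRB

    DatastreamRef(String id, String srbPath) {
        this.id = id;
        this.srbPath = srbPath;
    }
}

final class SrbMetadataImporter {
    private SrbMetadataImporter() {}

    /**
     * Walks the datastreams, queries SRB metadata for each SRB-backed one, and
     * returns name/value pairs that could be attached to the object as external
     * properties. Keys use the assumed form "datastreamId:name".
     */
    static Map<String, String> collectExtProperties(List<DatastreamRef> datastreams,
                                                    SrbMetadataSource srb) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (DatastreamRef datastream : datastreams) {
            if (datastream.srbPath == null) {
                continue; // nothing to query for datastreams held outside SRB
            }
            Map<String, String> metadata = srb.metadataFor(datastream.srbPath);
            if (metadata == null) {
                continue; // treat a missing result as "no metadata"
            }
            for (Map.Entry<String, String> entry : metadata.entrySet()) {
                properties.put(datastream.id + ":" + entry.getKey(), entry.getValue());
            }
        }
        return properties;
    }
}
```

Keeping the SRB lookup behind an interface means the loop can be exercised with a fake source before any real SRB connection is wired into a Digital Object Reader implementation.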
Key Problems/Issues
§ DART test data
§ Related work packages
§ Co-ordination
§ Functional overlap
Future Expansion/Work Plans
§ Modify Fedora code
§ Dynamic datastream paths
§ Metadata update
§ Test data
§ Test and implement the integrated software.