1 BitDew: Data management and distribution service
May 31, 2011
2 Authors: Gilles Fedak, Haiwu He
• Doctoral students: Lu Lu, HUST
• Postdoctoral fellows: Bing Tang, Wuhan University of Science and Technology; Haiwu He, INRIA
• Master internship: Anthony Simonet, University of Bordeaux I
• Engineer: José Francisco Saray Villamizar, INRIA (jose.saray@inria.fr)
3 Team Avalon: Algorithms and software architectures for service-oriented platforms
• French National Institute for Research in Computer Science and Automation (INRIA)
• University of Lyon (Laboratory of Parallelism)
4 Project partners and funding
• HUST
• Argonne National Laboratory
• University of Illinois at Urbana-Champaign
• ANR (French National Research Agency): MapReduce funding
• Babeș-Bolyai University, Cluj-Napoca, Romania
5 Outline
• Desktop grids
• BitDew overview
• BitDew architecture
• Use case: BitDew master/worker
• Tutorials
6 Desktop grid
• Distributed system
• Uses the computing, network and storage resources of idle desktop PCs distributed over multiple LANs and the Internet
• Useful for resource-demanding applications:
  – SETI@home
  – LHC@home
7 Desktop grid features
• Resources come from individuals and institutions:
  – Low performance (unreliable storage, poor communication links)
  – No trust
  – Volatility (hosts leave and join the system at any time)
• Resources are shared between their users and the desktop grid software
• Different administrative domains, with different security mechanisms
8 Challenge of data management in desktop grids: motivation
• Current data management in distributed computing relies on ad hoc solutions
• Some data management requirements:
  – Data replica management
  – Data fault tolerance
  – Data scheduling
  – Data life cycle (creation, placement, deletion)
9 Challenge of data management in desktop grids
• To deploy a new application in a cluster:
  – Put the binary file on a shared network file system (NFS) mounted by the cluster nodes
  – Execute your application
  – Log in remotely to each node and delete the temporary files created by the application
• In a desktop grid:
  – Shared file systems are troublesome to set up (volunteer churn)
  – Remote access to participants' local file systems is forbidden (volunteer security and privacy)
10 “...A programmable environment for large scale data management and distribution...” G. Fedak et al., http://www.bitdew.net
11 What is BitDew?
• Data management and distribution service
• Toolbox for distributed data management:
  – API to create, access, store and place data in distributed computing infrastructures, even in highly dynamic and volatile environments
• Can be integrated as the data manager of different middleware:
  – XtremWeb
  – BOINC
12 BitDew features
• Automatic and transparent distributed data management:
  – Fault tolerance
  – Replica management
  – Data scheduling
  – Data life cycle
• Flexible service implementation:
  – Client/server architectures
  – P2P
  – Cloud computing
• Customizable: layered architecture with independent components
  – Code reuse
  – Code personalization
• Open source, GPL
13 How does BitDew achieve this?
• Data is labeled with attributes
• BitDew manages the data requirements expressed by these attributes:
  – Data replication
  – Data fault tolerance
  – Data scheduling
  – Data life cycle (creation, placement, deletion)
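An attribute, as written on these slides (`{oob: "ftp", replica: 3, lifetime: "60s"}`), is essentially a set of named requirements attached to a data item. The sketch below parses that notation into a key/value map that a scheduler could then read. It makes several assumptions that do not come from BitDew: the class name `AttributeParser`, accepting both `:` and `=` as separators (the slides mix the two), and stripping quotes. It is not BitDew's actual attribute parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the slide notation {key: value, key = value, ...}.
// NOT BitDew's parser: it only shows that an attribute is a set of named
// requirements attached to a data item. Values containing commas are not supported.
final class AttributeParser {
    static Map<String, String> parse(String text) {
        String body = text.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("attribute must be enclosed in braces: " + text);
        }
        body = body.substring(1, body.length() - 1);
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String pair : body.split(",")) {
            if (pair.isBlank()) continue;
            // The slides use both "key: value" and "key = value"; split on the first of either.
            String[] kv = pair.split("[:=]", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("malformed pair: " + pair);
            }
            String key = kv[0].trim().toLowerCase();
            String value = kv[1].trim().replace("\"", "");
            attrs.put(key, value);
        }
        return attrs;
    }
}

class AttributeDemo {
    public static void main(String[] args) {
        Map<String, String> a = AttributeParser.parse("{oob: \"ftp\", replica: 3, lifetime: \"60s\"}");
        System.out.println(a); // {oob=ftp, replica=3, lifetime=60s}
    }
}
```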
14 Which applications can benefit from BitDew?
• Bag-of-tasks applications:
  – Independent tasks
  – Parameter sweeps
  – Large data sharing
• Master/worker applications
• MapReduce applications
15 BitDew architecture: layered architecture
• Independent entities
• Components in an upper layer interact only with the layer immediately below
• Components in the same layer do not interact
16 BitDew architecture: advantages
• Any layer or component can easily be changed:
  – Customization
  – High cohesion
  – Low coupling
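The layering rule on the two architecture slides (each layer talks only to the layer immediately below it) is what makes components swappable. This minimal sketch shows the rule with three tiny layers. Each layer depends only on an interface or class from the layer below, so replacing the backend changes a single wiring line. All names (`Backend`, `InMemoryBackend`, `CatalogService`, `DataApi`) are hypothetical and much simpler than BitDew's real components.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Backend layer (hypothetical): the storage technology, e.g. a DBMS in BitDew.
interface Backend {
    void store(String key, String value);
    Optional<String> load(String key);
}

// One interchangeable backend implementation, kept in memory for the sketch.
final class InMemoryBackend implements Backend {
    private final Map<String, String> rows = new HashMap<>();
    public void store(String key, String value) { rows.put(key, value); }
    public Optional<String> load(String key) { return Optional.ofNullable(rows.get(key)); }
}

// Service layer (hypothetical): talks only to the Backend interface below it.
final class CatalogService {
    private final Backend backend;
    CatalogService(Backend backend) { this.backend = backend; }
    void register(String uid, String name) { backend.store(uid, name); }
    Optional<String> lookup(String uid) { return backend.load(uid); }
}

// API layer (hypothetical): talks only to the service layer, never to the backend.
final class DataApi {
    private final CatalogService catalog;
    DataApi(CatalogService catalog) { this.catalog = catalog; }
    void put(String uid, String name) { catalog.register(uid, name); }
    String get(String uid) {
        return catalog.lookup(uid)
            .orElseThrow(() -> new IllegalArgumentException("unknown data " + uid));
    }
}

class LayersDemo {
    public static void main(String[] args) {
        // Swapping the backend only changes this wiring line.
        DataApi api = new DataApi(new CatalogService(new InMemoryBackend()));
        api.put("uid-1", "sequence-1.fasta");
        System.out.println(api.get("uid-1")); // sequence-1.fasta
    }
}
```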
17 API level
• Provides space virtualization:
  – The distributed space is seen as one entity
  – You can put/get data
• Three entities:
  – TransferManager: non-blocking interface to concurrent file transfers
  – BitDew: put/get data
  – ActiveData: data placement between hosts according to attributes; event handling
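A "non-blocking interface to concurrent file transfers" works like this: starting a transfer returns a handle immediately, and the caller waits on that handle only when it needs the result. The sketch below expresses the idea with `CompletableFuture` on a thread pool. Everything here is an assumption for illustration: the class `SimpleTransferManager` and its methods are hypothetical, and a local file copy stands in for an out-of-band protocol such as FTP, HTTP or BitTorrent. This is not BitDew's TransferManager API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical non-blocking transfer manager: start() returns at once,
// the copy runs on a pool thread, and callers wait only when they need to.
final class SimpleTransferManager implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    CompletableFuture<Path> start(Path source, Path destination) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // A local copy stands in for a real out-of-band protocol.
                return Files.copy(source, destination, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, pool);
    }

    // Blocks until every transfer in the list has completed.
    static void waitFor(List<CompletableFuture<Path>> transfers) {
        CompletableFuture.allOf(transfers.toArray(new CompletableFuture[0])).join();
    }

    @Override
    public void close() { pool.shutdown(); }
}

class TransferDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bitdew-sketch");
        Path src = Files.writeString(dir.resolve("in.txt"), "hello");
        try (SimpleTransferManager tm = new SimpleTransferManager()) {
            List<CompletableFuture<Path>> transfers = List.of(
                tm.start(src, dir.resolve("copy1.txt")),
                tm.start(src, dir.resolve("copy2.txt")));
            // The caller is free to do other work here before waiting.
            SimpleTransferManager.waitFor(transfers);
            System.out.println(Files.readString(dir.resolve("copy2.txt"))); // hello
        }
    }
}
```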
18 Service level
• Services implement the BitDew features
• An SDK eases the development of customized services
• Currently 4 services:
  – Data Repository (DR)
  – Data Catalog (DC)
  – Data Scheduler (DS)
  – Data Transfer (DT)
19 Service level
• Stable nodes:
  – The service layer runs here
• Volatile nodes:
  – Offer resources to BitDew, but can join or leave the network whenever they want
  – Sensitive information must not be stored here
[Figure: a stable node running A, API, S and I; volatile nodes running A and API. A: Application, S: Service, I: Implementation]
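Volatile nodes can leave without warning, so a stable node has to infer when they are gone. One common technique is a heartbeat: each volatile host contacts the service periodically, and a host that stays silent for longer than a timeout is considered gone. This is an assumption here, not something taken from BitDew's code. In the sketch, `HostMonitor` and the explicit `nowMillis` clock parameter are hypothetical; passing the time in explicitly keeps the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical heartbeat tracker running on a stable node.
final class HostMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HostMonitor(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Called whenever a volatile host contacts the service (join or periodic sync).
    void heartbeat(String host, long nowMillis) { lastSeen.put(host, nowMillis); }

    // Removes and returns the hosts whose last heartbeat is older than the timeout.
    Set<String> expire(long nowMillis) {
        Set<String> dead = new TreeSet<>();
        lastSeen.entrySet().removeIf(e -> {
            boolean expired = nowMillis - e.getValue() > timeoutMillis;
            if (expired) dead.add(e.getKey());
            return expired;
        });
        return dead;
    }

    Set<String> alive() { return new TreeSet<>(lastSeen.keySet()); }
}

class MonitorDemo {
    public static void main(String[] args) {
        HostMonitor monitor = new HostMonitor(30_000);
        monitor.heartbeat("volatile-1", 0);
        monitor.heartbeat("volatile-2", 0);
        monitor.heartbeat("volatile-1", 25_000);    // volatile-2 stays silent
        System.out.println(monitor.expire(40_000)); // [volatile-2]
        System.out.println(monitor.alive());        // [volatile-1]
    }
}
```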
20 Backend layer
• Set of protocols and technologies currently implementing the services:
  – DBMS: MySQL, PostgreSQL
  – Embedded DBMS: HSQLDB
  – JPOX to handle persistence
  – Transfer protocols: FTP, HTTP, SCP, BitTorrent
21 BitDew programming
22 BitDew API
• Data:
  – BitDew's information grain
  – Contains file metadata: name, size, checksum
• Attribute: data metadata
  – PROTOCOL: protocol used to distribute the file
  – FAULT_TOLERANCE: data is resilient to machine crashes
  – REPLICA: number of copies of the data to keep in the system
  – AFFINITY: dependency between data items
  – LIFETIME:
    – Absolute: how long the data should be stored
    – Relative: dependent on another data item
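A data item is described by its metadata (name, size, checksum) rather than by its content. That is what lets a receiver check a transferred copy. The sketch below builds such a record for a local file. The `DataInfo` record is hypothetical and not BitDew's `Data` class, and MD5 is used only as an example digest.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical metadata record: describes a file (name, size, checksum)
// without carrying its content.
record DataInfo(String name, long size, String checksum) {

    // Reads the whole file into memory; a real service would stream large files
    // (e.g. a multi-GB database) through java.security.DigestInputStream instead.
    static DataInfo of(Path file) throws IOException {
        byte[] content = Files.readAllBytes(file);
        return new DataInfo(file.getFileName().toString(), content.length, md5Hex(content));
    }

    // MD5 is only an example digest for integrity checks after a transfer.
    static String md5Hex(byte[] bytes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(bytes);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // A receiver verifies that a transferred copy matches the metadata.
    boolean matches(byte[] received) {
        return received.length == size && md5Hex(received).equals(checksum);
    }
}

class DataInfoDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sequence", ".txt");
        Files.writeString(file, "ACGT");
        DataInfo info = DataInfo.of(file);
        System.out.println(info.size() + " " + info.matches("ACGT".getBytes())); // 4 true
        Files.deleteIfExists(file);
    }
}
```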
23 Event handlers
• Active Data feature
• Developers can trigger code execution by associating handlers with the data life cycle:
  – Data creation
  – Data scheduling
  – Data deletion
• Example: record each data item when it is scheduled

import java.util.Hashtable;

public class MyDataEventHandler extends ActiveDataHandler {
    // Scheduled data, indexed by their unique id
    private final Hashtable<String, Data> scheduled = new Hashtable<>();

    // Called when a data item is scheduled; a carries its attribute
    public void onDataScheduled(Data d, Attribute a) {
        scheduled.put(d.getuid(), d);
        // ...
    }
}
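The following sketch shows when such callbacks fire. It models a tiny in-process dispatcher: handlers register once and are then notified on each life-cycle transition (creation, scheduling, deletion). In BitDew, the ActiveData component delivers these events. Here `DataLifecycleHandler`, `LifecycleBus`, `ScheduledLog` and the use of plain `String` ids and attributes are hypothetical stand-ins, not BitDew types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface mirroring the three life-cycle events on the slide.
interface DataLifecycleHandler {
    default void onDataCreated(String dataId) {}
    default void onDataScheduled(String dataId, String attribute) {}
    default void onDataDeleted(String dataId) {}
}

// Hypothetical dispatcher: notifies every registered handler of each transition.
final class LifecycleBus {
    private final List<DataLifecycleHandler> handlers = new ArrayList<>();

    void register(DataLifecycleHandler h) { handlers.add(h); }

    void created(String id) { handlers.forEach(h -> h.onDataCreated(id)); }
    void scheduled(String id, String attr) { handlers.forEach(h -> h.onDataScheduled(id, attr)); }
    void deleted(String id) { handlers.forEach(h -> h.onDataDeleted(id)); }
}

// Example handler: records which data were scheduled and with which attribute.
final class ScheduledLog implements DataLifecycleHandler {
    final List<String> entries = new ArrayList<>();

    @Override
    public void onDataScheduled(String dataId, String attribute) {
        entries.add(dataId + " -> " + attribute);
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        LifecycleBus bus = new LifecycleBus();
        ScheduledLog log = new ScheduledLog();
        bus.register(log);
        bus.created("d1");                    // ignored: default no-op
        bus.scheduled("d1", "{replica: 3}");
        System.out.println(log.entries);      // [d1 -> {replica: 3}]
    }
}
```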
24 Example: replica
put(data, {oob: "ftp", replica: "3", lifetime: "60s"})
[Figure: BitDew client, stable nodes and volatile nodes]
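The replica example asks for three copies of the data on volatile nodes, to be dropped after 60 seconds. The sketch below shows that bookkeeping in minimal form. It assumes a round-robin choice of hosts and an explicit clock. `ReplicaScheduler` and its methods are hypothetical, and BitDew's Data Scheduler has its own placement policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scheduler state for data items with REPLICA and absolute LIFETIME attributes.
final class ReplicaScheduler {
    private final List<String> hosts;
    private final Map<String, List<String>> placement = new LinkedHashMap<>();
    private final Map<String, Long> expiry = new LinkedHashMap<>();
    private int next = 0; // round-robin cursor over the host list (an assumed policy)

    ReplicaScheduler(List<String> hosts) { this.hosts = List.copyOf(hosts); }

    // Places `replica` copies on distinct hosts; the data expires at now + lifetime.
    List<String> put(String data, int replica, long lifetimeMillis, long nowMillis) {
        if (replica > hosts.size()) {
            throw new IllegalArgumentException("not enough hosts for " + replica + " replicas");
        }
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < replica; i++) {
            chosen.add(hosts.get(next));
            next = (next + 1) % hosts.size();
        }
        placement.put(data, chosen);
        expiry.put(data, nowMillis + lifetimeMillis);
        return chosen;
    }

    // Drops every data item whose absolute lifetime has elapsed.
    void tick(long nowMillis) {
        expiry.entrySet().removeIf(e -> {
            boolean expired = nowMillis >= e.getValue();
            if (expired) placement.remove(e.getKey());
            return expired;
        });
    }

    List<String> holders(String data) { return placement.getOrDefault(data, List.of()); }
}

class ReplicaDemo {
    public static void main(String[] args) {
        ReplicaScheduler ds = new ReplicaScheduler(List.of("v1", "v2", "v3", "v4"));
        System.out.println(ds.put("data", 3, 60_000, 0)); // [v1, v2, v3]
        ds.tick(59_999);
        System.out.println(ds.holders("data"));           // [v1, v2, v3]
        ds.tick(60_000);
        System.out.println(ds.holders("data"));           // []
    }
}
```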
25 Example: affinity
put(dataB, {oob: "ftp", affinity: "dataA"})
put(dataA, {oob: "ftp", replicat: 2})
[Figure: BitDew client, stable nodes and volatile nodes; dataA and dataB end up together on the volatile nodes]
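Affinity means dataB is placed wherever dataA is placed. It follows dataA's two replicas instead of being scheduled on its own. The minimal sketch below shows only that rule. `AffinityPlacer` and its map-based state are hypothetical. In this sketch, affinities must be declared before placement, and a host that already holds an item stops the propagation, so cyclic affinities terminate.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical placement table honoring the AFFINITY attribute.
final class AffinityPlacer {
    private final Map<String, Set<String>> holders = new HashMap<>(); // data -> hosts holding it
    private final Map<String, String> affinity = new HashMap<>();     // dependent -> target data

    void declareAffinity(String dependent, String target) { affinity.put(dependent, target); }

    // Records that `host` received `data`, then pulls in every data with affinity to it.
    void place(String data, String host) {
        if (!holders.computeIfAbsent(data, k -> new LinkedHashSet<>()).add(host)) {
            return; // already there: stops propagation (and cycles)
        }
        affinity.forEach((dependent, target) -> {
            if (target.equals(data)) {
                place(dependent, host);
            }
        });
    }

    Set<String> hostsOf(String data) { return holders.getOrDefault(data, Set.of()); }
}

class AffinityDemo {
    public static void main(String[] args) {
        AffinityPlacer p = new AffinityPlacer();
        p.declareAffinity("dataB", "dataA"); // put(dataB, {affinity: "dataA"})
        p.place("dataA", "v1");              // dataA has 2 replicas
        p.place("dataA", "v3");
        System.out.println(p.hostsOf("dataB")); // [v1, v3]
    }
}
```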
26 BitDew master/worker: the BLAST project (Basic Local Alignment Search Tool)
• Compare one DNA sequence S with a database of sequences
• Retrieve all sequences whose similarity to S is above a certain threshold
27 Files and their BitDew attributes
• Application file: 4.45 MB, highly shared
  ApplicationAttr = {oob: "bittorrent", replica = -1}
• N sequence files: 1 MB each, one file per task, sensitive
  SequenceAttr = {fault_tolerance = true, replica = 1, oob: "ftp"}
• Result file (report file): stores all the workers' results, local to the master
  ResultAttr = {pin: true}
• DB file: 2.68 GB, highly shared; rule: only nodes holding a sequence file are allowed to hold the DB
  DBAttr = {affinity: SequenceAttr, protocol: bittorrent, lifetime = ResultAttr}
• Result file (per worker): 1 MB, must be sent back to the master
  RWAttr = {affinity: ResultAttr, oob: "ftp"}
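The DB attribute gives the database a relative lifetime (`lifetime = ResultAttr`). Read that as "the DB lives only as long as the result data": once the result data is deleted, the 2.68 GB database can be removed from every worker too. This sketch shows relative lifetime with cascading deletion. It assumes a simple map from each data item to the item whose lifetime it follows. `LifetimeTracker` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker for relative LIFETIME: a data item lives only as long as
// the data item it is bound to, and deletions cascade transitively.
final class LifetimeTracker {
    private final Set<String> live = new LinkedHashSet<>();
    private final Map<String, String> boundTo = new HashMap<>(); // data -> data whose lifetime it follows

    void create(String data) { live.add(data); }

    void createBound(String data, String owner) {
        live.add(data);
        boundTo.put(data, owner);
    }

    // Deletes `data` and, transitively, everything whose lifetime depends on it.
    void delete(String data) {
        Deque<String> pending = new ArrayDeque<>();
        pending.push(data);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!live.remove(current)) continue; // already gone
            boundTo.forEach((dependent, owner) -> {
                if (owner.equals(current)) pending.push(dependent);
            });
        }
    }

    boolean isLive(String data) { return live.contains(data); }
}

class LifetimeDemo {
    public static void main(String[] args) {
        LifetimeTracker t = new LifetimeTracker();
        t.create("result");
        t.createBound("DB", "result");     // DBAttr: lifetime = ResultAttr
        System.out.println(t.isLive("DB")); // true
        t.delete("result");
        System.out.println(t.isLive("DB")); // false
    }
}
```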
28 BitDew master/worker in action
1. schedule(sequence_i, sequenceAttr)
2. schedule(DB, DBAttr)
3. schedule(application, applicationAttr)
4. schedule(result, resultAttr)
[Figure: the master holds sequence-i, DB, Application and Results 1 to 4 and schedules them through the BitDew services; three workers each hold a sequence, the DB, the application and a result; a fourth worker holds only the application and a result]
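SequenceAttr sets `fault_tolerance = true`. If a worker holding a sequence file crashes, that file must be scheduled again on another worker, or its task is lost. The sketch below shows that reaction in isolation. It assumes the crash has already been detected, for example by a timeout. `FaultTolerantScheduler` is hypothetical, and so is its choice of the least-loaded surviving worker as the new holder. Data without fault tolerance is simply dropped with the crashed worker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scheduler that re-places fault-tolerant data when its holder crashes.
final class FaultTolerantScheduler {
    private final Map<String, List<String>> dataByWorker = new LinkedHashMap<>();
    private final Map<String, Boolean> faultTolerant = new HashMap<>();

    void addWorker(String worker) { dataByWorker.putIfAbsent(worker, new ArrayList<>()); }

    void assign(String data, String worker, boolean ft) {
        faultTolerant.put(data, ft);
        dataByWorker.computeIfAbsent(worker, w -> new ArrayList<>()).add(data);
    }

    // Removes a crashed worker; each fault-tolerant item moves to the least-loaded survivor.
    void onCrash(String worker) {
        List<String> lost = dataByWorker.remove(worker);
        if (lost == null) return;
        for (String data : lost) {
            if (!faultTolerant.getOrDefault(data, false)) continue; // non-FT data is dropped
            String target = dataByWorker.entrySet().stream()
                .min(Comparator.comparingInt(e -> e.getValue().size()))
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("no worker left for " + data));
            dataByWorker.get(target).add(data);
        }
    }

    List<String> dataOf(String worker) { return dataByWorker.getOrDefault(worker, List.of()); }
}

class FaultToleranceDemo {
    public static void main(String[] args) {
        FaultTolerantScheduler ds = new FaultTolerantScheduler();
        ds.addWorker("w1");
        ds.addWorker("w2");
        ds.addWorker("w3");
        ds.assign("sequence-1", "w1", true);
        ds.assign("sequence-2", "w2", true);
        ds.onCrash("w1");
        System.out.println(ds.dataOf("w3")); // [sequence-1]
    }
}
```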
29 Conclusion
• BitDew is a framework to manage data movement and distribution transparently and declaratively
• It can be deployed on different network architectures (client/server, P2P)
• Master/worker and MapReduce models can be implemented with BitDew
• It can be integrated into a desktop grid solution to handle data distribution and life cycle
• Its architecture allows extensions (new transfer protocols, new services) and customization
30 Questions?
31 Tutorials
• Download them from http://www.bitdew.net/examples.html
• Suggested order:
  – HelloWorld
  – PingPong
  – PutGet
  – ScpTransfer
  – Callbackdn
32 HelloWorld
• Create one data item
• Put it on the scheduler with replica = -1
• Every other client connecting to BitDew does the same
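In the HelloWorld tutorial, the data item is scheduled with `replica = -1`. The sketch below assumes this means "replicate on every host", including hosts that join later, so each client ends up with every client's "Hello World!" data. `BroadcastScheduler` is hypothetical and models only the -1 case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical scheduler where replica = -1 is assumed to mean "replicate on every host".
final class BroadcastScheduler {
    private final Map<String, Set<String>> hostData = new LinkedHashMap<>();
    private final List<String> broadcastData = new ArrayList<>();

    // A host joins and immediately receives every broadcast data item.
    void join(String host) {
        hostData.computeIfAbsent(host, h -> new LinkedHashSet<>()).addAll(broadcastData);
    }

    void schedule(String data, int replica) {
        if (replica != -1) {
            throw new UnsupportedOperationException("this sketch only models replica = -1");
        }
        broadcastData.add(data);
        hostData.values().forEach(set -> set.add(data)); // push to hosts already present
    }

    Set<String> dataOn(String host) { return hostData.getOrDefault(host, Set.of()); }
}

class HelloWorldDemo {
    public static void main(String[] args) {
        BroadcastScheduler ds = new BroadcastScheduler();
        // Each client joins and schedules its own "Hello World!" data with replica = -1.
        for (String client : List.of("A", "B", "C")) {
            ds.join(client);
            ds.schedule("hello-" + client, -1);
        }
        System.out.println(ds.dataOn("A")); // [hello-A, hello-B, hello-C]
    }
}
```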
33 [Figure: HelloWorld: Java processes A, B and C, each holding the A, B and C "Hello World!" data, connected to the BitDew services]
34 Ping, Pong!
[Figure: Ping/Pong processes exchanging Ping data (I) and Pong data (O) through the BitDew services; an event handler erases the scheduled data]
35 Put, Get
[Figure: a Java process exchanging a file with the BitDew services (DC, DR, DT, DS)]