Storage Events and dCache: New Frontiers in Storage

  • Slides: 36

Storage Events and dCache: New Frontiers in Storage
Data Management for extreme scale computing
Presenter: Patrick Fuhrmann
Primary Author: Paul Millar
With contributions by Michael Schuh, Jürgen Starek and the dCache Team
eXtreme DataCloud is co-funded by the Horizon 2020 Framework Program – Grant Agreement 777367
Copyright © Members of the XDC Collaboration, 2017-2020

Starting with some prerequisites
03/04/2019 Fuhrmann/Millar Storage Events and dCache: New Frontiers in Storage

eXtreme DataCloud cheat sheet
  • 8 partners, 7 countries (DE, IT, ES, PL, NL, UK, FR)
  • 7 research communities represented + EGI
  • XDC total budget: 3.07 M Euros
  • XDC (27 months) started Nov 1st 2017 and runs until Jan 31st 2020
  • eXtreme DataCloud is a software development and integration project.
  • Develops scalable technologies for federating storage resources and managing data in highly distributed computing environments.
  • Focus: efficient, policy-driven and Quality-of-Service-based data management (DM)
  • The targeted platforms are the current and next-generation e-Infrastructures deployed in the European Open Science Cloud (EOSC)
  • The e-infrastructures used by the represented communities

The dCache cheat sheet
  • Highly scalable storage system for scientific communities.
  • Combines heterogeneous storage nodes under a common virtual file system tree and scales into the 100 PB region.
  • Provides access to data via a variety of protocols, e.g. NFS 4.1, WebDAV, GridFTP.
  • Provides a variety of authentication mechanisms, like username/password, X.509 certificates, Kerberos, (in preparation) SAML and OpenID Connect, Macaroons.
  • Multi-tier support: moves data between different media types, like tape, spinning disks and SSDs.
      • By user request.
      • Automatically, based on the access profile (hot spots).
  • Provides resiliency, e.g. through multiple automatically managed copies.
  • Deployed at more than 60 production Big Data sites around the world, with up to 150 PB in total.

How storage is used currently...

How storage is used currently
(Diagram: requests from a client and the replies of the storage system)
  • Upload a file: OK
  • Download a file: OK
  • Delete a file: Permission denied
  • Stage a file: Request Queued
  • Is on DISK? NO
  • Is on DISK? NO
  • ...
  • Is on DISK? YES
  • Booking done centrally by: [logos in the original slide]

What are the issues with this approach?

Problem: The file registration/sync problem
(Sequence diagram over time: the client talks to the storage node and, separately, to Rucio/LFC/…)
  • Upload a file (storage node): OK
  • Register file (Rucio/LFC/…): OK
  • Delete a file (storage node): OK
  • Unregister file (Rucio/LFC/…): OK
Leads to
  • Dark Data (files on storage that the catalogue does not know about)
  • Dangling References (catalogue entries for files that are no longer on storage)
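As a minimal sketch of what the two failure modes mean in practice, the snippet below compares a storage listing with a catalogue listing. The function name `reconcile` and the example paths are made up for illustration; neither Rucio nor dCache is called here.

```python
def reconcile(storage_paths, catalogue_paths):
    """Compare what the storage node holds with what the catalogue lists."""
    storage = set(storage_paths)
    catalogue = set(catalogue_paths)
    dark_data = sorted(storage - catalogue)      # stored but never registered
    dangling_refs = sorted(catalogue - storage)  # registered but no longer stored
    return dark_data, dangling_refs


# Hypothetical state: one registration and one unregistration were lost.
on_storage = ["/data/run1.h5", "/data/run2.h5"]
in_catalogue = ["/data/run1.h5", "/data/run3.h5"]

dark, dangling = reconcile(on_storage, in_catalogue)
print(dark)      # ['/data/run2.h5']
print(dangling)  # ['/data/run3.h5']
```

A consistency check of this kind needs a full listing from both sides, which is what makes the problem expensive at scale.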

The polling problem
  • Stage file A, file B, ..., file X: Requests Queued
  • Are files on DISK? Yes, No, Yes ... No
  • Are files on DISK? Yes, Yes ... No
  • ...
  • Are files on DISK? Yes, ... ERROR
  • Again, the client has to do the bookkeeping.
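The cost of polling can be illustrated with a small simulation. `FakeStorage` and `poll_until_staged` are hypothetical stand-ins written for this sketch, not part of any dCache client; the point is only to count how many status queries the client issues.

```python
class FakeStorage:
    """Hypothetical storage endpoint: each file comes online after N status queries."""

    def __init__(self, ready_after):
        self.ready_after = dict(ready_after)  # path -> queries needed before it is on disk
        self.queries = 0

    def is_on_disk(self, path):
        self.queries += 1
        self.ready_after[path] -= 1
        return self.ready_after[path] <= 0


def poll_until_staged(storage, paths, max_rounds=100):
    """Client-side bookkeeping: keep asking about every file that is still pending."""
    pending = set(paths)
    for _ in range(max_rounds):
        if not pending:
            break
        pending = {p for p in pending if not storage.is_on_disk(p)}
        # A real client would sleep between rounds.
    return pending


storage = FakeStorage({"A": 1, "B": 3, "X": 5})
left = poll_until_staged(storage, ["A", "B", "X"])
print(left, storage.queries)  # set() 9
```

Three files already cost nine status queries here, and the client still has to track which files remain pending.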

New way of interacting: storage events

What are storage events?
(Diagram: storage media transitions between Tape, DISK and SSD)
Storage Events
  • File uploaded
  • File moved to tape
  • File staged to disk
  • Admin intervention

Storage Events: Two approaches
  • Direct event delivery: the client sends "Subscribe to event", receives "Subscription accepted", and then receives Event #1, Event #2, Event #3, ..., Event #N.
  • Brokered event delivery: events are grouped into topics (Event Topic A, Event Topic B, Event Topic C) at an Event Broker; the client sends "Subscribe to topic A" to the broker and receives the events of Topic A.
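The brokered variant can be sketched with a tiny in-process broker. The class `EventBroker` and the topic names are illustrative assumptions; a production setup would use a real broker rather than this toy. In direct delivery, the storage system itself would hold the subscriber list that the broker keeps here.

```python
class EventBroker:
    """Minimal in-process topic broker, for illustration only."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)
        return "Subscription accepted"

    def publish(self, topic, event):
        # Deliver only to callbacks registered for this topic.
        for callback in self._subscribers.get(topic, []):
            callback(event)


received = []
broker = EventBroker()
broker.subscribe("topic-A", received.append)

broker.publish("topic-A", "Event #1")
broker.publish("topic-B", "Event #2")  # no subscriber for topic B, so not delivered
broker.publish("topic-A", "Event #3")
print(received)  # ['Event #1', 'Event #3']
```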

Storage Events: how would they help

Solution: File registration
(Sequence diagram over time, involving the client, the storage node and Rucio/LFC/…)
  • Upload a file, Register file: OK, OK
  • Delete a file, Unregister file: OK, OK
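One way to read this slide is that the catalogue is updated from storage events rather than by a second client call. The sketch below assumes a made-up event shape (`type`, `path`, `size`) and uses a plain dictionary in place of Rucio/LFC; it is not the actual dCache event format.

```python
def apply_event(catalogue, event):
    """Update a catalogue (here just a dict) from one storage event."""
    kind, path = event["type"], event["path"]
    if kind == "uploaded":
        catalogue[path] = {"size": event.get("size")}
    elif kind == "deleted":
        catalogue.pop(path, None)  # idempotent: a repeated delete is harmless
    return catalogue


# Hypothetical event stream, in the order the storage node emitted it.
events = [
    {"type": "uploaded", "path": "/data/run1.h5", "size": 1024},
    {"type": "uploaded", "path": "/data/run2.h5", "size": 2048},
    {"type": "deleted", "path": "/data/run1.h5"},
]

catalogue = {}
for event in events:
    apply_event(catalogue, event)
print(sorted(catalogue))  # ['/data/run2.h5']
```

Because the storage node is the only source of truth for uploads and deletes, the client no longer has a second step that can fail on its own.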

Solution: The polling problem
  • Stage file A, file B, ..., file X: Requests Queued
  • File 4 arrived on DISK
  • File 1 arrived on DISK
  • File N arrived on DISK
  • ...
  • File 7 arrived on DISK
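With events, the client only reacts to "arrived on DISK" notifications, in whatever order they come. `StageTracker` is a hypothetical helper written for this sketch; the file names are placeholders.

```python
class StageTracker:
    """Tracks a stage request by consuming 'arrived on DISK' events."""

    def __init__(self, requested):
        self.pending = set(requested)
        self.arrived = []

    def on_event(self, path):
        """Handle one event; paths that were never requested are ignored."""
        if path in self.pending:
            self.pending.remove(path)
            self.arrived.append(path)
        return not self.pending  # True once every requested file is on disk


tracker = StageTracker(["file1", "file4", "file7"])
for path in ["file4", "file1", "other", "file7"]:
    done = tracker.on_event(path)
print(tracker.arrived, done)  # ['file4', 'file1', 'file7'] True
```

The client issues no status queries at all; its work is proportional to the number of events received.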

Comparison: Events in industry…
  • Amazon Lambda, image thumbnail creation: a photo is uploaded into an S3 bucket; Lambda runs image resizing code to generate web, mobile and tablet sizes.
  • Google Cloud Platform
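The thumbnail example might look roughly like the handler below. It only plans the work: the target widths, the `thumbnails/` key layout and the helper `scaled` are assumptions for illustration, and the actual image resizing (which needs an imaging library) is left out. The `Records[...]["s3"]` fields follow the S3 event notification structure.

```python
from urllib.parse import unquote_plus

# Target widths in pixels; these values are assumptions, not taken from the slide.
SIZES = {"web": 1280, "tablet": 768, "mobile": 320}


def scaled(width, height, target_width):
    """Scale to the target width, keeping the aspect ratio and never upscaling."""
    if width <= target_width:
        return width, height
    return target_width, max(1, round(height * target_width / width))


def handler(event, context=None):
    """Lambda-style entry point: list the thumbnails to create for each uploaded photo."""
    jobs = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        for label in SIZES:
            jobs.append((bucket, key, f"thumbnails/{label}/{key}"))
    return jobs


print(scaled(4000, 3000, SIZES["web"]))  # (1280, 960)
```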

Comparison: events in Open-Source...
  • Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language.
  • Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data between software systems.
  • Kubeless is a Kubernetes-native serverless framework that lets you deploy small bits of code (functions) without having to worry about the underlying infrastructure.
  • Samza allows you to build stateful applications that process data in real-time from multiple sources including Apache Kafka.

dCache implementation

dCache Storage Events: Kafka and SSE
(Diagram: dCache delivers events through an Event Broker (Kafka) to site-managed services, and through SSE (Server-Sent Events) to an externally managed service.)

Cheat sheet: Kafka vs SSE
                          Kafka                    SSE
Standard …                Software package         Protocol
What events does it see?  dCache billing events    inotify (currently)
Main benefit              Easy integration         Built-in security
“Catch-up” storage        Memory & disk            Memory-only
Target audience           Site-level integration   Events for users
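Since SSE is a protocol rather than a software package, a client only needs to understand the `text/event-stream` wire format. The following is a simplified parser for that format; the event name `inotify` and the JSON payload in the sample stream are assumptions for illustration and may not match what dCache actually sends.

```python
import json


def parse_sse(lines):
    """Yield (event_type, data, last_event_id) tuples from text/event-stream lines."""
    event_type, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line dispatches the buffered event
            if data:
                yield event_type, "\n".join(data), last_id
            event_type, data = "message", []
            continue
        if line.startswith(":"):  # comment line, often used as keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value


# Hypothetical stream content.
stream = [
    ": keep-alive\n",
    "event: inotify\n",
    "id: 1\n",
    'data: {"mask": ["IN_CREATE"], "name": "run1.h5"}\n',
    "\n",
]
for event_type, data, event_id in parse_sse(stream):
    print(event_type, event_id, json.loads(data)["name"])  # inotify 1 run1.h5
```

The `id` field is what a reconnecting client sends back in the standard `Last-Event-ID` request header, which is how a server with a limited in-memory buffer can replay recently missed events.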

Use-cases and demonstrators

Storage Events for the Eu-XFEL
Stolen from XDC Review in Luxembourg (Juergen Starek, Michael Schuh)
(Diagram labels)
  • Dynamic Processing Agent
  • Storage Event
  • Authentication (Macaroon)
  • Event Broker (Kafka)
  • OpenStack
  • OpenWhisk FaaS
  • Image Analytics

Full XDC Layout using Storage Events
(Diagram labels)
  • API Calls
  • XDC Orchestrator
  • TOSCA Template
  • INDIGO Orchestrator
  • Flowable©
  • Rucio
  • File Transfer Management
  • XDC Message Bus
  • Job Scheduling
  • Compute Cluster
  • FTS
  • Data Location Information
  • Storage Node (dCache)
  • Virtual Federated Storage
  • Dynafed
  • Storage Node (dCache)
  • Storage Events
  • Data Federation and Location Detection

INDIGO Orchestrator (SSE)
Stolen from Marica Antonacci, INFN for XDC

Storage Events DEMO
(Diagram labels)
  • Dynamic Processing Agent
  • Storage Event
  • Authentication (Macaroon)
  • Event Broker (Kafka)
  • Authenticated Data Stream
  • OpenStack
  • OpenWhisk FaaS
  • Image Analytics
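The role of the Dynamic Processing Agent in this demo can be sketched as a filter that turns selected storage events into function invocations and passes along a credential. Everything below is hypothetical: the event shape, the `.mp4` filter, the `/derived/` output convention and the `invoke` callable standing in for a FaaS call. The token is only an opaque string here; no macaroon library is used.

```python
def make_agent(invoke, token):
    """Build an event handler that triggers processing for newly uploaded movies."""

    def on_storage_event(event):
        path = event["path"]
        if event["type"] != "uploaded" or not path.endswith(".mp4"):
            return None  # not a new movie, nothing to do
        if "/derived/" in path:
            return None  # output of the function itself; skip it to avoid a loop
        return invoke({"input": path, "token": token})

    return on_storage_event


calls = []
agent = make_agent(invoke=lambda payload: calls.append(payload) or "accepted",
                   token="opaque-token")

agent({"type": "uploaded", "path": "/demo/shaky.mp4"})
agent({"type": "uploaded", "path": "/demo/derived/shaky-deshaked.mp4"})
agent({"type": "deleted", "path": "/demo/old.mp4"})
print(len(calls), calls[0]["input"])  # 1 /demo/shaky.mp4
```

Skipping the function's own output is a design choice of this sketch: without some such rule, every derived file would raise a new upload event and trigger the function again.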

Upload the detector (phone) data
(Diagram label: dCache)

DEMO, on the detector (my mobile)
(Diagram label: dCache)

DEMO, file system view (dCache, GUI)
  • Original Movie
  • De-Shaked Movie
  • Combined Movie

Original and derived movie
  • Original
  • De-shaked

Future directions
  • Add additional events, based on initial feedback.
  • Further explore automated data workflows (Eu-XFEL use case).
  • Work with the Rucio team to explore SSE integration.
  • Work with dCache sites to deploy storage events in production.

Thanks for listening

This may cause problems…
(Sequence diagram over time: the client talks to the storage node and, separately, to Rucio/LFC/…)
  • Upload a file (storage node): OK
  • Register file (Rucio/LFC/…): OK
  • Delete a file (storage node): OK
  • Unregister file (Rucio/LFC/…): OK
Leads to
  • Dark Data
  • Dangling References

DEMO, on the detector (my mobile)
(Diagram label: dCache)