Storlets: Making Swift More Software Defined than Ever

  • Slides: 54

Storlets: Making Swift More Software Defined than Ever
Paul Luse – Intel
Hamdi Roumani – IBM
Eran Rom – IBM

Agenda
• Concept
• Motivating Use Cases
• Swift Overview
• Storlets Overview
• The Storlets OpenStack Project
• Vision

Concept
10s – 100s PBs of Storage

Concept

Concept
What if one wishes to run computation over that data without moving it around?

Concept – Collocating Compute and Storage

Use Cases – Data Preparation
How many pictures were taken in Tokyo between Oct 27th – 29th, where the ISO used was 400?
Diagram labels: Semi-structured text; Binary data embedding EXIF metadata
• Use compute on the storage side to extract the metadata as semi-structured text
The Perfect Match: Apache Spark Meets Swift
https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/the-perfect-match-apache-spark-meets-swift

Use Cases – Predicate Push Down
How many pictures were taken in Tokyo between Oct 27th – 29th, where the ISO used was 400?

Num  Location  F-stop  ISO   Focal length  Speed  Make     Date
1    Tokyo     2.6     800   200           1/500  Nikon    10/27/15
2    Paris     5.6     400   70            1/350  Canon    11/2/14
3    Atlanta   11      1600  55            1/200  Samsung  5/12/14
…    …

• Use compute on the storage side to perform the necessary filtering close to the data
• Very initial results show a significant reduction in the overall time taken to process the query
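
The filtering idea can be pictured with a small Java sketch. It is not Storlet code and not taken from the presentation: the class name, the comma-separated row layout (columns in the order of the table on the slide), and the reading of dates as month/day/year are all assumptions made for illustration. The point is that the scan runs next to the data and only a count travels back to the client.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical illustration of predicate push down; not part of the Storlets API.
class PhotoFilter {
    // Assumed column order: Num,Location,F-stop,ISO,Focal length,Speed,Make,Date
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("M/d/uu");

    // Counts rows taken at `location` with the given ISO between `from` and `to` (inclusive).
    static long count(List<String> csvRows, String location, int iso,
                      LocalDate from, LocalDate to) {
        return csvRows.stream()
                .map(row -> row.split(","))
                .filter(f -> f[1].equals(location))
                .filter(f -> Integer.parseInt(f[3]) == iso)
                .filter(f -> {
                    LocalDate taken = LocalDate.parse(f[7], DATE);
                    return !taken.isBefore(from) && !taken.isAfter(to);
                })
                .count();
    }
}
```

Calling `PhotoFilter.count(rows, "Tokyo", 400, LocalDate.of(2015, 10, 27), LocalDate.of(2015, 10, 29))` on the storage side would return a single number instead of shipping every row to the analytics cluster.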

Use Cases – “Data Security”

Use Cases – “Data Security”
Download a transformed version of the design that fits my home printer

Use Cases – “Data Security”
De-Identify

Use Cases – “Data Security”
ForgetIT
https://www.youtube.com/watch?v=3rXeNbps8wo

Use Cases – Media Workflow in the Cloud
Employ new feature extraction algorithms on existing data
Docker Meets Swift: A Broadcaster’s Experience
http://superuser.openstack.org/articles/docker-meets-swift-a-broadcaster-s-experience

The Super Use Case – Dynamically Extend the Generic Functionality of the Storage System

The new ‘Storlets’ OpenStack project uses Docker to run user-defined computations inside Swift so as to achieve all those use cases (and more). Ah yes, and your help is needed.

OpenStack* Swift: A Community Project
• Core OpenStack* Service
  - One of the original 2 projects
  - 100% Python*
  - ~40K LOC application, >80K LOC test code
• Vibrant community
  - Top contributing companies for Kilo include: SwiftStack*, Intel, Red Hat*, IBM*, HP*, Rackspace*, Hitachi*, Fujitsu*
  - 14 Swift Related Presentations – Paris Summit
  - 19 Swift Related Presentations – Vancouver Summit
  - 23 Swift Related Presentations – Tokyo Summit
• Recent Notable Development Work

Release   Date        Feature
Havana    Fall '13    Global Clusters
Icehouse  Spring '14  Discoverable Capabilities
Juno      Fall '14    Storage Policies
Kilo      Spring '15  Erasure Codes
Liberty   Fall '15    Encryption

OpenStack* Swift Overview
• Uses container model for grouping objects with like characteristics
  - Objects are identified by their paths and have user-defined metadata associated with them
• Accessed via RESTful interface
  - GET, PUT, DELETE
• Built upon standard hardware and highly scalable
  - Cost effective, efficient
• Eventually Consistent
  - Designed for availability, partition tolerance
Objects are organized with containers

What OpenStack* Swift is Not
• Distributed File System
  - Does not provide POSIX file system API support
• Relational Database
  - Does not support ACID semantics
• NoSQL Data Store
  - Not built on the Key-Value/Document/Column-Family model
• Block Storage System
  - Does not provide block-level storage service
Not a “One Size Fits All” Storage Solution

OpenStack* Swift Software Architecture
Proxy Nodes
• wsgi server
• middleware
• swift proxy wsgi application
Storage Nodes
• wsgi server
• middleware
• swift object wsgi application
• swift account wsgi application
• swift container wsgi application

Putting it All Together
Clients (Upload / Download)
• RESTful API, similar to S3
Access Tier
• Handle incoming requests
• Handle failures, ganged responses
• Scalable shared-nothing architecture
• Consistent hashing ring distribution
Capacity Tier
• Actual object storage
• Variable replication count
• Data integrity services
• Scale-out capacity
Diagram labels: Load Balancer, Proxy, Auth Service, Storage nodes in Zones 1–5, Obj A stored as Copy 1, Copy 2, and Copy 3
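
A rough Java sketch of hash-based placement may help make "consistent hashing ring distribution" concrete. It is a deliberate simplification and not Swift's ring code: the real ring also mixes cluster-specific values into the hash and looks partitions up in a precomputed partition-to-device table. The class name, the `partPower` parameter, and the round-robin zone choice are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Simplified, hypothetical sketch of hash-based placement; not Swift's actual ring code.
class RingSketch {
    // Maps an object path to one of 2^partPower partitions (1 <= partPower <= 31)
    // using the top bits of the path's MD5 hash.
    static int partition(String objectPath, int partPower) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(objectPath.getBytes(StandardCharsets.UTF_8));
            // Read the first four bytes as an unsigned big-endian integer.
            long top = ((digest[0] & 0xFFL) << 24) | ((digest[1] & 0xFFL) << 16)
                     | ((digest[2] & 0xFFL) << 8) | (digest[3] & 0xFFL);
            return (int) (top >>> (32 - partPower));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Picks `replicas` distinct zones (assumes replicas <= zoneCount) by walking
    // the zones round-robin, starting from the partition number.
    static int[] zonesFor(int partition, int zoneCount, int replicas) {
        int[] zones = new int[replicas];
        for (int i = 0; i < replicas; i++) {
            zones[i] = (partition + i) % zoneCount + 1; // zones are numbered from 1
        }
        return zones;
    }
}
```

Because the partition depends only on the object path, any proxy can compute where an object lives without consulting a central directory, which is what lets the access tier stay shared-nothing.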

Storlets Overview
• A Storlet is compiled code, deployed to OpenStack* Swift as an ordinary data object, and when triggered is executed by the Storlet Engine directly on Swift nodes
• A Storlet is standard Java* code
  - Additional language bindings to be added
• Storlets are triggered via extended OpenStack Swift REST API calls
• A Storlet runs inside a Docker* container, providing multitenancy and isolation
• The Storlet Engine is software that invokes a Storlet, runs it in a Docker container, and connects input and output streams to OpenStack Swift objects
Diagram labels: Access Tier (Proxy), Capacity Tier (Storage)

OpenStack* Swift Software Architecture
Proxy Nodes
• wsgi server
• Swift middleware
• Storlet middleware
• Swift middleware
• swift proxy wsgi application
Diagram labels: Ubuntu 14.04, ffmpeg, Storlet, Stuff

OpenStack* Swift Software Architecture
Object Nodes
• wsgi server
• Swift middleware
• Storlet middleware
• Swift middleware
• swift object wsgi application
Diagram labels: Ubuntu 14.04, ffmpeg, Storlet, Stuff

Upload Compute to Swift
• Compile and test Java code
• Create a JAR file for the code
• Upload the JAR file to the storlet container in Swift
• Need to set a few headers (e.g. X-Object-Meta-Storlet-Main, …)
• Small dependencies can be uploaded as well (for larger ones, modify the Docker image)
Setting up and executing Storlets
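
The upload step is an ordinary Swift object PUT with extra metadata headers, which can be sketched with the JDK's own HTTP classes. This is a minimal sketch under assumptions, not the project's tooling: the helper name, the container name `storlet`, and the example values are hypothetical, and only the one header named on the slide is set (the others the slide elides are left out here too). The request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the PUT request that uploads a Storlet JAR.
class StorletUpload {
    static HttpRequest buildPut(String swiftUrl, String account, String token,
                                String jarName, String mainClass, byte[] jarBytes) {
        // Swift addresses objects by path: /v1/<account>/<container>/<object>.
        // The container name "storlet" is an assumption for this sketch.
        URI uri = URI.create(swiftUrl + "/v1/" + account + "/storlet/" + jarName);
        return HttpRequest.newBuilder(uri)
                .header("X-Auth-Token", token)
                // Names the class the engine should run; other required headers are omitted.
                .header("X-Object-Meta-Storlet-Main", mainClass)
                .header("Content-Type", "application/octet-stream")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(jarBytes))
                .build();
    }
}
```

The resulting request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())` against a real cluster.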

Object “PUT” Data Flow
Handled on the OpenStack* Swift proxy node
1. Intercept request by the Storlet middleware (using X-Run-Storlet: … header)
2. Identify the account container and invoke the Storlet daemon
3. Pass input and output FDs
4. (step shown in the diagram without a caption)
5. Run the Storlet code, and write the output to the output streams
6. The proxy PUT flow continues with the data stream from the Storlet
Diagram labels: Ubuntu 14.04, ffmpeg, Storlet, Stuff

Object “GET” Data Flow
Handled on the OpenStack* Swift object or proxy node
1. Intercept request by the Storlet middleware (using X-Run-Storlet: … header)
2. Identify the account container and invoke the Storlet daemon
3. Pass input and output FDs
4. (step shown in the diagram without a caption)
5. Run the Storlet code, and write the output to the output streams
6. The object/proxy GET flow continues with the data stream from the Storlet
Diagram labels: Ubuntu 14.04, ffmpeg, Storlet, Stuff
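
The captioned steps of the GET flow can be modelled in a few lines of Java. This is a toy model, not the Storlets middleware (which is a Swift WSGI middleware talking to a daemon inside a Docker container): the class, the `StreamStorlet` interface, and the in-memory lookup table are hypothetical, and the output is buffered in memory where the real engine passes file descriptors.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

// Hypothetical model of the interception steps; names are not the Storlets middleware API.
class StorletGateway {
    // Stand-in for user code: reads the object from `in`, writes the result to `out`.
    interface StreamStorlet {
        void invoke(InputStream in, OutputStream out) throws IOException;
    }

    // Returns the stream the GET flow should continue with.
    static InputStream handleGet(Map<String, String> headers, InputStream object,
                                 Map<String, StreamStorlet> deployed) throws IOException {
        String name = headers.get("X-Run-Storlet");          // step 1: intercept on the header
        if (name == null) {
            return object;                                   // ordinary GET, data untouched
        }
        StreamStorlet storlet = deployed.get(name);          // step 2: find the Storlet to run
        if (storlet == null) {
            throw new IOException("Unknown storlet: " + name);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        storlet.invoke(object, out);                         // steps 3 and 5: pass streams, run code
        return new ByteArrayInputStream(out.toByteArray());  // step 6: continue with Storlet output
    }
}
```

A request without the header passes through unchanged, which is why existing Swift clients are unaffected by the extra middleware.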

Storlets Architecture

Hands-on: Writing a Storlet

    @Override
    public void invoke(ArrayList<StorletInputStream> inStreams,
                       ArrayList<StorletOutputStream> outStreams,
                       Map<String, String> parameters,
                       StorletLogger logger) throws StorletException {
        // The first input stream carries the object's data and metadata.
        StorletInputStream sis = inStreams.get(0);
        // The first output stream is an object stream, so it accepts metadata.
        StorletObjectOutputStream storletObjectOutputStream =
                (StorletObjectOutputStream) outStreams.get(0);
        // Copy the input object's metadata to the output object.
        storletObjectOutputStream.setMetadata(sis.getMetadata());
        … play with input / output …
    }

Writing Storlets could not be simpler
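
As one possible reading of the "play with input / output" step, here is a self-contained Java sketch of a line filter. It does not use the Storlets SDK: plain `java.io` streams stand in for the Storlet stream types, and the class name and the `pattern` parameter are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for a Storlet body; uses plain java.io streams, not the Storlets SDK.
class GrepSketch {
    // Writes only the input lines that contain parameters.get("pattern") to the output.
    static void invoke(InputStream in, OutputStream out, Map<String, String> parameters)
            throws IOException {
        String pattern = parameters.getOrDefault("pattern", "");
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(pattern)) {
                out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        out.flush();
    }
}
```

The body reads the object as a stream and writes as it goes, so it never needs the whole object in memory; the same shape would carry over to the real `invoke` signature.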

Getting Started with Storlets
• Get a fresh Ubuntu 14.04 (yes we will … this)
• Make sure you have a passwordless sudoer
• Clone the code from GitHub
• Run the s2aio.sh script, which installs Swift and the Storlet engine
• Can also be installed on an existing cluster using Ansible scripts

Storlets in OpenStack

Vision

Vision Nova

Vision
Storlets == OpenStack for Storage Embedded Computations?