Spark on POWER
Randy Swanberg, DE, Power Software and Solutions

Apache Spark
Fast and general engine for large-scale data processing
• Project from UC Berkeley AMPLab
• Most active Apache project
• Unified analytics platform
  – Combine streaming, graph, machine learning and SQL analytics on a single platform
  – Simplified, multi-language programming model
• In-memory design
  – Pipelines multiple iterations on a single copy of data in memory
  – Superior performance
• Natural successor to MapReduce
[Diagram: Spark SQL, MLlib, Streaming and GraphX on top of Spark Core; APIs: R, Scala, SQL, Python, Java]
© 2014, 2015 International Business Machines Corporation

IBM Spark Technology Center
• Spark Technology Center based in San Francisco
• Analogous to prior IBM investments in Java (JTC) and Linux (LTC)
• Manage IBM’s participation, contributions and influence in the Spark community
• IBM’s Center of Competency for Spark, enabling IBM products and services

IBM Spark Strategy
• Ensure success of Spark as the open technology enabler for big-data and analytics solutions
• Become a respected contributor to the Spark community
• Advantage IBM Analytic Products
• Become Preferred Systems for Spark in the Cloud and On-Premise
• Exploit in IBM Analytic Products
[Diagram labels: Make Spark Better; Provide Best Spark Platform; Provide Spark Cloud Service; Bring Clients Business Value with Spark; Bring Business Value to our Clients with Spark Based Solutions]

• Certify a standard “ODPi Core” set of open source Hadoop family projects with specific versions and patch levels
• Develop tools and methods to help solution providers test applications against the ODPi Core
• Contribute changes and fixes in the ODPi Core Hadoop family projects to the ASF using the ASF processes
http://odpi.org/

ODPi Ecosystem
Representation across the Hadoop ecosystem…
• Hadoop distribution vendors
• Software application providers
• System integrators/consultants
• Hardware vendors
• Customers
… who all believe in the need for a community-based effort to standardize Hadoop, which will lead to improved adoption

IBM Open Platform with Apache Hadoop (4.1)
• 100% open source code
  – Commitment to currency: within four weeks of open source
  – Kafka 0.8.2.1 added to IOP
  – Version updates (Hadoop 2.7.1, Spark 1.4, Ambari 2.1, HBase 1.2, Hive 1.1)
• Free for production use
  – Decoupled Apache Hadoop from IBM analytics and data science technologies
  – Production support offering available
  – Available for POWER8 Little Endian (LE) on Red Hat RHEL 7.1
[Diagram: IBM Open Platform with Apache Hadoop, Apache open source components: HDFS, MapReduce, Spark, Hive, Parquet, Pig, YARN, Ambari, HBase, Flume, Sqoop, Solr/Lucene, Kafka, Slider, Avro, Knox, Oozie, Zookeeper]

SparkBench v1 – Created by IBM Research
• Collection of open source representative Spark workloads
  – Streaming
      Calculate most popular Twitter tag from a live Twitter stream
      Calculate active user and page view counts from a synthetic click stream
  – SQL
      Select, Aggregate and Join functions common in SQL workloads
  – Machine Learning
      Logistic Regression, Matrix Factorization and Support Vector Machine algorithms
      Commonly used in data classification, prediction and recommendation systems
  – Graph
      PageRank – used by web search engines to rank pages based on links
      Singular Value Decomposition – provides quality recommendations
      TriangleCount – used by social apps to discover relationships
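
To make the first streaming workload concrete, here is a minimal plain-Python sketch of "most popular tag" over one batch of tweets. It is not SparkBench code and does not use the Spark API. The function name `most_popular_tag`, the hashtag pattern and the sample batch are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def most_popular_tag(tweets):
    """Return (tag, count) for the most frequent hashtag, or None if no tags occur."""
    counts = Counter(
        tag.lower()                      # count case-insensitively
        for text in tweets
        for tag in HASHTAG.findall(text)
    )
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical batch standing in for one window of a live stream.
batch = [
    "Trying #Spark on #POWER8",
    "#spark streaming is fun",
    "No tags here",
    "Benchmarks with #SparkBench and #Spark",
]
print(most_popular_tag(batch))
```

In a real streaming job the same count would be kept per time window and merged across partitions; the sketch shows only the per-batch logic.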

POWER Advantages for Spark
• Streaming and SQL benefit from high thread density and concurrency
  – Processing multiple packets of a stream and different stages of a message stream pipeline
  – Processing multiple rows from a query
• Machine Learning benefits from large caches and memory bandwidth
  – Iterative algorithms on the same data
  – Fewer core pipeline stalls and overall higher throughput
• Graph also benefits from large caches, memory bandwidth and higher thread strength
  – Flexibility to go from 8 SMT threads per core to 4 or 2
  – Manage balance between thread performance and throughput
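
The SMT trade-off is simple arithmetic: hardware threads = cores × SMT mode. The sketch below assumes a 10-core system (the S812LC configuration quoted in this deck as 10 cores / 80 threads). The helper name `hardware_threads` is hypothetical.

```python
def hardware_threads(cores, smt_mode):
    """Hardware threads exposed by `cores` cores running at the given SMT mode."""
    if smt_mode not in (2, 4, 8):        # the modes named on the slide
        raise ValueError("SMT mode must be 2, 4 or 8")
    return cores * smt_mode


# 10 cores is an assumed example taken from the S812LC configuration.
for mode in (8, 4, 2):
    print(f"SMT{mode}: {hardware_threads(10, mode)} threads")
```

Lower SMT modes expose fewer threads but give each thread a larger share of the core, which is the balance between thread performance and throughput that the slide refers to.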

SparkBench on POWER Results
[Figure: bar charts for Twitter Streaming, Pageview Streaming, SQL with Hive, SQL with Native RDD, PageRank, Singular Value Decomposition++, TriangleCount, Matrix Factorization, Support Vector Machine and Logistic Regression]

Power S812LC: Optimized for Spark performance and price-performance
Based on 10 SparkBench benchmarks
• S812LC delivers better performance at a lower price than Intel Xeon E5-2690 v3 systems
  – 1.94X BETTER performance (10 cores vs 24 cores)
  – 20% LOWER TCA
  – 2.3X BETTER price-performance
[Chart: IBM S812LC 10c/80t vs HP DL380 E5-2690 v3 24c/48t]
• All results are based on IBM internal testing of 10 SparkBench benchmarks consisting of SQL RDD Relation, Twitter, Pageview Streaming, PageRank, Logistic Regression, SVD++, TriangleCount, SVM, MF, SQL Hive
• IBM Power System S812LC: 10 cores / 80 threads, POWER8; xxx GHz, 256 GB memory, Ubuntu 15.04, Spark 1.4, OpenJDK 1.8
• Intel Xeon HP DL380: 24 cores / 48 threads, E5-2690 v3; 2.3 GHz, 256 GB memory, Ubuntu 15.04, Spark 1.4, OpenJDK 1.8
• Pricing is based on list prices of HP DL380 and estimated prices of IBM Power S812LC

Bringing OpenPOWER Innovation to Spark
• Project Tungsten is a new Spark community project to enable Spark optimization closer to the “metal” and exploit accelerators
• OpenPOWER technology can complement the Tungsten vision
  – CAPI – Coherent Interface for Accelerators
      Full virtual address translation from the accelerator
      Complete kernel bypass to interact with the accelerator
  – CAPI FPGA – Compression, Sort, Hash-join, Erasure Codes, etc.
  – RDMA – Network acceleration for Spark shuffles across the cluster
  – CAPI Flash – Enhancing Spark’s in-memory design
  – GPUs – Accelerating Spark machine learning

CAPI Flash Prototype
• Leverage CAPI Flash for Spark spill space
• xDegrees of Separation workload
[Chart: Time (msec) vs Mem; callout: 4X less memory, equal performance]

GPUs Under Spark - Prototype
• We generate native code for GPU/SIMD from a Spark user program to exploit parallelism
• We define an array-based RDD and put it off-heap for efficient access from GPU/SIMD, compared to the current collection-based RDD
[Diagram, two stacks side by side.
 Current Spark: Spark user program; Spark core execution engine (job scheduler, block manager); Java Virtual Machine (JIT compiler); code and data (RDD in the Java heap) both in the managed runtime.
 Spark with GPU/SIMD: Spark user program; job scheduler, block manager; Java Virtual Machine (JIT compiler); native code on GPU/SIMD; data in an array-based RDD held off-heap (unmanaged).]
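
The layout difference behind the array-based RDD can be illustrated outside the JVM. The sketch below is only a Python analogy, not the prototype's implementation. A Python list holds references to separately allocated float objects, much like a collection of boxed values. `array.array` holds raw 8-byte doubles in one contiguous buffer, the kind of layout a native GPU/SIMD kernel can read directly.

```python
from array import array

# Collection-style layout: a list of references to individually allocated floats.
boxed = [float(i) for i in range(8)]

# Array-style layout: one contiguous buffer of raw C doubles.
packed = array("d", boxed)

# memoryview exposes the underlying buffer without copying it.
view = memoryview(packed)
print(view.contiguous, view.itemsize, view.nbytes)
```

The values are identical in both layouts; only the memory representation differs. That is why the slide contrasts the array-based RDD with the collection-based one in terms of efficient access.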

GPUs for Spark Machine Learning and Graph
• GPU-enabled libraries
  – BLAS
  – Logistic Regression
  – Alternating Least Squares
  – Word2Vec
  – …etc.
[Diagram: Spark SQL, MLlib, Streaming and GraphX on top of Spark Core; APIs: R, Scala, SQL, Python, Java]

Adverse Drug Reaction Prediction

Sim 1 (Chemical Similarity)
  Drug 1       Drug 2      Sim
  Salsalate    Aspirin     .9
  Dicoumarol   Warfarin    .76
…
Sim N
  Drug 1       Drug 2      Sim
  Salsalate    Aspirin     .7
  Dicoumarol   Warfarin    .6

Known DDIs of type 1
  Drug 1       Drug 2
  Aspirin      Gliclazide
  Aspirin      Dicoumarol
…
Known DDIs of type M
  Drug 1       Drug 2
  Aspirin      Probenecid
  Aspirin      Azilsartan

Candidate DDIs of type 1: Features
  Drug 1       Drug 2       Best Sim1*Sim1   …   Best SimN*SimN
  Salsalate    Gliclazide   .9*1                 .7*1
  Salsalate    Warfarin     .9*.76               .7*.6
…
Candidate DDIs of type M: Features
  Drug 1       Drug 2       Best Sim1*Sim1   …   Best SimN*SimN
  Salsalate    Gliclazide   .9*.6                .7*.7
  Salsalate    Warfarin     .9*.3                .7*.5

Logistic Regression (rare event correction)

Similarity-based predictions
DDIs of type 1: Prediction
  Drug 1       Drug 2       Prediction
  Salsalate    Gliclazide   0.85
  Salsalate    Warfarin     0.7
…
DDIs of type M: Prediction
  Drug 1       Drug 2       Prediction
  Salsalate    Gliclazide   0.53
  Salsalate    Warfarin     0.32

• Reduce overall model training time 4X
• Accelerate Logistic Regression stages 80X to 100X
• Increase quality of prediction
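
The "Best Sim*Sim" features on this slide can be read as follows. For a candidate drug pair, take each known interacting pair, multiply the similarity of the first drugs by the similarity of the second drugs, and keep the best product. The sketch below reproduces the two type 1 / Sim 1 feature values under that reading. The function names, the rule that a drug has similarity 1 with itself, and the 0.0 default for pairs missing from the table are assumptions, not details given on the slide.

```python
def pair_similarity(sim, a, b):
    """Symmetric lookup; a drug is assumed fully similar (1.0) to itself."""
    if a == b:
        return 1.0
    # Assumption: pairs missing from the table have similarity 0.0.
    return sim.get((a, b), sim.get((b, a), 0.0))


def best_similarity_feature(candidate, known_ddis, sim):
    """Best product of per-drug similarities between a candidate pair and any known DDI."""
    d1, d2 = candidate
    return max(
        (pair_similarity(sim, d1, k1) * pair_similarity(sim, d2, k2)
         for k1, k2 in known_ddis),
        default=0.0,
    )


# Values copied from the Sim 1 and "Known DDIs of type 1" tables.
sim1 = {("Salsalate", "Aspirin"): 0.9, ("Dicoumarol", "Warfarin"): 0.76}
known_type1 = [("Aspirin", "Gliclazide"), ("Aspirin", "Dicoumarol")]

for cand in [("Salsalate", "Gliclazide"), ("Salsalate", "Warfarin")]:
    print(cand, best_similarity_feature(cand, known_type1, sim1))
```

The first candidate scores .9*1 through the known pair Aspirin–Gliclazide, and the second scores .9*.76 through Aspirin–Dicoumarol, matching the first feature column. One such feature per similarity measure (Sim 1 … Sim N) then forms the input to the logistic regression stage.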

Personalized Single Drug – Adverse Reaction Prediction

Sim 1 (Chemical Structure Sim)
  Drug 1       Drug 2      Sim
  Salsalate    Aspirin     .9
  Dicoumarol   Warfarin    .76
…
Sim N
  Drug 1       Drug 2      Sim
  Salsalate    Aspirin     .7
  Dicoumarol   Warfarin    .6

Sim’1 (Current Condition Sim)
  Patient 1    Patient 2   Sim
  John Doe     Mary Paul   .8
  Jane Smith   Chloe Roe   .6
…
Sim’M
  Patient 1    Patient 2   Sim
  John Doe     Mary Paul   .55
  Jane Smith   Chloe Roe   .95

Known single drug-ADRs (EMRs)
  Patient      Drug        ADR
  J. Doe       Aspirin     Bleeding
  J. Doe       Warfarin    Paralysis

Candidate Drug-ADR Features
  Patient      Drug         ADR         Best Sim1*Sim’1   …   Best SimN*Sim’M
  M. Paul      Salsalate    Bleeding    .9*.8                 0.7*0.55
  M. Paul      Dicoumarol   Paralysis   .76*.8                0.6*0.73

Spark-RDMA – Working prototype
[Diagram: Spark; Netty with shuffle.io.mode=rsocket; native sockets (epoll); OFED (rpoll/rselect)]

Business Value: IBM Data Engine for Analytics
A fully integrated solution with software and infrastructure optimized for Big Data & Analytics
• Simplify operations – easy to deploy and manage
• Designed for mixed analytics workloads: streams, at rest, text
• Optionally preloaded with IBM BigInsights and IBM Open Platform
• Enterprise grade Hadoop with advanced resource and storage management
• Adapt and scale to your changing analytics needs
Appliance-like but much more versatile!
• Single vendor support
• Less than half the storage infrastructure with only 1 copy of data*
• Competitive performance with up to 50% fewer servers*
• Lowest $/TB and over a third more usable storage**
* Compared with a standard triple replica Hadoop configuration.
** List price vs full rack configuration vs Oracle Big Data Appliance with data connectors and BigSQL
Built on POWER8: The Platform Designed for Big Data
• 4X threads per core vs. x86 (up to 1536 threads per system)
• 4X memory bandwidth vs. x86 (up to 16 TB of memory)
• 5X more cache vs. x86 (up to 224 MB cache per socket)