Deep Learning with TensorFlow and Spark using GPUs and Docker Containers
Tom Phelan, BlueData CTO / HPE Fellow (BlueData was recently acquired by HPE)
thomas.phelan@hpe.com, @tapbluedata

Agenda
§ Big Data, Artificial Intelligence, Machine Learning, and Deep Learning
§ Common Enterprise Use Cases
§ Deep Learning Application Requirements
  - Distributed TensorFlow & Horovod in Containers with GPUs
§ Use of Docker Containers to Meet Requirements
  - Challenges and Solutions
§ Lessons Learned
§ Key Takeaways and Recommendations

Big Data refers to data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Characterized by: volume, variety, velocity.
Source: https://en.wikipedia.org/wiki/Big_Data

Let’s Get Grounded … What is AI?
§ Artificial intelligence (AI): Mimics human behavior. Any technique that enables machines to solve a task the way humans do. Example: Siri
§ Machine learning (ML): Algorithms that allow computers to learn from examples without being explicitly programmed. Example: Google Maps
§ Deep learning (DL): Subset of ML, using deep artificial neural networks as models, inspired by the structure and function of the human brain. Example: Self-driving car

Game-Changing Innovation (Gartner 2019 CIO Agenda)
Q: Which technology areas do you expect will be a game changer for your organization?
Source: Gartner, Insights From the 2019 CIO Agenda Report, by Andy Rowsell-Jones, et al.

Why should you be interested in AI / ML / DL?
Everyone wants AI / ML / DL and advanced analytics …
§ AI and advanced analytics represent 2 of the top 3 CIO priorities
§ Enterprise AI adoption: 2.7X growth in the last 4 years
§ AI and advanced analytics infrastructure could constitute 15-20% of the market by 2021
… but they face many challenges:
§ Use cases
§ New roles, skill gaps
§ Culture and change
§ Data preparation
§ Legacy infrastructure
Sources: IDC, Goldman Sachs, HPE Corporate Strategy, 2018; Gartner, “2019 CIO Survey: CIOs Have Awoken to the Importance of AI”

Example Enterprise Deep Learning Use Cases
§ Fraud Detection: Credit Cards, Car Loans
§ Medical Diagnosis: Detection / Prediction of Cancer, Alzheimer’s, Pneumonia
§ Prediction: Weather Forecast, Gas & Oil Location, Buyer / Trader Action

AI / ML / DL Adoption in the Enterprise
§ Financial services: Fraud detection, ID verification
§ Government: Cyber-security, smart cities and utilities
§ Energy: Seismic and reservoir modeling
§ Retail: Video surveillance, shopping patterns
§ Health: Personalized medicine, image analytics
§ Consumer tech: Chatbots
§ Service providers: Media delivery
§ Manufacturing: Predictive and prescriptive maintenance

Financial Services Use Cases
Wide Range of ML / DL Use Cases for Wholesale / Commercial Banking, Credit Card / Payments, Retail Banking, etc.
§ Fraud Detection: Real-Time Transactions, Credit Card, Merchant Collusion, Impersonation, Social Engineering Fraud
§ Risk Modeling & Credit Worthiness Check: Loan Defaults, Delayed Payments, Liquidity, Market & Currencies, Purchases and Payments Time Series
§ CLV (Customer Lifetime Value) Prediction and Recommendation: Historical Purchase View, Pattern Recognition, Retention Strategy, Upsell, Cross-Sell, Nurturing
§ Customer Segmentation: Behavioral Analysis, Understanding Customer Quadrant, Effective Messaging & Improved Engagement, Targeted Customer Support, Enhanced Retention
§ Other: Image Recognition, NLP, Security, Video Analysis

Fraud Detection Use Case
§ One of the most common use cases for ML / DL in Financial Services is to detect and prevent fraud
§ This requires:
  - Distributed Big Data processing frameworks such as Spark
  - ML / DL tools such as TensorFlow, H2O, and others
  - Continuous model training and deployment
  - Multiple large data sets

ML / DL in Healthcare – Use Cases
• Precision Medicine and Personal Sensing
  – Disease prediction, diagnosis, and detection (e.g. genomics research)
  – Using data from local sensors (e.g. mobile phones) to identify human behavior
• Electronic Health Record (EHR) correlation
  – “Smart” health records
• Improved Clinical Workflow
  – Decision support for clinicians
• Claims Management and Fraud Detection
  – Identify fraudulent claims
• Drug Discovery and Development

360° View of the Patient: Demographics, Visit, Labs, Rx, Care Site, Diagnosis, Genomics, Studies

Sample Architecture & Applications
[Diagram] Publishers: Electronic Health Record Systems, Monitors / Devices; Kafka Connect; Centralized Publisher Subscriber Hub; Local Store; Speed Layer Database; Secure HDFS Data Lake; Model Build; Promotion; Model Score; Results / Feedback; Access

Requirements for Deep Learning
Q: What hardware is commonly used for deep learning and matrix multiplications?
A: Graphics Processing Units (aka GPUs)

Matrix Multiplication
Source: https://www.khanacademy.org/math/precalculus/precalc-matrices/multiplying-matrices-by-matrices/a/multiplying-matrices
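
Each entry of the product C = AB is the dot product of one row of A with one column of B, and no entry depends on any other. That independence is what lets a GPU compute thousands of entries at once. The plain-Python sketch below (CPU only, purely illustrative, not how CUDA or TensorFlow implement it; the function name is hypothetical) spells out the definition:

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions must match")
    # Entry (i, j) is the dot product of row i of a and column j of b.
    # No entry depends on another, which is the parallelism GPUs exploit.
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```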

Deep Neural Network (DNN)
Source: https://www.kdnuggets.com/wp-content/uploads/deep-neural-network.jpg
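
A DNN's forward pass is a chain of matrix products: each layer multiplies its input by a weight matrix, adds a bias, and applies a nonlinearity. The sketch below assumes a tiny fully connected network with ReLU activations and hand-picked, hypothetical weights (nothing is trained). Batching many inputs turns each matrix-vector product into a matrix-matrix product, which is the workload GPUs accelerate.

```python
def dense(x, weights, bias):
    """One fully connected layer: y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise ReLU nonlinearity."""
    return [max(0.0, z) for z in v]


def forward(x, layers):
    """Apply each (weights, bias) layer in turn, using ReLU on all but the last."""
    for i, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0]], [0.5]),
]
print(forward([1.0, 2.0], layers))  # [5.0]
```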

NVIDIA GPUs and CUDA
§ NVIDIA is the most common manufacturer of GPUs
§ CUDA is the software that allows applications to access the NVIDIA GPU hardware
  • CUDA library or toolkit
  • CUDA kernel device driver
§ CUDA is specific to NVIDIA GPU hardware
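
Because the kernel device driver lives on the host while the CUDA toolkit can ship inside a container image, the two must be compatible: each CUDA release requires at least a minimum driver version, and newer drivers can run older toolkits. The sketch below checks that rule. The minimum versions are taken from the Linux table in NVIDIA's CUDA Toolkit release notes and should be re-checked against the current table; the helper names are hypothetical.

```python
# Minimum Linux x86_64 driver versions for a few CUDA toolkit releases, per
# NVIDIA's CUDA Toolkit release notes (re-check the current table before use).
MIN_DRIVER = {
    "9.0": "384.81",
    "9.2": "396.26",
    "10.0": "410.48",
}


def version_tuple(v):
    """Turn a dotted version string such as '410.48' into (410, 48) for comparison."""
    return tuple(int(part) for part in v.split("."))


def driver_supports(cuda_version, driver_version):
    """True if the host driver is new enough for the container's CUDA toolkit."""
    required = MIN_DRIVER.get(cuda_version)
    if required is None:
        raise KeyError(f"no minimum driver recorded for CUDA {cuda_version}")
    return version_tuple(driver_version) >= version_tuple(required)


print(driver_supports("9.0", "390.46"))   # True: a newer driver runs an older CUDA
print(driver_supports("10.0", "396.26"))  # False: driver too old for CUDA 10.0
```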

Architectural Challenges
§ Complexity, lack of repeatability and reproducibility across environments (laptop, on-prem cluster, off-prem cluster)
§ Sharing data, not duplicating data
§ Need agility to scale compute resources up and down
§ Deploying multiple distributed platforms, libraries, applications, and versions
§ One size environment fits none
§ Need a flexible and future-proof solution

Cost Challenges
§ How to maximize the use of expensive hardware resources
§ How to run clusters on heterogeneous host hardware
  - CPUs and GPUs, including multiple GPU versions
§ How to minimize manual operations
  - Automating the cluster creation and deployment process
  - Creating reproducible clusters and reproducible results
  - Enabling on-demand provisioning and elasticity

Support and Security Challenges
§ How to support the latest versions of software
  - Compatibility across version upgrades
§ How to ensure enterprise-class security
  - Network, storage, user authentication, and access
§ Applying security patches

Addressing the Challenges: Simplify Deployments, Innovate Faster, Deploy Anywhere
Docker is software that performs operating-system-level virtualization, also known as containerization. Containerization allows multiple isolated user-space instances to run on a single server.
Source: https://en.wikipedia.org/wiki/docker_(software)

Surfacing a GPU Device into a Container
§ Use of the docker --device command-line option
§ Use of a vendor-specific Docker wrapper such as nvidia-docker (introduced in 2016)
§ Use of the NVIDIA Container Runtime (introduced mid-2018)
  - Supports container runtimes other than Docker
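
With the plain --device option, every NVIDIA device node has to be passed to docker run by hand. The sketch below builds such an argument list; the helper name, the image name my-cuda-image, and the exact device-node names are assumptions for a typical Linux host. Passing device nodes alone is not enough: the container also needs user-space driver libraries that match the host's kernel module, which is the gap the nvidia-docker wrapper and the NVIDIA Container Runtime close.

```python
def docker_gpu_args(gpu_indices, image, command):
    """Build a hypothetical `docker run` argument list that surfaces NVIDIA
    device nodes with --device. The container image must still provide
    driver libraries matching the host kernel module."""
    args = ["docker", "run", "--rm"]
    # Control and unified-memory nodes shared by all GPUs (names assumed for
    # a typical Linux host; check /dev on the actual machine).
    for node in ("/dev/nvidiactl", "/dev/nvidia-uvm"):
        args += ["--device", node]
    # One node per physical GPU: /dev/nvidia0, /dev/nvidia1, ...
    for i in gpu_indices:
        args += ["--device", f"/dev/nvidia{i}"]
    return args + [image] + list(command)


print(" ".join(docker_gpu_args([0, 1], "my-cuda-image", ["nvidia-smi"])))
```

On Docker 19.03 and later with the NVIDIA Container Toolkit installed, `docker run --gpus all <image>` performs this wiring automatically.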

GPU Access from within a Container
Source: http://www.nvidia.com/object/docker-container.html

Spark
§ A unified analytics engine for large-scale data processing
§ Easy to use
  - Java, Python, Scala, R
§ Runs everywhere
  - Hadoop, Mesos, Kubernetes, standalone

TensorFlow
§ Distributed solution for training models in parallel, on multiple devices, using GPUs
§ Goal is to improve accuracy and speed
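
Distributed training of this kind is usually synchronous data parallelism: each worker (for example, one per GPU) holds a full copy of the model, computes gradients on its own shard of the data, and the gradients are averaged before every replica applies the same update. The sketch below simulates that loop in plain Python for a one-parameter linear model y ≈ w·x. It is not TensorFlow's API; the function names, learning rate, and data are hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y ~ w * x over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def train_data_parallel(data, num_workers, lr=0.01, steps=200):
    """Synchronous data-parallel gradient descent, simulated in one process."""
    # Split the data set into one shard per simulated worker / GPU.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0  # every replica starts from the same weights
    for _ in range(steps):
        # On real hardware these gradients are computed concurrently.
        grads = [local_gradient(w, shard) for shard in shards]
        avg = sum(grads) / num_workers  # the averaging an allreduce performs
        w -= lr * avg  # identical update on every replica keeps them in sync
    return w


data = [(x, 3.0 * x) for x in range(1, 9)]
print(round(train_data_parallel(data, num_workers=4), 3))  # 3.0
```

With equal-sized shards, the average of the per-shard gradients equals the full-batch gradient, so four simulated workers reach the same w as one worker would; the speedup on real hardware comes from computing the shards concurrently.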

Horovod
§ An open source framework developed by Uber that supports allreduce
§ Distributed training framework for
  - TensorFlow, PyTorch, Keras
§ Separates infrastructure capabilities from ML
§ Installs easily on an existing ML framework
  - pip install horovod
§ Uses bandwidth-optimal communication protocols
  - RDMA, InfiniBand if available
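
Allreduce combines a value (here, a gradient vector) from every worker and leaves the combined result on all of them. Horovod's design is based on the ring-allreduce algorithm, which is bandwidth-optimal: each of N workers sends about 2(N-1)/N times the size of its vector, largely independent of N. The sketch below simulates ring-allreduce with summation in one process; the function name is hypothetical, and in practice Horovod delegates this to NCCL or MPI and averages gradients by dividing the sum by N.

```python
def ring_allreduce(vectors):
    """Simulate ring-allreduce (sum) across len(vectors) workers."""
    n = len(vectors)
    length = len(vectors[0])
    data = [list(v) for v in vectors]  # each worker's local buffer
    # Chunk c covers indices bounds[c]:bounds[c + 1].
    bounds = [c * length // n for c in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: after n - 1 steps, worker i holds the
    # complete sum of chunk (i + 1) % n. Each worker sends to its right neighbor.
    for step in range(n - 1):
        # All workers send simultaneously, so snapshot outgoing chunks first.
        msgs = [((i - step) % n, [data[i][k] for k in chunk((i - step) % n)])
                for i in range(n)]
        for i, (c, values) in enumerate(msgs):
            for k, val in zip(chunk(c), values):
                data[(i + 1) % n][k] += val

    # Phase 2, allgather: pass completed chunks around the ring so every
    # worker ends up with every summed chunk.
    for step in range(n - 1):
        msgs = [((i + 1 - step) % n, [data[i][k] for k in chunk((i + 1 - step) % n)])
                for i in range(n)]
        for i, (c, values) in enumerate(msgs):
            for k, val in zip(chunk(c), values):
                data[(i + 1) % n][k] = val
    return data


grads = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
print(ring_allreduce(grads)[0])  # [111, 222, 333, 444, 555, 666]
```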

Spark, TensorFlow, NVIDIA, Horovod, on Docker
[Diagram] Spark Driver; Spark Executor; Horovod cluster on multiple GPUs, containers, and machines; Shared Data
Software stack: NCCL 2.3.7, MPI 3.1.3, TensorFlow 1.9, GPU / CUDA 9
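
Horovod runs one process per GPU and gives each a global rank plus a local rank within its machine; a training script typically pins its process to the GPU whose index equals the local rank. The sketch below computes that mapping for a host list in the host:slots form that horovodrun accepts with -H (for example, horovodrun -np 4 -H server1:2,server2:2 python train.py). The host names, the helper name, and the host-by-host fill order are assumptions; actual launchers may assign ranks differently.

```python
def rank_layout(host_spec, num_procs):
    """Map global ranks to (rank, host, local_rank) for a host list such as
    'server1:2,server2:2', filling hosts in the order listed (assumed order)."""
    slots = []
    for entry in host_spec.split(","):
        host, count = entry.rsplit(":", 1)
        slots.extend((host, local) for local in range(int(count)))
    if num_procs > len(slots):
        raise ValueError(f"{num_procs} processes requested but only {len(slots)} slots")
    # Each process typically pins the GPU whose index equals its local rank.
    return [(rank, host, local) for rank, (host, local) in enumerate(slots[:num_procs])]


for rank, host, local_rank in rank_layout("server1:2,server2:2", 4):
    print(f"rank {rank}: {host}, GPU {local_rank}")
```

The same function covers the single-node, single-GPU case ("localhost:1", 1) and multi-node, multi-GPU combinations alike.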

Deep Learning and Docker Containers
§ ML / DL applications are compute-hardware intensive as well as complex to set up and administer
§ They can benefit from the flexibility, agility, and resource-sharing attributes of containerization
§ But care must be taken in how this is done, especially in a large-scale distributed environment

AI-Driven Solutions for the Enterprise
[Diagram] Example Industry Use Cases: Fraud Detection, Genome Research, Customer 360, Video Surveillance. Other elements: Solutions, Infrastructure, Data Science and ML / DL Tools, Data Platforms, Data, IT, HDFS/NFS Data Store, User Access, Data Duplication, Security, Time to Deploy, Cloud, Multi-Tenant

Distributed Deep Learning on Docker
Secure, multi-tenant, elastic environments on shared infrastructure
• On-demand GPU-enabled clusters for deep learning
• Submit jobs using notebooks, web UI, or API
• Web-based SSH access for CLI jobs
• Shared storage access with security
• Cloud-ready architecture, using Docker containers
[Diagram] Prototype: quickly deploy sandbox nodes in minutes; Support for Parallelism; Model save, load, share, and run; Compute Isolation; CUDA runtime and cuDNN; Mount GPU devices and surface device drivers; DIY / K8S / DCOS / Containerized Infrastructure for AI / ML / DL; GPU and Non-GPU Hosts; Data Isolation; HDFS Data Lake or NFS Enterprise Storage

Challenges Solved
§ Pre-built Docker images with CUDA and automated cluster creation for the entire stack
  - Easy to deploy, repeatable
§ Appropriate NVIDIA kernel module surfaced automatically to the containers
§ Scalable access to resources (e.g. single node, single GPU, multi-node, multi-GPU combinations)
§ UI, CLI, and API use patterns (notebooks, web, SSH)
§ Fast access to shared data
§ Support for hybrid on-premises and cloud infrastructure

Faster ML / DL Deployment Time
[Diagram] Legacy Deployment (~45 days): Hardware, Networking, Storage, Operating System, Cluster Configuration, Security (KDC, AD/LDAP, SSL), Software Download / Install (~x days), Add / Configure Libraries, Init.d Configuration, Add Services, Load Balancing, Port Mapping, Onboard Users; the end user then submits a job / model via SSH / UI.
Deployment with BlueData (~10 minutes): Physical Server, Docker, Application Image, Management, Security (KDC, AD/LDAP), User Access (SSH, SSL); the end user then submits a job / model via SSH / UI.

Lessons Learned
§ Enterprises are using ML / DL today to solve difficult problems
  - Example use cases: fraud detection, disease prediction, etc.
§ Distributed ML / DL in the enterprise requires a complex stack, with multiple different tools
  - TensorFlow is one popular option
§ Deployments are challenging, with many potential pitfalls
  - Containerization can deliver agility and cost-saving benefits

Takeaways
§ Distributed deep learning applications can be deployed on Docker containers
§ GPU resources can be effectively shared between multiple applications
§ Deep learning requires a complex software / hardware stack
§ DevOps complexity can be abstracted from the data scientist with self-service provisioning and automation
§ Data resources should be decoupled from compute resources in order to maximize platform flexibility and scalability

Recommendations
§ Start with a few key use cases to explore deep learning
§ Be prepared: the only constant is change
  - Business needs and use cases constantly evolve
  - Tools evolve in dog years
§ Leverage a flexible, scalable, and elastic platform for success
  - Containerization can deliver agility and cost-saving benefits
  - But there are significant challenges … DIY will likely be DOA

Please Rate This Session: session page on the conference website or the O’Reilly Events App

Tom Phelan @tapbluedata

Thank You. For more information, visit www.bluedata.com
