Technical Training: Hadoop 101, November 2016
ISILON MOMENTUM
• 7,000+ Customers Worldwide
• 1,300 New Customers in 2015
• #1 in Data Lakes, >1,200 HDFS Customers
• >12,000 Clusters in the Field
• Approaching 1 PB Average Cluster Capacity
Scale-Out NAS Leader (timeline chart, 2003 to 2015)
HADOOP MARKET LEADERSHIP
• #1 Market Leader in Hadoop Shared Storage
• 1,200+ Customers
• Rapidly growing use case
• Pivotal, Cloudera, IBM
ISILON = ENTERPRISE OPEN SOURCE HADOOP
Benefits
• Enterprise Proven Security and Data Protection
• Separate Compute and Storage
• Cost Effective Scaling
• Lower TCO: Management, Power, Cooling, Footprint
• Multi-protocol: Bring the Analytics to the Data
• Reduced Time to Results with Consistent SLAs
Deployment
• Isilon supports the HDFS interfaces for the DataNode and NameNode to host data and metadata
• Underlying file system is OneFS
• As simple as pointing the HDFS clients to the DNS name of the Isilon cluster!
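To make "pointing the HDFS clients to the DNS name of the Isilon cluster" concrete, here is a minimal sketch that generates a client-side `core-site.xml` fragment. `fs.defaultFS` is the standard Hadoop property for the default file system URI. The host name `isilon.example.com` and port 8020 are assumptions for illustration and do not come from the slides; use the DNS name and port configured for your cluster.

```python
import xml.etree.ElementTree as ET


def core_site_xml(cluster_dns_name: str, port: int = 8020) -> str:
    """Build a minimal core-site.xml that points HDFS clients at one DNS name.

    cluster_dns_name and port are placeholders (assumptions), not values
    taken from the slides.
    """
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    # fs.defaultFS is the standard Hadoop key for the default file system URI.
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{cluster_dns_name}:{port}"
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical DNS name used only for illustration.
    print(core_site_xml("isilon.example.com"))
```

Because every client resolves the same name, moving or adding storage nodes does not require touching the client configuration.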
SUPPORT FOR STANDARD APACHE COMPONENTS
• Isilon’s HDFS implementation supports standard Apache Hadoop RPC protocols and commands
• Isilon is 100% compatible with Apache-compliant Hadoop distributions
• Components & applications that run on Apache Hadoop run seamlessly on Isilon
ONEFS OPERATING SYSTEM
• Single File System
• One Namespace
• High Performance
• Unmatched Efficiency
• Simplicity & Ease of Use
• Linear Scalability
• Easy Growth
ENTERPRISE GRADE SOFTWARE
DATA PROTECTION & EFFICIENCY
• SnapshotIQ: Fast, Efficient Data Backup And Recovery
• SyncIQ: Fast And Flexible Asynchronous Replication For Disaster Recovery Protection
• SmartConnect: Policy-based Client Failover With Load Balancing
• SmartLock: Policy-based Compliance and WORM Data Protection
• SmartDedupe: Data Deduplication to reduce storage requirements and costs
DATA MANAGEMENT
• SmartPools: Policy-based Automated Tiering
• SmartQuotas: Quota Management And Thin Provisioning
• InsightIQ: Performance Monitoring And Reporting To Manage Storage Resources
• CloudPools: Cloud-scale Capacity
HDFS COMPLETELY REWRITTEN IN ONEFS 8.0
• HDFS protocol rewritten in C++
– Increased parallel processing
– Greater scalability
– Support for audit, CloudPools, and SMB file filtering
• New web administration interface
– Full configuration options in web administration interface
• Improved CLI options
– isi hdfs command controls HDFS settings
ONEFS 8.0 HDFS LOAD BALANCING
(1) SmartConnect DNS policy load balancing
(2) NameNode session:
• addBlock (write)
• getBlockLocations (read)
(3) Read HDFS blocks (possible adaptive prefetch)
(4) Write HDFS blocks
Diagram labels: DFSClient on a Hadoop compute node; HDFS NameNode traffic; HDFS DataNode traffic; SmartConnect service IP; NameNode and DataNode on each Isilon node; single Isilon cluster (InfiniBand back end); virtual rack
ONEFS 8.0 DATANODE LOAD BALANCING
INTELLIGENTLY IMPROVE YOUR HADOOP PERFORMANCE
Key Features
• Intelligently provides the DataNode with the least load to new HDFS clients
• Totally transparent to the client, no configuration required
Benefits
• Improves overall performance of Hadoop clients for analytics workloads
• Avoids overloading any specific OneFS node and increases cluster resilience
Diagram (Node 1, Node 2, Node 3 compared by connection count):
1. HDFS client to NameNode: Where to write?
2. NameNode: Write to Node 2.
3. HDFS client: Good, will write to Node 2.
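The least-load selection described on this slide can be sketched as follows. This is an illustrative model only: it assumes connection count is the sole load signal (the slide's chart is labeled "Connection Count") and that ties are broken by node name. The function names and the tie-breaking rule are hypothetical and are not taken from OneFS.

```python
from typing import Dict, List


def pick_datanode(connection_counts: Dict[str, int]) -> str:
    """Return the node with the fewest connections (ties broken by name)."""
    if not connection_counts:
        raise ValueError("no nodes available")
    return min(connection_counts, key=lambda node: (connection_counts[node], node))


def assign_clients(connection_counts: Dict[str, int], new_clients: int) -> List[str]:
    """Place each new client on the currently least-loaded node, in order."""
    counts = dict(connection_counts)  # work on a copy; do not mutate the input
    placements = []
    for _ in range(new_clients):
        node = pick_datanode(counts)
        counts[node] += 1  # the new connection now counts toward that node's load
        placements.append(node)
    return placements


if __name__ == "__main__":
    # Hypothetical load figures: Node 2 is the least loaded, as in the slide's diagram.
    print(pick_datanode({"node1": 12, "node2": 3, "node3": 7}))
```

The client never sees this logic; it simply receives a DataNode address in the NameNode's reply, which is why no client configuration is needed.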
ONEFS 8 HDFS WEB INTERFACE
• New HDFS configuration page in the web administration interface (OneFS 8)
• Can enable HDFS and change block size
• Can set authentication type and root directory
• Any configuration previously done via the CLI can now be done in the web administration interface
ONEFS ACCESS ZONES
• An access zone is:
– A way to carve the cluster into smaller clusters
– A way to control access based on individual authentication
– OneFS’s multi-tenancy solution
Diagram labels: System Zone, Access Zone-1, Access Zone-2; Domain Controller-1, Domain Controller-2, Kerberos-1, Kerberos-2, Group Database-1, Group Database-2, LDAP-1, NIS-1
DNS PER ACCESS ZONE
• Domain Name Resolution per Access Zone
– Best fit in environments where each tenancy (group) has a dedicated domain name directory service
– Foundation piece to isolate client network connectivity associated with directory services
Diagram:
• System Zone: DNS 0
• Access Zone-1: legal.bb.com, DNS 1 (168.152.3.7), Domain Controller-1, Kerberos-1, LDAP-1, Local Database-1
• Access Zone-2: hr.bb.com, DNS 2 (168.152.4.12), Domain Controller-2, NIS-1
KERBEROS SECURITY ENHANCEMENTS
• OneFS 8.0 enhanced Kerberos encryption support
– Added AES-256 library
› Enables AES 128-bit and AES 256-bit encryption support
› Previous releases supported RC4 and DES encryption only
– Enabled by default
› No setup required to enable support
– Meets customer security requirements and expectations
ISILON ENCRYPTED CLUSTERS
• Simplified encryption using self-encrypting drives (SED)
– Transparent, Always On, Everything Encrypted
– No mixing of SED and non-SED drives or nodes
• Value
– No costly external equipment in addition to the storage you’re already deploying
• Maximize Performance
– Encryption workload is distributed to each drive
– Less than 1% impact compared to non-SED
• Internal Key Management
– Highly available, internal key management
SED AVAILABILITY MATRIX
Drive types: 900 GB SAS, 3 TB SATA, 4 TB SATA, 6 TB SATA*, 800 GB SSD, 1.6 TB SSD
Y 2H 2016 S200 S210 Y X400 Y Y Y X410 Y Y Y X200 Y X210 Y Y Y NL400 Y Y Y NL410 Y Y HD400 2H 2016 Y Y* 2H 2016* Y 2H 2016 Y* 2H 2016* Y (with 6 TB) 2H 2016 (with 8 TB)
* OneFS 8.0 and later only
HDFS AMBARI INTEGRATION
HADOOP MANAGEMENT MADE SIMPLE
Key Features
• Leverage Ambari to monitor key performance metrics and alerts of the OneFS file system
• Key metrics such as disk, CPU, network, and NameNode usage are all reported
Benefits
• Single management point for the Ambari operator to manage a Hadoop cluster with OneFS
• Ambari admin can proactively monitor and troubleshoot performance issues with the OneFS file system, similar to DAS
RANGER INTEGRATION
AUTHORIZE ONLY THE RIGHT USERS TO ACCESS HDFS FILES
Key Features
• Enables Ranger authorization policies to be executed in OneFS
• OneFS native file access control continues to be effective
• Dual access control checks guarantee that file access meets both Hadoop and IT admin needs
Benefits
• Ranger admin can enforce Hadoop access policies across all Hadoop components consistently
• IT/datacenter admin maintains control of multi-protocol data lake access in OneFS
Diagram labels: Ranger; Ranger Authorization Policies; OneFS Native Access Control Check
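The dual access control check can be modeled as a logical AND of two independent decisions. The sketch below is a simplified illustration: the `Rule` tuples of (user, path prefix, action) are a hypothetical stand-in for both Ranger policies and OneFS native permissions, which are considerably richer in practice, and none of the names come from the Ranger or OneFS APIs.

```python
from dataclasses import dataclass
from typing import Set, Tuple

# (user, path prefix, action): a deliberately simplified, hypothetical rule shape.
Rule = Tuple[str, str, str]


@dataclass(frozen=True)
class Request:
    user: str
    path: str
    action: str  # e.g. "read" or "write"


def rules_allow(rules: Set[Rule], request: Request) -> bool:
    """True if any rule matches the request's user, action, and path prefix."""
    return any(
        request.user == user
        and request.action == action
        and request.path.startswith(prefix)
        for user, prefix, action in rules
    )


def is_allowed(ranger_rules: Set[Rule], onefs_rules: Set[Rule], request: Request) -> bool:
    """Grant access only when BOTH checks pass (the dual check on the slide)."""
    return rules_allow(ranger_rules, request) and rules_allow(onefs_rules, request)


if __name__ == "__main__":
    ranger = {("alice", "/data/sales", "read")}
    onefs = {("alice", "/data", "read")}
    print(is_allowed(ranger, onefs, Request("alice", "/data/sales/q1.csv", "read")))
```

Requiring both checks means a permissive Hadoop policy cannot widen access beyond what the storage administrator has granted, and vice versa.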
AVAILABLE NOW
• Isilon Hadoop Tools
– Automate user/group creation for Hortonworks, Cloudera, Pivotal HD, and IBM BigInsights
– Creates directory structures according to Hadoop distribution
github.com/isilon_hadoop_tools
ISILON FOR MULTIPLE ANALYTICS APPLICATIONS
Diagram labels: NFS, SMB, name node, data node, MapReduce
Protocols: SMB, NFS, HTTP, FTP, HDFS
ISILON SCALE-OUT ARCHITECTURE
• Web, Windows, Apps, Mac/iOS, Cloud, Linux/Unix, Hadoop, Archive
• 10 GigE and IB (today)
• 10 and 40 GigE (Gen 6 hardware, 1H17)
ISILON’S STRATEGY FOR DATA INSIGHTS
Actionable Understanding
• Helping users understand their Isilon environment
• Presenting the right information meaningfully, accessibly, and understandably
Improved storage utilization
• Capacity planning and forecasting
• Track who’s using the most resources
Personalized dynamic information
• Enabling development of tailored Data Insights
THE DATA INSIGHTS APPLICATION
• Powerful cluster monitoring & reporting
– Flexible reporting on file system characteristics
› View “top N” and drill into directory & file sizes, ages, access time, etc.
– Track health and performance of the cluster
• Capacity forecasting
– Estimate capacity utilization based on selected time frames
– Plan for cluster expansion or file system cleanup activities
• Live performance statistics for cluster, node, protocol, client, and more
– View and drill into throughput, operations, CPU utilization, etc. to plan optimization efforts
INSIGHTIQ 4.0
Key Features
• Supports OneFS 8.0 and IsilonSD Edge clusters
• New, easy-to-use capacity utilization forecast tool
• Improved performance and reliability to scale with data growth
• Added support for IPv6 reporting
• Explore fresher and more reliable File System Analyze data from OneFS
CAPACITY FORECAST TOOL
• Select the range on which to base the forecast
• Dotted line shows predicted utilization
• Filter by tier or node pool for even more meaningful forecasting
• Forecast date highlighted along with growth rate
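The slides do not say how the forecast tool computes its prediction. As one plausible illustration, the sketch below fits a straight line (ordinary least squares) to utilization samples from a selected range and extrapolates to the day capacity would be reached. The linear model, the function names, and the sample figures are all assumptions for illustration, not the InsightIQ method.

```python
from typing import Optional, Sequence, Tuple


def fit_line(days: Sequence[float], used_tb: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (growth in TB per day, intercept in TB)."""
    n = len(days)
    if n < 2 or n != len(used_tb):
        raise ValueError("need at least two samples of equal length")
    mean_x = sum(days) / n
    mean_y = sum(used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_tb))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(
    days: Sequence[float], used_tb: Sequence[float], capacity_tb: float
) -> Optional[float]:
    """Days after the last sample until the fitted line reaches capacity.

    Returns None when usage is flat or shrinking, since the line never
    reaches capacity in that case.
    """
    slope, intercept = fit_line(days, used_tb)
    if slope <= 0:
        return None
    full_day = (capacity_tb - intercept) / slope
    return max(0.0, full_day - days[-1])


if __name__ == "__main__":
    # Hypothetical samples: day offsets within the selected range, and TB used.
    print(days_until_full([0, 10, 20, 30], [100, 110, 120, 130], capacity_tb=200))
```

Changing the selected range changes the fitted growth rate, which is why the choice of range matters for the forecast date.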
CLOUDPOOLS
Diagram labels: Core; Cloud Provider; HOT DATA (>30 days); WARM DATA (1-2 months); FROZEN DATA (1-2 years)
CLOUDPOOLS
SEAMLESS CLOUD INTEGRATION
Diagram labels: Core; Cloud Provider; Apps & Users; Access time
CLOUDPOOLS ENABLES FLEXIBLE CHOICE
Options: Private, Hosted, Public (diagram: private data lake, hosted, public)
• Great for: Larger Datasets; Smaller Datasets, Bursting
• Cost: Lowest overall TCO; Predictable Opex; Variable Opex
• Latency: Low; Med-High
• Data Residency Concerns: None; Low; Med-High
• Compliance Control: High; Low-Med; Low
OneFS v8.0.1 CLOUDPOOLS – PROXY SUPPORT
Key Features
• Multiple Isilon nodes can now simultaneously tier to the cloud
• Ability to update proxy servers
Benefits
• No direct external network exposure of Isilon systems for CloudPools
• No network workarounds necessary to configure CloudPools
Diagram labels: Isilon, Proxy, Internet, ECS
DELL EMC ISILON FAMILY
Linear Scaling of Performance and Capacity
• S-Series: High Performance Platform
• X-Series: Highly Versatile Platform
• NL-Series: Nearline Platform
• HD-Series: High Density Platform
• IsilonSD: Software Defined, your hardware
• CloudPools: Internal Cloud, External Cloud
Chart labels: Performance, Capacity, Reduced Costs
DELL EMC ISILON NITRO
ALL FLASH, SCALE OUT NAS
• Isilon’s highly dense, extremely modular and incredibly scalable all-flash tier
– Targeted at extreme performance NAS markets
• Start a cluster with a single chassis & scale to 100+ chassis in a single global namespace
• Get up to 1 PB of flash from one 4U chassis
• All OneFS Features Supported!
• 100+ Petabytes
• 400+ NODES!
• Integrates into your existing cluster
#1 DIFFERENTIATOR
• Single namespace, from flash to spinning drives to cloud
Enterprise Features: SmartPools, CloudPools, SmartConnect, SmartLock, SmartQuotas, SyncIQ, SnapshotIQ, InsightIQ
NITRO USE CASES
HIGH THROUGHPUT: large datasets of large files for parallel processing
✓ Lossless high quality media output
✓ Quickly finding outliers and variations (DNA sequencing)
✓ Pattern and trends search (weather data)
IOps INTENSIVE: billions of small files, large datasets for parallel processing
PREDICTABLE LATENCY: predictable performance for mixed workloads
✓ Content repository
✓ Compute intensive
✓ Big data analytics
✓ Enterprise applications
✓ Multimedia & content delivery
IMPROVED TCO: relief from infrastructure and energy efficiency
✓ Operational cost confined
✓ Energy efficiency mandates
✓ Infrastructure constrained
Industries: Media & Entertainment, Electronic Design Automation, Life Sciences, Geoseismic, IoT, Government, High Performance Computing