Intel and Cloudera Accelerating Enterprise Big Data Success
Aaron Davies-Morris, Managing Director, Intel Big Data Professional Services
INTEL CONFIDENTIAL
Big Opportunity: Extract value from data
50 billion THINGS x 35 ZB DATA = VALUE (revenue growth, cost savings, margin gain)
Big Gap: Roadblocks on the journey
50 billion THINGS x 35 ZB DATA = VALUE (revenue growth, cost savings, margin gain)
Gaps: NO SECURITY, NO INSIGHT, NO PROOF
Roadblocks:
• Worry about attacks
• Waste time on misguided pilots
• Bring data to compute, fail to scale
• Delay insights with batch processing
• Hold back production deployment
• Pay more for data management
• Store underutilized data
• Use sub-optimal hardware
• Fail to show ROI
Another: Datacenter Inflection
[Figure: Linux/x86 units vs. UNIX/RISC units, 2001 to 2013, with inset charts for virtualized vs. nonvirtualized (2008 to 2013) and public vs. private (2010 to 2013)]
1. UNIX to Linux, RISC to IA
2. Physical to Virtual, SW-only to HW-assisted
3. Cluster to Cloud, ASIC to IA/Fabric
4. Big Data
“In 2000 Intel saw Linux coming & invested heavily in Red Hat; in 2005 we saw virtualization happening and invested in VMware; in 2008 we started investing heavily in hyper-scale computing. We think big data & Hadoop will dwarf all of them.” Diane Bryant, SVP & GM, Data Center Group, Intel
History: Intel and Apache Hadoop*
Timeline, 2009 to 2014:
• Optimization, Product, Tuning, Benchmarking, Research
• Open Cirrus*, HiBench
• Release IDH 1.0 (2011)
• Release IDH 2.0 (2012)
• Release IDP 3.1 (2014)
• Web, Telco, Smart City, Healthcare, Retail
* Other names and brands may be claimed as the property of others.
Big Investment: Big Deal
Accelerate innovation via open source software
• Maintain an open horizontal platform for big data
• Continue to enhance Apache Hadoop and related projects
Enable Hadoop to run best on IA
• Optimize performance across compute, storage, & network
• Ensure platform security, enhanced by hardware
Foster evolution of big data ecosystem
• Establish usage models and industry standard benchmarks
• Develop reference architectures and industry-wide solutions
Ensure: Cloudera runs best on Intel Architecture
Software & silicon co-evolve to deliver dramatic gains
1. Push compute-intensive work down to the silicon: encryption (AES-NI), compression (SSE4.2), math (MKL)
2. Increase main memory utilization up to 20X: improve the disk-to-memory ratio from 200:1 to 10:1
3. Design for rack-scale architecture
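The 20X figure is consistent with the two disk-to-memory ratios on this slide: going from 200:1 to 10:1 means 20 times more main memory per unit of disk. A minimal sketch of that arithmetic follows; the 24 TB node size and the decimal conversion (1 TB = 1000 GB) are assumptions for illustration, not values from the slide.

```python
def memory_for_ratio(disk_tb: float, disk_to_memory_ratio: float) -> float:
    """Return the main memory in GB implied by a disk:memory ratio."""
    return disk_tb * 1000.0 / disk_to_memory_ratio


disk_tb = 24.0  # hypothetical per-node disk capacity, not from the slide

before_gb = memory_for_ratio(disk_tb, 200)  # 200:1 ratio
after_gb = memory_for_ratio(disk_tb, 10)    # 10:1 ratio
increase = after_gb / before_gb

print(f"200:1 -> {before_gb:.0f} GB, 10:1 -> {after_gb:.0f} GB, {increase:.0f}X more memory")
```

The increase factor depends only on the two ratios, so it stays at 20 for any disk capacity.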
A Foundation For Delivering “Big Answers”
Shown to improve a 1 terabyte sort from 4 hours to 7 minutes
• Baseline (>4 hours): Intel® Xeon 5690, 7200 HDD, 1 GbE adapter
• Intel® Xeon® E5-2690 v3 processor: ~50% improved
• Intel® SSD Series: ~80% improved
• Intel® 10 GbE adapters and switch: ~50% improved
• Intel / Cloudera Apache Hadoop* software: ~40% improved
• Result: ~7 minutes
Based on internal Intel test results. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel internal testing. For more information go to: intel.com/performance
* Other brands and names are the property of their respective owners
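One way to read the figures on this slide is that each percentage is a reduction in the remaining run time, applied one after another. That reading is an assumption, since the slide does not say how the improvements combine, but under it the four steps take a 4 hour baseline to about 7.2 minutes, in line with the "~7 minutes" result. A minimal sketch:

```python
baseline_minutes = 4 * 60  # ">4 hours" on the slide, taken here as exactly 4 hours

# (component, fractional reduction in remaining run time), in slide order.
# Treating each figure as a successive reduction is an assumption.
improvements = [
    ("Xeon E5-2690 v3 processor", 0.50),
    ("Intel SSD", 0.80),
    ("10 GbE adapters and switch", 0.50),
    ("Intel / Cloudera Hadoop software", 0.40),
]


def compound(baseline: float, steps) -> float:
    """Apply each fractional reduction to the time left after the previous step."""
    remaining = baseline
    for _name, reduction in steps:
        remaining *= 1.0 - reduction
    return remaining


final_minutes = compound(baseline_minutes, improvements)
speedup = baseline_minutes / final_minutes

print(f"{final_minutes:.1f} minutes, about {speedup:.0f}x faster than the baseline")
```

Because the reductions multiply, the order of the steps does not change the final figure.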
Solving problems that matter
• Personalized Healthcare: With the Michael J. Fox Foundation, enabled research scientists to monitor the progression of Parkinson’s disease by developing an analytics platform for data collected from wearable sensors.
• Smart Building: Enabled Daikin Applied to lower energy costs and predict maintenance schedules by analyzing data from HVAC rooftop units in the cloud and managing systems remotely.
• Responsive Store: With Living Naturally, helped natural food retailers optimize inventory in real time by developing a social media analytics solution on a big data platform.
Intel delivers Trust and Intelligence for the Internet of Things
Connect, Secure, Manage, and Analyze data from billions of devices
[Diagram: Things, Gateways, Network, Datacenter | Cloud, Services (APIs)]
With Intelligent Solutions based on Trusted Platforms
Evolve Big Data Analytics for the Internet of Things Era
• Solutions: Build lighthouse solutions to demonstrate value and reference architectures to scale
• Analytics: Enable mainstream use of advanced analytics at volume economics
• Data Platform: Accelerate innovation in and adoption of Apache Hadoop as an open trusted platform
• Compute, Storage, Network: Catalyze industry growth by supporting Software Defined Infrastructure with open, standard building blocks
Intel Platforms Raise Security Posture
Intel® Data Protection Technology, Intel® Platform Protection Technology, SW Optimization/Enablement (bold text on the slide marks platform interception)
Features by processor generation:
• X5500: XD bit
• E5-2600: AES-NI, Intel TXT, XD bit
• E5-2600 v2: Secure Key, AES-NI, OS Guard (SMEP), Intel TXT, XD bit
• E5-2600 v3: PCH-DRNG, Secure Key, AES-NI, TPM 2.0 support, BIOS Guard (WS), OS Guard (SMEP), Intel TXT, XD bit
Capabilities named on the slide:
• Buffer overflow protection & SW-based crypto
• AES acceleration
• Platform root of trust
• Platform entropy
• Privilege escalation attack prevention
• General, asymmetric, and symmetric crypto assists
• End-to-end HW crypto acceleration
SW optimization/enablement: Trust Attestation, T-Boot, Function Stitching and Multi-buffer
Complete End-to-End H/W Assisted Data Protection
Intel® Xeon® E5-2600 v3 enhances the full spectrum of data protection
• Protection stages: secure connections, key generation, data encryption, authentication
• Crypto assists: Secure Key, AES-NI
• Haswell new crypto assists: AVX2/BMI, RDRAND, enhanced AES-NI
• Benefits: more concurrent users/connections, enhanced security, fast response time, high performance
No computer system can be absolutely secure. Requires an enabled processor, chipset, firmware and software. Check with your manufacturer or retailer for more information.
Focus: Joint Engineering
Feature / Target: Cloudera Enterprise
SECURITY
• HDFS Encryption and extended file ACLs
• Centralized authorization via Sentry
• Simplified Kerberos
• HBase cell-level authorization
• Search: document and index security
• Auditing & data lineage
PERFORMANCE
• Crypto acceleration with AES-NI
• MR/Shuffle optimizations
• Compression acceleration with SSE4.2
• Optimizations using AVX and other IA
• Optimizations using MKL
• Explore Xeon Phi with Java support
MANAGEMENT
• Service management extensions
• Simplified cloud provisioning, including AWS support
• Backup and Disaster Recovery
• Deeper diagnostics of various modules
• Support for Azure, VMware, OpenStack
• Extended RBAC in Cloudera Manager
APPLICATIONS
• Certified w/ Intel Enterprise Edition of Lustre
• Impala enhancements including low-latency SQL engine, SQL-92 analytic queries, and more
• Spark support in CDH, including Spark on YARN, Spark security, and Spark streaming
• SQL on HBase
• Spark interoperability with Impala
• Wire encryption for Spark
• Pig integration with Spark/Sentry integration
Intel Professional Services: Deep Expertise
A global team of experienced professionals
• Deep Hadoop, BI, and security-domain experience
• Decades of deployment expertise
• Developers of best-practice deployment methodologies and tools
• Providers of advanced integration and operational assistance
Our team’s experience working at Facebook, HP, IBM, federal agencies, PwC, Nokia, McAfee, and Sears delivers the knowledge you need.
Security Use Cases
Use Case: Claims Analytics
Company: Global Property and Casualty Insurer
Industry: Financial Services - Insurance
Description: A holistic fraud detection and prevention system that combines structured claims transaction data with text from multiple sources, ranging from claim systems to the wide variety of notes and documents associated with an insurance claim.
Cluster size in pilot: 10
Cluster size in production: 20
Hadoop distribution used: IDH/CDH
Claims Analytics
Business Need
A holistic fraud detection and prevention system that combines structured claims transaction data with text from multiple sources, ranging from claim systems to the wide variety of notes and documents associated with an insurance claim.
Benefits to Business
Built text mining algorithms to detect fraudulent claimant behavior by analyzing adjuster notes, claim diaries, and other textual sources.
• Real-time access to unstructured data for interactive searches
• Patterns and relationships to identify fraud
• Help adjusters identify claims for potential fraud more efficiently
Solution
• 24 hours of limited process functionality reduced to minutes
• Identify the ultimate counterparty and evaluate fraud and financial crime
• Leverage SEC filings and all of the unstructured data associated with counterparties
• Feed that information back into the Risk Platform
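The idea of combining structured claim data with text from adjuster notes can be illustrated with a small sketch. This is not the insurer's system: the indicator phrases, the thresholds, and the `Claim` fields below are all hypothetical, and a production system on Hadoop would use far richer text mining than keyword patterns.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical indicator phrases; a real system would learn or curate these.
FRAUD_PATTERNS = {
    "prior_claims": re.compile(r"\b(prior|previous) claims?\b", re.I),
    "no_witness": re.compile(r"\bno witness(es)?\b", re.I),
    "cash_only": re.compile(r"\bcash( only)?\b", re.I),
    "inconsistent": re.compile(r"\binconsisten(t|cy|cies)\b", re.I),
}


@dataclass
class Claim:
    claim_id: str
    amount: float      # structured transaction data
    notes: List[str]   # adjuster notes, claim diaries, other text


def matched_indicators(claim: Claim) -> List[str]:
    """Return the names of the indicator patterns found in the claim's text."""
    text = " ".join(claim.notes)
    return sorted(name for name, pat in FRAUD_PATTERNS.items() if pat.search(text))


def flag_for_review(claims, min_indicators=2, min_amount=10_000.0):
    """Flag claims where text indicators and the structured amount both cross a threshold."""
    flagged = []
    for claim in claims:
        hits = matched_indicators(claim)
        if len(hits) >= min_indicators and claim.amount >= min_amount:
            flagged.append((claim.claim_id, hits))
    return flagged


claims = [
    Claim("C-1", 25_000.0, ["Claimant has two prior claims.", "No witnesses to the incident."]),
    Claim("C-2", 40_000.0, ["Police report attached; statements consistent."]),
    Claim("C-3", 500.0, ["Story inconsistent, no witness, wants cash."]),
]

print(flag_for_review(claims))
```

Only C-1 is flagged here: C-2 has no text indicators, and C-3 has three but falls below the amount threshold, which shows why the structured and unstructured signals are evaluated together.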
Use Case: Cyber Security & Situational Awareness
Company: Various
Industry: IT Security - Generic
Description: Today’s enterprise systems and networks that operate in cyberspace have vulnerabilities that present significant risks to both individual organizations and national security. By anticipating what might happen to these systems, companies can develop effective countermeasures to protect their critical assets. This use case shows how big data can help build a cyber security and situational awareness system.
Cluster size in pilot: Various
Cluster size in production: Various
Hadoop distribution used: Various
Cyber Security & Situational Awareness
Business Need
Develop a cyber security and situational awareness system to understand the company's networks and to accurately predict and respond to potential problems. The system should collect data from various internal and external systems, manage structured and unstructured data, and analyze it to maintain an ongoing picture of how the computer systems, networks, and users are operating in the organization.
Solution
Benefits to Business
• Effective command and control, with full awareness of what is happening in the company network. With this awareness, negative situations can be recognized and managed as they occur.
• Cyber situational awareness at the operational level, achieved by summarizing lower-level details and putting them into perspective, thus exposing the real impact on company operations.
• Ability to perform correlation and other analytics across a “larger set of information”.
• Agile operations: the ability to reshape cyber system goals to avoid attacks.
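Summarizing lower-level details through correlation can be sketched as follows. This is an illustrative toy, not the system described on the slide: the `Event` fields, the feed names, and the fixed 5 minute window are assumptions, and a real deployment would run this kind of grouping over far larger data on the cluster.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int  # seconds since an arbitrary epoch
    source: str     # hypothetical feed names, e.g. "firewall", "auth", "ids"
    host: str
    detail: str


def correlate(events, window_seconds=300, min_sources=2):
    """Group events per host into fixed time windows and report the windows
    in which at least `min_sources` distinct sources saw that host."""
    buckets = defaultdict(list)
    for event in events:
        buckets[(event.host, event.timestamp // window_seconds)].append(event)

    incidents = []
    for (host, window), group in sorted(buckets.items()):
        sources = sorted({event.source for event in group})
        if len(sources) >= min_sources:
            incidents.append({
                "host": host,
                "window_start": window * window_seconds,
                "sources": sources,
                "events": len(group),
            })
    return incidents


events = [
    Event(1000, "firewall", "db01", "blocked inbound scan"),
    Event(1100, "auth", "db01", "5 failed logins"),
    Event(1150, "ids", "db01", "signature match"),
    Event(1120, "auth", "web02", "1 failed login"),
    Event(5000, "firewall", "db01", "blocked inbound scan"),
]

for incident in correlate(events):
    print(incident)
```

Fixed windows keep the sketch short but miss activity that straddles a window boundary; a sliding window would be the natural refinement.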
- Slides: 21