Big Data Directions
Greg Battas, Big Data Chief Technologist
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Big Data is Growing Up
Several shifts beginning in Big Data architecture
Big Data cluster consolidation
Software defined storage taking root
Software organizing around a common base
Purpose built hardware for Big Data
Comparing Big Data and CI architectures
Converged Infrastructure
• Ethernet designed for flexibility
• Blades allow dense compute nodes
• Storage arrays shared by SAN, designed to be accessible to any node so that storage can be dynamically allocated
Big Data
• Network designed for low cost/high cross-sectional bandwidth
• Argos allows maximum density with mediocre CPU power
• Direct attached storage with minimal hardware resiliency is used for cost and cultural reasons
(Diagram labels: Ethernet Switches, Argos, Blade, SAN Switches, Shared Storage)
Big Data Architecture Principles and Pitfalls
Principles
• Began with a movement away from proprietary storage and databases
• Parallel programming/distributed filesystems on industry standard hardware
• “Move” compute closer to the data/disk to reduce overhead
• Direct attached storage with S/W resiliency
• Strong open source culture
• Major ecosystems with rapidly evolving, mix and match functionality
Pitfalls
• Provisioning servers means moving data
• Difficult to quickly “re-slice” a configuration
• No simple sharing of data amongst clusters
• Big Data must be copied to each cluster to leverage various H/W and S/W
(Diagram labels: Hadoop Batch Processing, HBase Event Processing, Vertica Analytics, SAS VA; nodes assigned by time window, 6 am – 12 am and 12 am – 6 am)
“Take the Processing to the Data”
Common Wisdom
• Unlike other apps, big data depends on massive IO to read huge amounts of data from disk
• Traditional SAN approaches, where every block must be shipped over a SAN, do not scale cost effectively
• Big data scales because the processing happens close to the data by using internal DAS and shipping work to each node
(Diagram labels: Traditional IT with Ethernet Switches, App, SAN Switches, Shared Storage; Big Data with Ethernet Switches, App)
“Take the Processing to the Data”
Root Reality
Only a portion of the processing can be done locally
• Shuffles redistribute data across the grid
• Replication pushes inserts and updates to multiple nodes
MPP RDBMSs have spent years optimizing this problem
• Learned that operations should be pushed down if they
  • Are data reducing
  • Have complete locality of data
• Learned that the majority of the CPU power is still needed for work that can’t be pushed down
(Diagram labels: Map, Op, Reduce, DFS)
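The push-down point can be illustrated with a toy word count. This is a minimal sketch, not anything from the deck: the function names, the CRC-based key placement and the sample partitions are all assumptions made for illustration. Each partition stands in for the data local to one node, and the sketch counts how many records have to cross the grid during the shuffle, with and without a local, data-reducing pre-aggregation.

```python
import zlib
from collections import Counter


def owner(key, num_reducers):
    """Pick the reducer that owns a key (hypothetical placement rule)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def word_count(partitions, num_reducers, push_down):
    """Count words across partitions and report how many records were shuffled.

    push_down=False: every (word, 1) pair is shipped to its reducer.
    push_down=True:  each node first aggregates its own partition, so only
                     one (word, local_count) pair per distinct word is shipped.
    """
    reducers = [Counter() for _ in range(num_reducers)]
    records_shuffled = 0
    for partition in partitions:
        if push_down:
            # Data-reducing work done with complete locality of data.
            outgoing = list(Counter(partition).items())
        else:
            outgoing = [(word, 1) for word in partition]
        for word, count in outgoing:
            reducers[owner(word, num_reducers)][word] += count
            records_shuffled += 1
    # The final merge still happens after the shuffle, off the local node.
    totals = Counter()
    for reducer in reducers:
        totals.update(reducer)
    return dict(totals), records_shuffled


# Illustrative data only: three "nodes", each holding one partition.
PARTITIONS = [
    ["a", "b", "a", "a"],
    ["b", "b", "c"],
    ["a", "c", "c", "c"],
]

if __name__ == "__main__":
    for flag in (False, True):
        totals, moved = word_count(PARTITIONS, num_reducers=2, push_down=flag)
        print(f"push_down={flag}: totals={totals}, records shuffled={moved}")
```

Both runs produce the same totals. The pushed-down variant ships fewer records, but the reduce step still has to run after the shuffle, which matches the observation that a large share of the work cannot be done locally.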
Software Defined Storage
Big Data is often deployed with distributed file systems on industry standard hardware
A different approach
• The largest data stores in the world chose to move to industry standard servers running parallel file systems rather than traditional storage arrays or databases
HDFS, S3, Swift and Cinder are becoming the most significant interfaces
Today a mix of proprietary and open source technologies
Big data is accelerating the adoption of SDS into other areas
Big Data Software vendors
HDFS becoming the common substrate for many
(Diagram labels: Hadoop MapReduce, MPP DBMS, Data Integration, Analytic Tools & Frameworks, Enterprise Security, Unstructured Analytics, all on top of HDFS)
NoSQL products being adopted by software vendors
Fueling a shift to open source
The first wave of Big Data was around batch
• Hadoop for analytics and ETL offload
• Often coupled with interactive SQL co-processors
Now we are seeing growing interest in NoSQL products
• Commercial ISVs are the canary in the coal mine
• Some very aggressive projects to port to NoSQL
• HBase seems to be preferred by ISVs
Challenge of moving commercial products to NoSQL
• SQL language
• Transactions
• Joins
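To make the join challenge concrete, here is a minimal sketch of what an application has to do itself when the store offers only key lookups. Plain Python dicts stand in for two key-value tables; this is not the HBase client API, and the table layout, field names and function name are hypothetical.

```python
def client_side_join(orders, customers):
    """Inner-join orders to customers using only point lookups by key.

    orders:    dict of order_id -> {"customer_id": ..., "amount": ...}
    customers: dict of customer_id -> {"name": ...}
    Returns the joined rows and the number of lookups issued, since each
    lookup would be a separate request against a real key-value store.
    """
    joined = []
    lookups = 0
    for order_id, order in sorted(orders.items()):
        lookups += 1
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # no matching customer row: dropped, as in an inner join
        joined.append(
            {
                "order_id": order_id,
                "name": customer["name"],
                "amount": order["amount"],
            }
        )
    return joined, lookups


# Illustrative data only.
ORDERS = {
    "o1": {"customer_id": "c1", "amount": 40},
    "o2": {"customer_id": "c2", "amount": 15},
    "o3": {"customer_id": "c9", "amount": 7},
}
CUSTOMERS = {
    "c1": {"name": "Acme"},
    "c2": {"name": "Globex"},
}

if __name__ == "__main__":
    rows, lookups = client_side_join(ORDERS, CUSTOMERS)
    print(rows, lookups)
```

In a SQL database the same result is a single declarative query that the engine plans and executes. Here the application owns the join logic and pays one lookup per order row, which is part of what makes porting a commercial product nontrivial.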
The Shift to Optimized Hardware
System on a Chip creates a new model for servers
The significance of Moonshot goes far beyond packaging
The power of purpose built hardware
The economics of Dark Silicon
Acceleration
Open source opens the door
Where we are working in Big Data
Allow customers to converge big data clusters
• Leverage shared resources for multiple big data environments
• Allow rapid elasticity and provisioning without moving data
• Ability to store data once and operate on it with different types of compute nodes
Bring big data software together into a common framework
• Hadoop, unstructured analytics, MPP DBMS, enterprise security, analytic tools and data integration tools
• Aligned around a common distributed filesystem (HDFS compliant)
• Support multi-temperature data
Assist ISVs and customers moving to NoSQL
• Leverage HP intellectual property in database
Use Moonshot to leverage the shift to optimized hardware
- Slides: 12