From DBA to DPA – Becoming a Data Platform Administrator
Lior King
Database Team Manager - Intercontinental Exchange (ICE)
Lior.king@gmail.com
The data world is changing
Once | Today
RDBMS for everything | RDBMS is the default but is not good for everything.
RDBMS – the only option | There are also document based DBs, K/V databases, analytics systems etc.
Commercial RDBMS only | Commercial RDBMS AND ALSO cheap community edition RDBMS
DWH on RDBMS only | DWH on RDBMS AND ALSO on special analytics platforms
Applicative DBA / Infrastructure DBA | “Full stack” DBA
Data Platform Administrator - DPA
• Today’s data platforms consist of:
  1. RDBMS – sometimes more than one.
  2. NoSQL systems: document based, column based, K/V based, graph based.
  3. Analytical systems.
• The SQL language is as relevant as ever:
  • Most systems can “talk” SQL of some kind.
  • Today’s BI systems can connect to ALL of them.
• The DPA needs to administer them all.
How can you become a Data Platform Administrator?
Here are 6 topics worth learning to become a data platform administrator
#1: Learn about Analytic Platforms
• They can process real “big data” volumes very fast
  • starting from dozens of TB and up to many PB.
• They are scalable (some of them have limits…).
• They are the true “data warehouse” platforms for “big data” projects.
• Today’s leading platform is Hadoop
  • The most scalable solution – limitless.
  • Rather cheap (in comparison to the alternatives).
  • Good for all analytics: data scans (M/R), seeks (Impala/Shark), Machine Learning (Mahout), workflows (Tez/ZooKeeper) etc.
  • Evolves very rapidly – the #1 project of the Apache Software Foundation.
• Alternatives: PDW, Vertica, Greenplum, Netezza, Teradata
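To get a feel for the “data scans (M/R)” style before touching a real cluster, here is a toy, single-process sketch of the map / shuffle / reduce pattern in plain Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, the way a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, the way a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop scales out", "Hadoop is rather cheap"]
    print(word_count(sample))
```

The point of the pattern is that map_phase and reduce_phase never share state, which is what lets a real platform run them in parallel over very large data sets.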
#2: Learn about Document Based DBs
• They scale better than RDBMS.
• Easy to manage and to program with (object insertion/retrieval).
• Flexible schema design.
• They are very fast – usually faster than RDBMS.
• They also have MANY disadvantages compared to RDBMS.
• 3 recommendations to check out:
  • MongoDB
  • Couchbase
  • Azure DocumentDB.
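The “object insertion/retrieval” and “flexible schema” points can be illustrated with a toy in-memory collection. This is only a sketch: TinyCollection and its insert/find methods are hypothetical names and are not the API of MongoDB, Couchbase or DocumentDB. It just shows that documents in one collection do not have to share the same fields.

```python
import itertools
import json


class TinyCollection:
    """Toy in-memory document collection (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def insert(self, doc):
        """Store a copy of the document and return its generated id."""
        doc_id = next(self._ids)
        # A JSON round trip gives a deep copy and rejects non-JSON values.
        self._docs[doc_id] = json.loads(json.dumps(doc))
        return doc_id

    def find(self, **criteria):
        """Return documents whose top-level fields match all the criteria."""
        return [
            doc
            for doc in self._docs.values()
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]


if __name__ == "__main__":
    people = TinyCollection()
    people.insert({"name": "Dana", "role": "DBA"})
    # The second document has an extra field; no schema change is needed.
    people.insert({"name": "Avi", "role": "DBA", "skills": ["Linux", "Python"]})
    print(people.find(role="DBA"))
```

Note what is missing here compared to an RDBMS: no joins, no constraints and no transactions, which hints at the disadvantages mentioned above.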
#3: Learn Linux – 10 reasons
1. It’s free.
2. Evolves very fast – update release every 6-9 months.
3. Convenient software repositories.
4. Modifiable and customizable.
5. Runs on any platform.
6. Very secure – a firewall at the heart of the kernel.
7. Lack of malware (when installing from the repositories).
8. Restart after upgrades? Usually not required.
9. Freedom to choose distribution and GUI.
10. SQL Server on Linux – next summer.
#4: Learn one “free” RDBMS platform
• Companies are looking to cut costs. “Free” DBs are becoming popular.
• There is a growing demand for DBAs who can manage them.
• Which one to learn?
  • MySQL – resembles SQL Server in many ways (check out MariaDB as well).
    • The most popular “cheap” RDBMS.
  • PostgreSQL – resembles Oracle in many ways.
    • Used by Greenplum, Netezza, ParAccel (used in Amazon “Redshift”), Truviso.
• Sometimes a cheap RDBMS platform can be “good enough”.
• SQL Server and MySQL together? Why not?
#5: Learn Python
• Easy to learn.
• Very readable.
• Cross platform.
• Good for scripts as well as for big development projects.
• Object oriented.
• Huge library of packages (~40K packages in 300 topics on the PyPI site).
• A general purpose language.
• A leading platform for data analytics (almost as popular as R).
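As a taste of Python as a DBA scripting language, here is a minimal sketch of a row-count report. It assumes SQLite (through the standard library sqlite3 module) as a stand-in database so that it runs anywhere; the table_row_counts helper and the sample tables are made up for illustration. Against another RDBMS you would swap in that platform's driver and its own catalog query.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        if name.startswith("sqlite_"):
            continue  # skip SQLite's internal tables
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT)")
    conn.executemany("INSERT INTO trades (symbol) VALUES (?)", [("AAA",), ("BBB",)])
    conn.execute("CREATE TABLE venues (id INTEGER PRIMARY KEY)")
    print(table_row_counts(conn))
    conn.close()
```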
#6: Become a “full stack” DBA
• Study infrastructure (production) as well as database development.
• Infra:
  • Study all HA/DR solutions of SQL Server:
    • AlwaysOn/Mirroring, Log Shipping, Replication, Clusters.
  • Master the dynamic views.
  • Learn a scripting language – PowerShell or Python.
  • Master the locking mechanism.
• Development:
  • Master SQL and T-SQL.
  • Study the optimization engine and deep dive into execution plans.
  • DB DevOps.
  • Learn an OO programming language (C#, Java, Scala, Python etc.).
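Reading execution plans is easiest to practice in a sandbox. The sketch below assumes SQLite and its EXPLAIN QUERY PLAN statement as a stand-in; SQL Server exposes plans through its own tooling and the output looks different, but the habit is the same: compare the plan before and after adding an index. The query_plan and demo helpers and the trades table are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query, as text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The descriptive text is the last column of each plan row.
    return [row[-1] for row in rows]


def demo():
    """Show how the plan for one query changes once an index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT, qty INTEGER)"
    )
    sql = "SELECT qty FROM trades WHERE symbol = ?"
    before = query_plan(conn, sql, ("AAA",))
    conn.execute("CREATE INDEX idx_trades_symbol ON trades (symbol)")
    after = query_plan(conn, sql, ("AAA",))
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before index:", before)
    print("after index: ", after)
```

The exact wording of the plan text varies between SQLite versions, so treat it as something to read rather than to parse.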
About Intercontinental Exchange (ICE)
• ICE owns 11 financial and commodity exchanges worldwide
  • Including the New York Stock Exchange (NYSE) - the biggest stock exchange in the world.
• ICE owns and operates 6 clearing houses for derivatives.
• ICE is a major global supplier of financial market data.
• ICE is a fast growing organization – mostly by M&A (mergers and acquisitions):
Year | Revenues (million USD) | Net Income (million USD)
2005 | 155 | 53
2010 | 1,150 | 398
2015 | 3,338 | 1,274
Data Platforms in ICE
Transaction processing (RDBMS):
• Oracle (RAC/Exadata), TimesTen
• MS SQL Server (2008 R2/2014)
• MySQL
• PostgreSQL
• Sybase
• DB2 (LUW)
Analytics:
• Greenplum
• Netezza
• Hadoop
• Cassandra
How do we manage it all?
• Learn more than one RDBMS.
• Learn at least one analytics platform.
• Learn Linux.
• Learn shell scripting and/or Python.
• Share the knowledge.
• Studying is a part of the job – it NEVER stops.
• Play around in sandboxes.
Summary: Become a Data Platform Administrator
1. Learn about Analytic Platforms (focus on Hadoop).
2. Learn about Document Based DBs.
3. Learn Linux.
4. Learn one “free” RDBMS platform.
5. Learn Python.
6. Become a “full stack” DBA – infrastructure AND development.