Case Study: Georgia Tech University Private Cloud for Researchers
Didier Contis, Director Technology Services, College of Engineering, Georgia Institute of Technology
Joe Arnold, CEO, SwiftStack Inc.

Session Speakers
Didier Contis is the Director of Technology Services for the College of Engineering at the Georgia Institute of Technology (Greater Atlanta Area). The largest of Georgia Tech’s six colleges, CoE offers more than 50 graduate and undergraduate degree programs through its main Atlanta campus and satellites around the world. Its 13,000 students use an estimated 150 unique apps, the same ones businesses rely on to design airplane wings, model circuit-board layouts, and much more.
Joe Arnold is co-founder and CEO of SwiftStack, a leading provider of object storage software. SwiftStack's customers include some of the largest web and enterprise IT organizations. Joe managed the first public OpenStack launch of Swift after its release as an open source project. He has been active in the OpenStack community since 2010. Joe is the author of Object Storage with Swift, published by O'Reilly Media.
Georgia Tech Case Study - OpenStack Summit 2014

Object Storage and OpenStack Swift
Joe Arnold, CEO, SwiftStack Inc.

Swift Object Storage – Key Attributes
• Open-source object storage system
• Powers the largest storage clouds
• Geographically distributed
• Multi-tenant
• Massively concurrent
• Extremely durable
• Runs on standard Linux
• Inexpensive commodity x86 hardware
“OpenStack Swift in particular has gained a lot of traction both in the enterprise and in the service provider space” (October 2013)
“Swift is a proven solution, suitable for production needs, and should be included in competitive evaluations of object-based storage solutions.” (February 2014)

Swift Data Redundancy
Swift places 3+ replicas of all data as unique as possible
• Single Node Cluster: disks are “as-unique-as-possible”
• Small Cluster: storage nodes are “as-unique-as-possible”
• Large Cluster: storage racks are “as-unique-as-possible”
• Multi-Region: distributed data centers are “as-unique-as-possible”
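The "as-unique-as-possible" idea can be illustrated with a small greedy sketch. This is not Swift's ring-builder algorithm; the `Device` tuple, the `place_replicas` function and the tier names (region, rack, node, disk) are hypothetical names chosen to mirror the four cluster sizes listed on the slide. Each replica goes to the device that shares the fewest failure domains with the replicas already placed.

```python
from typing import NamedTuple


class Device(NamedTuple):
    """Hypothetical description of one disk and the failure domains above it."""
    region: str
    rack: str
    node: str
    disk: str


def place_replicas(devices, replicas=3):
    """Greedily pick devices so that replicas share as few failure domains as possible."""
    chosen = []
    remaining = list(devices)
    for _ in range(min(replicas, len(remaining))):
        def overlap(dev):
            # Count already-chosen devices in the same region, then the same
            # rack, then the same node. Tuples compare left to right, so
            # reusing a region is penalized before reusing a rack or a node.
            return tuple(
                sum(1 for c in chosen if c[:depth] == dev[:depth])
                for depth in (1, 2, 3)
            )
        best = min(remaining, key=overlap)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Small cluster: one rack, three nodes, two disks per node.
small_cluster = [
    Device("campus", "rack1", f"node{n}", f"disk{d}")
    for n in range(1, 4)
    for d in range(1, 3)
]
placement = place_replicas(small_cluster)
print(sorted(dev.node for dev in placement))
```

With a single node the same function falls back to distinct disks, and with several regions it spreads across regions first, which is the behavior the slide describes at each scale.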

Swift Object Storage – Filesystem Conceptual View
[Diagram: Files reach the Swift nodes over CIFS / NFS through the SwiftStack Filesystem Gateway]

Georgia Tech Case Study
Didier Contis, Director Technology Services, College of Engineering

Who we are and what we do
• Georgia Tech: 21,471 undergraduate and graduate students (Fall 2013)
• Six colleges: Architecture, Business, Computing, Engineering, Liberal Arts, and Sciences
• 6th Top Engineering Graduate Programs
• 5th Top Engineering Undergraduate Programs
• 13,000 students in the College of Engineering, largest in the U.S.

We have been deploying pre-cloud systems since 2007…
Meet our federated condominium systems

Our VDI / App publishing farm…
Virtual Lab Project and its supporting shared infrastructure (Matrix)

Our HPC farm…
PACE: Partnership for an Advanced Computing Environment
HPC federation / condominium system. 28,000 CPU cores and 2 PB of storage

What we learned:
1) Our HPC and VDI users love compute power
2) They love their research data even more
They are not alone…

All our researchers / students love their research data
They love to
• Acquire
• Create
• Exchange
• Receive
Data…

Here is a research project generating a lot of data
Remote Sensing and GIS-enabled Asset Management System (RS-GAMS)
Assessment of pavement, bridge, and roadway assets using various sensors
Estimated storage needs: 2,400 lane miles of interstate highways currently on file, with plans to analyze 2,000 miles in the next few months…
Raw data: 2.2 GB per lane mile
Processed data: 1.2 GB per lane mile
16 million JPEG files so far!
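The per-lane-mile figures on this slide translate into a rough capacity estimate. The sketch below only multiplies the numbers quoted above; the function name is hypothetical, and it assumes the planned 2,000 miles are lane miles stored at the same raw and processed rates, which the slide does not state explicitly.

```python
RAW_GB_PER_LANE_MILE = 2.2        # from the slide
PROCESSED_GB_PER_LANE_MILE = 1.2  # from the slide


def storage_gb(lane_miles):
    """Raw plus processed data, in GB, for a given number of lane miles."""
    return lane_miles * (RAW_GB_PER_LANE_MILE + PROCESSED_GB_PER_LANE_MILE)


on_file_gb = storage_gb(2400)   # lane miles currently on file
planned_gb = storage_gb(2000)   # assumption: planned miles are lane miles

print(f"On file: about {on_file_gb / 1000:.2f} TB")
print(f"Planned: about {planned_gb / 1000:.2f} TB")
```

These are single-copy figures. Stored in Swift with three replicas, the raw disk footprint would be roughly three times larger.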

Here is a research project receiving a lot of data
Effective Capacity Analysis and Traffic Data Collection for the I-85 HOV to HOT Conversion
The effectiveness of the implementation of the HOT lane is being evaluated in a before and after study.
Direct fiber network feed from Georgia Department of Transportation to Georgia Tech
Ø Over 400 TB of videos currently stored on random fileservers, USB drives…
Ø Lots more video to collect…

Oh, by the way, have you heard about: Research Data Curation
The White House Office of Science and Technology Policy “has directed Federal agencies with more than $100M in R&D expenditures to develop plans to make the published results of federally funded research freely available to the public within one year of publication and requiring researchers to better account for and manage the digital data resulting from federally funded scientific research.”
http://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research
http://www.whitehouse.gov/sites/default/files/microsites/ostp_public_access_memo_2013.pdf

So where do we store all this data?

Research Data Storage Challenges
Our challenges:
Ø Obviously we have a lot of research data. How much??? (2 PB just for HPC)
Ø Cheap enterprise-level storage is still expensive
Ø Backup is a problem (cost, time)
Meet our BIGGEST challenges:
Ø Bring-your-own-drive – USB, thumb
Ø Consumer cloud – Dropbox, etc.
Storage on demand

Research Data Storage Challenges
“Sometimes” important research data might be stored on not-so-reliable solutions:
• Due to the cost of existing “enterprise storage” and research program funding
• Backup? Could you repeat the question please?
!!! WARNING UNCONFIRMED REPORT !!!
Ultimate cheap NFS file server, circa 2006:
• Refurbished desktop tower
• Two 5-port USB cards
• 13+ USB drives, each shared individually via NFS

Our magic answer to all our problems???
VAPOR, the hybrid cloud

Meet VAPOR
Goal: Build a Georgia Tech Distributed and Federated Academic Cloud
Proposed design principles:
• Led by Academic Units in partnership with Central IT (currently College of Engineering, College of Science, College of Computing, Library, HPC PACE Group, Office of Information Technology)
• Support Instruction and Research at Georgia Tech
• Distributed across campus and beyond (Hybrid)
• Federate multiple departmental projects
Design / Architecture by Committee
Academic Governance Oversight
Need to be able to experiment and iterate quickly!

Proposed Use Cases for VAPOR Cloud
1. Ephemeral computing: A machine runs for short-term use, possibly for development/testing purposes.
2. "Pet" computer: A student needs a system which is "permanent", stateful and accessible both off and on campus. Basic usage is like VDI.
3. IaaS: Running campus services (both production and beta) on VMs. E.g. I don't want to manage a hardware layer but I need to set up a purpose-built website to host data and services for an international research group, and webhosting doesn't meet my needs.
4. PaaS: Running a platform. E.g. I don't want to manage hardware or the OS layer, but please give me a database I can use for this application.

<DRAFT> Vapor Architecture Vision </DRAFT>
Self-Service (to be defined / under investigation)
Management (Microsoft Azure Pack / Red Hat CloudForms…)
On-premise Component:
• HYPERVISOR or CONTAINER PODs (Hyper-V / KVM / XenServer + NVIDIA vGPU…)
• COMPUTE STORAGE (Gluster / Ceph / ScaleIO / ...)
Off-premise Component:
• Amazon AWS
• Microsoft Azure
• Rackspace
DATA STORAGE (SwiftStack / DDN WOS / Gluster…)
NETWORK (VXLAN / NVGRE / VPN…)

Today we are focusing on… Data Storage

Vision of the Data Storage layer
• Will hold a large portion of GT “Research Data”
• Probably multiple data storage layers (multiple vendors / technologies)
• Some of our current requirements:
Ø Distributed and resilient (support multiple catastrophic failures)
Ø Limit vendor dependency / lock-in (priority to open source)
Ø Leverage de-facto standards (S3 / Swift)
Ø Support multiple entry points (API, Cloud NAS, pluggable services)
Ø Flexible design to limit the need to migrate data to new systems down the road
Ø Integration with Georgia Tech identity management system (LDAP & AD)

Services supported by the Data Storage Layer / Swift
• Research Data Storage
• Research Data Curation
• Research Data Repositories
• “Dropbox” type service
• Filesystem Gateway (CIFS / NFS / GPFS / …)
DATA STORAGE (SwiftStack – Storage as a service)

Why Swift / SwiftStack for the Data Storage Layer?
Like:
• Swift is open-source (limits vendor lock-in, in our mind)
• Turn-key approach / manageability provided by SwiftStack
• Growing ecosystem around Swift
• Low hardware requirements / homogeneous hardware not required
• System seems robust -> replication rather than RAID technology
• Price is right! (so far…)
Don’t Like:
• It is object storage / not a native filesystem
• Still a young project / product

Research Project Candidates to use Swift
Projects in Aerospace, Transportation and BioEngineering currently targeted
Examples of research projects looking at / experimenting with Swift:
• Effective Capacity Analysis and Traffic Data Collection for the I-85 HOV to HOT Conversion
• Remote Sensing and GIS-enabled Asset Management System (RS-GAMS)

Our current strategy to engage research groups
Goal: Incentivize researchers to store data directly into Swift as objects when it makes sense
• This means demonstrating advantages from:
  • Indexing
  • Metadata
  • Scalability
  • Performance
  • Future benefits (analytics)
• It also means making an up-front investment:
  • In training and technical assistance
  • Providing free storage
[Diagram: Files go through the SwiftStack CIFS/NFS Gateway, Apps / Scripts use the Swift HTTP APIs, and both paths reach the Swift Nodes]
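To make "store data directly into Swift as objects" concrete, here is a minimal sketch of what a script could send through the Swift HTTP API: a `PUT` to `<storage URL>/<container>/<object>` with an `X-Auth-Token` header and optional `X-Object-Meta-*` headers for metadata. The endpoint, token, container, object name and metadata keys are all hypothetical placeholders, and the helper only builds the request rather than sending it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_object_put(storage_url, token, container, object_name, data, metadata=None):
    """Build (but do not send) a Swift object PUT request with custom metadata."""
    url = "/".join([storage_url.rstrip("/"), quote(container), quote(object_name)])
    headers = {"X-Auth-Token": token}
    for key, value in (metadata or {}).items():
        # Swift stores user metadata in headers prefixed with X-Object-Meta-.
        headers[f"X-Object-Meta-{key}"] = str(value)
    return Request(url, data=data, headers=headers, method="PUT")


request = build_object_put(
    "https://swift.example.edu/v1/AUTH_research",  # hypothetical storage URL
    "example-token",                               # hypothetical auth token
    "rs-gams",                                     # hypothetical container
    "i85/lane-mile-0001/frame-000001.jpg",         # hypothetical object name
    b"...jpeg bytes...",
    metadata={"Lane-Mile": 1, "Sensor": "camera"},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the upload against a real cluster. The metadata headers are what make the indexing and metadata advantages listed above possible without a separate database.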

Filesystem Gateway
• Using object storage natively is difficult
• Lots of workflows are based on using files. Our students are using Windows / Linux application packages which are not object friendly.
• Latency / speed is also an issue
• Strategies being deployed or investigated:
  • SwiftStack gateway with lots of cache
  • Would like a GPFS gateway (High Performance Computing)
  • Storage abstraction technology (software-defined storage utopia… EMC ViPR Data Services?)
[Diagram: directory hierarchy]

Filesystem Gateway – No Data Lock In
• No lock-in due to encoding with the SwiftStack gateway
• Data in/out is the same via the Swift API (objects) or via the CIFS/NFS filesystem (files)
• Traditional gateways (like S3/Glacier, Avere, Panzura) are a 'medieval marriage... forever'
• These gateways severely lock data in – all data going in via a gateway MUST come out through the same gateway
• It's a "Hotel California" for data – you can check out any time you like, but you can never leave
• Other gateways with lock-in have other benefits/features
  • E.g. deduplication, compression, etc.
  • Or offer POSIX, required for some applications
[Diagram: SwiftStack Filesystem Gateway and Swift HTTP APIs in front of the Swift Nodes]

What about our Research Data Curation problem?
• Initiative led by the Georgia Tech Library
• Migrate from DSpace to a Research Data Curation repository built around Fedora (repository infrastructure) and Hydra (front-end repositories)
• Fedora 4.0 will connect to Swift
Ø via JBoss ModeShape and the Infinispan storage subsystem
Ø Infinispan connection to Swift initially to use the Swift3 (S3) emulation layer

Swift zones distribution across campus
Federated and Distributed Academic cloud: each zone is located in a server room which is owned and operated by a different GT department… No one owns the cloud
VAPOR ???
Zone ISYE 1 / Zone PACE 1 / Zone ECS 1
We expect more zones to come online in the next 12 months
A geographically distant region is on the roadmap (using hosting agreements with other universities and Internet2)

What hardware are we using?
• Supermicro chassis, primarily 24-bay chassis
• Hardware configuration is heterogeneous (different drive capacities, not the same number of storage nodes per zone)
• Drives are a mix of enterprise / consumer grade, mainly 1 TB or 2 TB
• Most storage nodes have 10 GbE network connectivity (Solarflare or Mellanox ConnectX). Currently SFP+ 10 GbE… 10GBASE-T is next
• LSI SAS Adapter 9211-8i (do not forget to re-flash if needed to change the card from Integrated RAID (IR) to Initiator Target (IT) firmware)
• SSDs for Account/Container Ring (60 GB to 120 GB)
• Memory to TB ratio?? 1 GB of memory per TB can be expensive…
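The memory question in the last bullet is easy to put numbers on. The sketch below applies the 1 GB per TB rule of thumb quoted on the slide to a fully populated 24-bay chassis; the function name is hypothetical, and full population of the chassis is an assumption.

```python
def memory_needed_gb(drive_bays, drive_tb, gb_per_tb=1.0):
    """RAM suggested by a GB-of-memory-per-TB-of-disk rule of thumb."""
    return drive_bays * drive_tb * gb_per_tb


# Assumption: all 24 bays are populated with drives of the same size.
for drive_tb in (1, 2):
    print(f"24 x {drive_tb} TB drives -> {memory_needed_gb(24, drive_tb):.0f} GB of RAM")
```

At 2 TB per drive the rule already asks for 48 GB per node, which is why the ratio is flagged as potentially expensive.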

Distributed management of our Swift Infrastructure
Ø Sysadmins from multiple departments share administrative responsibilities.
Ø Would like more delegation granularity down the road: delegation on a per-zone / per-region basis, for example for node management or a cluster operator role.
Ø Students are a great resource to replace dead drives… if they know which one to replace…

SwiftStack Auth and LDAP
• Initially we were considering using AD integration
• LDAP was a definite requirement. Its availability delayed some of our testing / usage.
• Georgia Tech's LDAP directory is fairly large:
  - ou=accounts -> 300K entries
  - ou=people -> 2M entries
• So far so good. Initial integration was easy (5 minutes), but we waited until the code was stable…
• Anyone with a valid GT account can access the Swift cluster!
  - (but we have not advertised its existence… Please don’t tell our students)

Our Financial Model Approach
Limit recurring cost at all costs!
Fund recurring cost at the central level (licensing, for example)
Focus on Bring Your Own:
• Zone (BYOZ)
• Server (BYOS)
• Drive (BYOD)
We envision using HDDs as a form of currency with research groups.
4 TB of data to store = 3 x 4 TB HDD payment for 5 years.
Hopefully storing 4 TB will be negligible in 5 years when drives start to die.
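The "HDDs as currency" rule can be written as a one-line calculation. The sketch below assumes the 3 x factor corresponds to Swift keeping three replicas of every object, which matches the redundancy slide but is not stated here; the function name and the defaults are hypothetical.

```python
import math


def drives_owed(data_tb, drive_tb=4, replicas=3):
    """Number of drives a research group contributes to store data_tb of data."""
    # Every TB stored consumes `replicas` TB of raw capacity across the cluster.
    return math.ceil(data_tb * replicas / drive_tb)


print(drives_owed(4))    # the slide's example: 4 TB of data
print(drives_owed(10))   # a larger, hypothetical request
```

For the slide's example this gives three 4 TB drives, and rounding up keeps partial drives from being under-counted for other sizes.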

IT’S ALIVE… Okay… So What’s Next…
• Implement quotas… probably container-based quotas
• Re-architect the proxy layer this summer: dedicated proxy nodes and possibly a load balancer (NetScaler / F5)
• Might need to identify a high-performance NAS gateway for specific workloads ('medieval marriage... forever')
• Investigate support for a GPFS-based gateway
• Keep educating people on the long-term benefits of using the Swift API to access data
• Unified access to data via the SwiftStack Filesystem Gateway (convergence)
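For the container-based quotas in the first bullet, Swift's container quota middleware reads a limit from container metadata, set with the `X-Container-Meta-Quota-Bytes` header. The sketch below only builds such a request; the endpoint, token and container name are hypothetical, and it assumes the quota middleware is enabled in the proxy pipeline, which the slides do not confirm for this cluster.

```python
from urllib.parse import quote
from urllib.request import Request


def build_container_quota_request(storage_url, token, container, quota_bytes):
    """Build (but do not send) a POST that sets a byte quota on a Swift container."""
    url = f"{storage_url.rstrip('/')}/{quote(container)}"
    headers = {
        "X-Auth-Token": token,
        # Read by the container quota middleware, if it is enabled.
        "X-Container-Meta-Quota-Bytes": str(quota_bytes),
    }
    return Request(url, headers=headers, method="POST")


# Hypothetical example: cap a research group's container at 4 TB.
request = build_container_quota_request(
    "https://swift.example.edu/v1/AUTH_research",
    "example-token",
    "rs-gams",
    4 * 10**12,
)
print(request.get_method(), request.full_url)
```

A per-container limit like this would pair naturally with the drive-based funding model, since each group's contribution maps to a byte budget.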

Managing Swift with SwiftStack
Joe Arnold, CEO, SwiftStack Inc.

SwiftStack Object Storage Software
Simple, Web-based MANAGEMENT
• DEPLOY: Deploy in Minutes not Days
• INTEGRATE: Seamlessly
• SCALE: Without Disruption
OpenStack Swift (support included)
Standard Hardware & Linux Distribution

Questions & Answers