Cloud Computing with Nimbus
FNAL, January 2009
Kate Keahey (keahey@mcs.anl.gov)
University of Chicago, Argonne National Laboratory
Cloud Computing
- Elastic computing
- Pay-as-you-go
- Capital expense and operational expense
- Science Clouds
10/20/08 The Nimbus Toolkit: http://workspace.globus.org
Everything-as-a-Service
- SaaS
- PaaS
- IaaS
The Quest Begins
- Code complexity
- Resource control
“Workspaces”
- Dynamically provisioned environments
  - Environment control
  - Resource control
- Hardware implementations vs. virtualization
A Brief History of Nimbus
[Timeline figure with axis marks at 2003, 2006, and 2009. Events shown: Xen released; research on agreement-based services; first Workspace Service release; EC2 goes online; EC2 gateway available; Nimbus Cloud comes online; support for EC2 interfaces; STAR production runs on EC2]
Nimbus Overview
- Goal: open source, extensible IaaS implementation and tools
  - Specifically targeting the scientific community
  - A platform for experimentation with features for scientific needs
  - Set up private clouds (privacy, expense considerations)
- Tools
  - IaaS layer (Workspace Service)
  - Orchestration layer (Context Broker, gateway)
- http://workspace.globus.org/
The Workspace Service
[Diagram: the VWS Service in front of a pool of three nodes]
The Workspace Service
- The workspace service publishes information on each workspace as standard WSRF Resource Properties.
- Users can query those properties to find out information about their workspace (e.g., which IP address the workspace was bound to).
- Users can interact directly with their workspaces the same way they would with a physical machine.
[Diagram: the VWS Service and three pool nodes inside the Trusted Computing Base (TCB)]
Workspace Service Interfaces and Clients
- Web Services based
- Web Service Resource Framework (WSRF)
  - GT-based
- Elastic Compute Cloud (EC2)
  - Supported: ec2-describe-images, ec2-run-instances, ec2-describe-instances, ec2-terminate-instances, ec2-reboot-instances, ec2-add-keypair, ec2-delete-keypair
  - Unsupported: availability zones, security groups, elastic IP assignment, REST
- Used alongside WSRF interfaces
  - E.g., the University of Chicago cloud allows you to connect via the cloud client or via the EC2 client
Security
- GSI authentication and authorization
  - PKI credential required
  - Works with Grid proxies
  - VOMS, Shibboleth (via GridShib), custom PDPs
- Secure access to VMs
  - EC2 key generation, or keys accessed from .ssh
- Validating images and image data
  - Collaboration with Vienna University of Technology
Networking
- Network configuration
  - External: public IPs or private IPs (via VPN)
  - Internal: private network via a local cluster network
- Each VM can specify multiple NICs mixing private and public networks (WSRF only)
  - E.g., cluster worker nodes on a private network, headnode on both public and private networks
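The headnode/worker example on this slide can be written down as data. The following is a minimal sketch only: the class names, field names, and interface names (`eth0`, `eth1`) are assumptions made for illustration and are not the Nimbus request schema.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Network labels taken from the slide; everything else here is hypothetical.
ALLOWED_NETWORKS = ("public", "private")


@dataclass(frozen=True)
class Nic:
    """One virtual network interface attached to a VM."""
    name: str
    network: str

    def __post_init__(self) -> None:
        if self.network not in ALLOWED_NETWORKS:
            raise ValueError(f"unknown network: {self.network!r}")


@dataclass
class VmRequest:
    """A VM together with the NICs it asks for."""
    role: str
    nics: List[Nic] = field(default_factory=list)

    def networks(self) -> Set[str]:
        return {nic.network for nic in self.nics}


def virtual_cluster(worker_count: int) -> List[VmRequest]:
    """Headnode on both networks, workers on the private network only."""
    head = VmRequest("headnode", [Nic("eth0", "public"), Nic("eth1", "private")])
    workers = [
        VmRequest(f"worker{i}", [Nic("eth0", "private")])
        for i in range(worker_count)
    ]
    return [head] + workers
```

Calling `virtual_cluster(3)` yields one headnode with two NICs and three workers with one private NIC each.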
The Back Story
- Workspace WSRF front-end that allows clients to deploy and manage virtual workspaces
- Workspace back-end:
  - Resource manager for a pool of physical nodes
  - Deploys and manages workspaces on the nodes
- Each node must have a VMM (Xen) installed, as well as the workspace control program that manages individual nodes
[Diagram: the VWS Service and three pool nodes inside the Trusted Computing Base (TCB)]
Workspace Components
[Diagram: EC2 and WSRF interfaces; workspace service; workspace resource manager; workspace control; workspace pilot; workspace client]
Workspace Control
- VM image propagation
- Image management and reconstruction
  - Creating blank partitions, sharing partitions
- VM control
  - Starting, stopping, pausing, etc.
- Integrating a VM into the network
  - Assigning MAC addresses and IP addresses
  - DHCP delivery tool
  - Building up a trusted (non-spoofable) networking layer
- Contextualization information management
- Talks to the workspace service via ssh
- Standalone component
- Some functionality overlap with libvirt
- Implementations in Xen and KVM (queued up for release)
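As an illustration of the "assigning MAC addresses and IP addresses" item, here is a small sketch of handing out one MAC/IP pair per VM from a fixed address range. It is not the workspace control code: the `AddressPool` class and its methods are hypothetical, and the use of the `00:16:3e` prefix (the OUI conventionally used for Xen guest interfaces) is an assumption.

```python
import ipaddress
from typing import Dict, Tuple

XEN_OUI = "00:16:3e"  # prefix conventionally used for Xen guest NICs (assumed here)


def mac_for(index: int) -> str:
    """Build a MAC address from the OUI plus a 24-bit counter."""
    if not 0 <= index < 2 ** 24:
        raise ValueError("index does not fit in 24 bits")
    tail = index.to_bytes(3, "big")
    return XEN_OUI + ":" + ":".join(f"{byte:02x}" for byte in tail)


class AddressPool:
    """Hands out (MAC, IP) pairs from one subnet; hypothetical helper."""

    def __init__(self, cidr: str) -> None:
        self._hosts = iter(ipaddress.ip_network(cidr).hosts())
        self._next_index = 0
        self.leases: Dict[str, Tuple[str, str]] = {}

    def assign(self, vm_id: str) -> Tuple[str, str]:
        # A VM that already holds an address keeps it.
        if vm_id in self.leases:
            return self.leases[vm_id]
        try:
            ip = next(self._hosts)
        except StopIteration:
            raise RuntimeError("address pool exhausted") from None
        mac = mac_for(self._next_index)
        self._next_index += 1
        self.leases[vm_id] = (mac, str(ip))
        return self.leases[vm_id]
```

Keeping the MAC-to-IP mapping in one place is what lets a DHCP delivery step answer each VM with the address that was reserved for it.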
The Workspace Resource Manager
- Basic slot fitting
- Implements “immediate leases”
- Extensible vehicle to experiment with different leases
- Open source resource manager for multiple different VMMs
- Datacenter technology equivalent
  - Can be replaced by OpenNebula or other datacenter technologies
- Deployment
  - University of Chicago, University of Florida, Purdue, Masaryk University, and all the other Science Cloud sites
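One way to picture "basic slot fitting" with "immediate leases" is a first-fit placement that either starts a VM right away or rejects the request, with no queue. The sketch below assumes memory is the only resource that matters and uses first-fit; both are simplifications chosen for illustration, and `Node` and `SlotFitter` are hypothetical names rather than parts of the resource manager.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_memory_mb: int


class SlotFitter:
    """First-fit placement with immediate leases (grant now or reject)."""

    def __init__(self, nodes: List[Node]) -> None:
        self._nodes = {node.name: node for node in nodes}
        # vm_id -> (node name, memory reserved on that node)
        self._placements: Dict[str, Tuple[str, int]] = {}

    def request_immediate(self, vm_id: str, memory_mb: int) -> Optional[str]:
        """Return the chosen node name, or None if the request is rejected."""
        for node in self._nodes.values():
            if node.free_memory_mb >= memory_mb:
                node.free_memory_mb -= memory_mb
                self._placements[vm_id] = (node.name, memory_mb)
                return node.name
        return None

    def release(self, vm_id: str) -> None:
        """End a lease and return its memory to the node."""
        node_name, memory_mb = self._placements.pop(vm_id)
        self._nodes[node_name].free_memory_mb += memory_mb
```

Other lease types could be tried by swapping out `request_immediate`, for example by queueing a request instead of returning `None`.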
The Workspace Pilot
- Challenge: how can I provide a virtualization solution without disrupting the current operation of my cluster?
- Flying Low: the Workspace Pilot
  - Integrates with popular LRMs (such as PBS, SGE)
  - Implements “best effort” leases
  - Glidein approach: submits a “pilot” program that claims a resource slot
  - Includes administrator tools
- Deployment
  - Testing at U of Victoria (ATLAS), Ian Gable and collaborators
  - Adapting for the use of the ATLAS experiment at CERN, Omer Khalid
Cloud Closure
[Diagram: EC2 and WSRF interfaces; storage service; workspace service; cloud client; workspace resource manager; workspace control; workspace pilot; workspace client]
IaaS Gateway
- Goals
  - Access to different IaaS infrastructures
  - Account management
  - Facilitate movement between academic and commercial clouds and creation of meta-clouds
  - Combine higher-level tools and IaaS
- Released as a service, not as code
- First online in June 2007, currently in a rewrite
- Used to move, e.g., HEP STAR experiments between Science Clouds and EC2
The IaaS Gateway
[Diagram: EC2 and WSRF interfaces; storage service; workspace service; IaaS gateway connecting to EC2 and potentially other providers; cloud client; workspace resource manager; workspace control; workspace pilot; workspace client]
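The gateway's role (one entry point, per-user accounts, several IaaS back ends) can be sketched as a thin dispatcher. Everything below is hypothetical: `InMemoryCloud` stands in for a real provider, the account table is a plain dictionary, and the "try clouds in order, fall through when one is full" policy is an assumption used to illustrate moving between an academic cloud and EC2.

```python
from typing import Dict, List, Set


class InMemoryCloud:
    """Stand-in for an IaaS provider with a fixed number of VM slots."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.free = capacity
        self._counter = 0

    def launch(self, image: str, count: int) -> List[str]:
        if count > self.free:
            raise RuntimeError(f"{self.name}: not enough capacity")
        self.free -= count
        ids = []
        for _ in range(count):
            self._counter += 1
            ids.append(f"{self.name}/{image}/{self._counter}")
        return ids


class Gateway:
    """Routes a launch request to the first cloud the user may use."""

    def __init__(self, clouds: List[InMemoryCloud],
                 accounts: Dict[str, Set[str]]) -> None:
        self._clouds = clouds        # tried in the order given
        self._accounts = accounts    # user -> names of clouds with an account

    def launch(self, user: str, image: str, count: int) -> List[str]:
        allowed = self._accounts.get(user, set())
        if not allowed:
            raise PermissionError(f"no cloud accounts for {user}")
        for cloud in self._clouds:
            if cloud.name not in allowed:
                continue
            try:
                return cloud.launch(image, count)
            except RuntimeError:
                continue  # this cloud is full, try the next one
        raise RuntimeError("no cloud could satisfy the request")
```

With a small science cloud listed first and a large commercial cloud second, requests that do not fit locally fall through to the second provider.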
One-click Virtual Clusters
- Parameterizable appliance
- Tightly-coupled clusters
- Reciprocal exchange of information: networking and security
[Diagram: three MPI nodes, each with its own IP address and host key (IP1/HK1, IP2/HK2, IP3/HK3)]
Context Broker
[Diagram: each of three nodes sends its IP address and host key (IP1/HK1, IP2/HK2, IP3/HK3) to the Context Broker and receives the IP addresses and host keys of the other nodes]
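The reciprocal exchange pictured here can be sketched as a rendezvous: every node registers its IP address and host key, and once all expected nodes have reported, each can retrieve the full roster. This is a simplified model and not the Context Broker protocol; the class, its methods, and the `known_hosts_lines` helper are hypothetical.

```python
from typing import Dict, List, Optional


class ContextBroker:
    """Collects (IP, host key) pairs and releases them once all nodes report."""

    def __init__(self, expected_nodes: int) -> None:
        self._expected = expected_nodes
        self._roster: Dict[str, str] = {}

    def register(self, ip: str, host_key: str) -> None:
        self._roster[ip] = host_key

    def retrieve(self) -> Optional[Dict[str, str]]:
        """Return the full roster, or None while nodes are still missing."""
        if len(self._roster) < self._expected:
            return None
        return dict(self._roster)


def known_hosts_lines(roster: Dict[str, str]) -> List[str]:
    """Render the roster as 'address key' lines, sorted by address."""
    return [f"{ip} {key}" for ip, key in sorted(roster.items())]
```

A node that polls `retrieve()` and gets `None` simply waits; once the roster is complete, every node sees the same set of addresses and keys, which is what a tightly-coupled cluster needs before its members can reach each other over ssh.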
Goals for Context Broker
- Can work with every appliance
  - Appliance schema, can be implemented in terms of many configuration systems
- Can work with every cloud provider
  - Simple and minimal conditions on generic context delivery
- Can work across multiple cloud providers, in a distributed environment
Status for Context Broker
- Release history:
  - In alpha testing since August ’07
  - First released in July ’08 (v1.3.3)
  - Latest update January ’09 (v2.2)
- Used to contextualize hundreds of nodes for EC2 STAR runs
- Contextualized images on workspace marketplace
- Working with rPath to make contextualization easier for the user
End of Nimbus Tour
[Diagram: EC2 and WSRF interfaces; context broker; storage service; workspace service; IaaS gateway connecting to EC2 and potentially other providers; context client; cloud client; workspace resource manager; workspace control; workspace pilot; workspace client]
Science Clouds
- Make it easy for scientific projects to experiment with cloud computing
  - Can cloud computing be used for science?
- Evolve software in response to the needs of scientific projects
  - Start with EC2-like functionality and evolve to serve scientific projects: virtual clusters, diverse resource leases
  - Federating clouds: moving between cloud resources in academic and commercial space
Science Cloud Resources
- University of Chicago (Nimbus):
  - First cloud, online since March 4th, 2008
  - 16 nodes of UC TeraPort cluster, public IPs
- University of Florida
  - Online since 05/08
  - 16-32 nodes, access via VPN
- Other Science Clouds
  - Masaryk University, Brno, Czech Republic (08/08), Purdue (09/08)
  - Installations in progress: IU, Grid5K, others
- Using EC2 for overflow
- Minimal governance model
- http://workspace.globus.org/clouds
Cloud Use
- ~100 DNs
- Utilization:
  - Overall: 16%
  - Peak per week: 86% (week of 7/14)
- Requests rejected:
  - None until 7/14
  - Lots afterwards ;-)
Data scaled to the number of days
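The slide does not say how the utilization figures were computed. A common definition, assumed here, is node-hours used divided by node-hours available over the period; the sketch below applies that definition to weekly totals. The function names and the sample numbers in the usage note are illustrative and are not data from the Science Clouds.

```python
from typing import Dict, Tuple


def utilization(used_node_hours: float, nodes: int, days: int) -> float:
    """Fraction of available node-hours that were actually used."""
    available = nodes * 24 * days
    if available <= 0:
        raise ValueError("nodes and days must be positive")
    return used_node_hours / available


def peak_week(weekly_used: Dict[str, float], nodes: int) -> Tuple[str, float]:
    """Return the (week label, utilization) pair with the highest utilization."""
    label = max(weekly_used, key=lambda week: weekly_used[week])
    return label, utilization(weekly_used[label], nodes, 7)
```

For example, on a 16-node cloud a week offers 16 × 24 × 7 = 2688 node-hours, so 1344 node-hours of use in that week corresponds to a utilization of 0.5.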
Who Runs on Nimbus?
Project diversity: science, CS, education, build & test…
Hadoop over ManyClouds
[Diagram: U of Florida and U of Chicago sites, each with a ViNe router]
- CS research: investigate latency-sensitive apps, e.g., Hadoop
- Need access to distributed resources, and a high level of privilege to run a ViNe router
- Virtual workspace: ViNe router + application VMs
- Paper: “CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications” by Andréa Matsunaga, Maurício Tsugawa and José Fortes. eScience 2008.
ALICE HEP Experiment at CERN
- CHEP paper in preparation
STAR
- STAR: a high-energy physics experiment
- Need resources with the right configuration
  - Complex environments: correct versions of operating systems, libraries, tools, etc. all have to be installed
  - Consistent environments: require validation
- A virtual OSG STAR cluster
  - OSG cluster
    - OSG CE (headnode), gridmapfiles, host certificates, NFS, PBS
  - STAR worker nodes: SL4 + STAR conf
- Requirements
  - One-click virtual cluster deployment
  - Migration: Science Clouds -> EC2
STAR (cont'd)
- From proof-of-concept to production runs
  - ~2 years ago: proof-of-concept
  - Last September: EC2 runs of up to 100 nodes (production scale, non-critical codes)
  - Testing for critical production deployment
- Performance
  - Within 10% of expected performance for applications
- Work by Jerome Lauret, Doug Olson, Leve Hajdu, Lidia Didenko
Scalability Testing
- Motivation
  - Test scalability of various Globus components
  - Test on different platforms
- Workspaces
  - Globus 101 + others
- Requirements
  - Very short-term but flexible access to diverse platforms
- Work by various members of the Globus community (Tom Howe and John Bresnahan)
- Resulted in provisioning a private cloud for Globus
- Typically very short-lived communities of one
Montage Workflows
- Evaluating a cloud from the user’s perspective
  - Paper: “Exploration of the Applicability of Cloud Computing to Large-Scale Scientific Workflows”, C. Hoffa, T. Freeman, G. Mehta, E. Deelman, K. Keahey, SWBES08: Challenging Issues in Workflow Applications
Cloud Computing Ecosystem
- Appliance Providers: marketplaces, commercial providers, communities
- Deployment Orchestrator: orchestrate the deployment of environments across possibly many cloud providers
- User Environments
- VMM/datacenter/IaaS
Open Source IaaS Implementations
- OpenNebula
  - Open source datacenter implementation
  - University of Madrid, I. Llorente & team, 03/2008
- Eucalyptus
  - Open source implementation of EC2
  - UCSB, R. Wolski & team, 06/2008
- Cloud-enabled Nimrod-G
  - Open source implementation of EC2
  - Monash University, MeSsAGE Lab, 01/2009
- Industry efforts
  - openQRM, Enomalism
Friends and Family
- Committers: Kate Keahey & Tim Freeman (ANL/UC), Ian Gable (UVIC)
- A lot of help from the community, see: http://workspace.globus.org/people.html
- Collaborations:
  - Cumulus: S3 implementation (Globus team)
  - EBS implementation with IU
  - Appliance management: rPath and Bcfg2 project
  - Virtual network overlays: University of Florida
  - Security: Vienna University of Technology
To the Future and Beyond
- Increasing importance of appliance providers
- Cloud computing tools
- Increased interest in cloud interoperability
  - Standards: “rough consensus & working code”
  - Image formats, contextualization capabilities, cloud interfaces, etc.
- Cloud markets