Datacenter Management with Apache Mesos
mesos.apache.org @ApacheMesos
Benjamin Hindman – @benh
I’ve got tons of data …
… more every day!
That must be why they call it a datacenter.
I’d love to answer some questions with the help of my data!
I think I’ll try Hadoop.
your datacenter
+ Hadoop
happy?
Not exactly …
… Hadoop is a big hammer, but not everything is a nail!
I’ve got some iterative algorithms, I want to try Spark!
datacenter management
static partitioning
Oh noes! Spark wants to read and write data to HDFS!
Hadoop … (map/reduce) (distributed file system)
HDFS
Could we just give Spark its own HDFS cluster too?
HDFS
HDFS: tee incoming data (2 copies), periodic copy/sync
That sounds annoying … let’s not do that. Can we do any better though?
HDFS
happy now?
No! We’ve decided to start doing real time computation with Storm …
datacenter management
happy now!?
Not really … during the day I’d rather give more machines to Spark but at night I’d rather give more machines to Hadoop!
datacenter management
And failures require more datacenter management!
datacenter management
I don’t want to deal with this!
rather than think about the datacenter like this …
… think about it like this: the datacenter is a computer
datacenter computer: applications, resources, filesystem
with mesos: applications, kernel (mesos), resources, filesystem
Okay, so how does it work?
Step 1: HDFS
Step 2: Mesos: run a “master” (or multiple for high availability)
Step 2: Mesos: run “slaves” on the rest of the machines
Step 3: Frameworks
$tep 4: Profit
$tep 4: Profit (utilize): just one big pool of resources, utilize single machines more fully!
$tep 4: Profit (statistical multiplexing): reduces CapEx and OpEx, reduces latency!
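One way to picture the statistical multiplexing claim is the minimal sketch below. The hourly demand numbers are invented for illustration (Spark busy by day, Hadoop busy by night), and the function names are hypothetical; nothing here comes from Mesos itself.

```python
# Hypothetical demand, in machines, over four time slots (illustration only).
demand = {
    "spark":  [8, 8, 2, 2],   # busy during the day
    "hadoop": [2, 2, 8, 8],   # busy at night
}

def static_partition_size(demand):
    """Each framework gets its own cluster, sized for its own peak."""
    return sum(max(series) for series in demand.values())

def shared_pool_size(demand):
    """One shared pool, sized for the peak of the combined demand."""
    return max(sum(slot) for slot in zip(*demand.values()))

static_machines = static_partition_size(demand)
shared_machines = shared_pool_size(demand)
```

With these made-up numbers the static partitions need 16 machines while the shared pool needs 10, because the two peaks never coincide.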
$tep 4: Profit (failures)
This sounds pretty good!
Other than Hadoop, Spark, and Storm, what else can I run on Mesos?
frameworks
• Hadoop (github.com/mesos/hadoop)
• Spark (github.com/mesos/spark)
• DPark (github.com/douban/dpark)
• Storm (github.com/nathanmarz/storm)
• Chronos (github.com/airbnb/chronos)
• MPICH2 (in mesos git repository)
• Aurora (proposed for Apache incubator)
What about XYZ?
port an existing framework
• strategy: write a “wrapper” which launches existing components on mesos
• ~100 lines of code to write a wrapper (the more lines, the more you can take advantage of elasticity or other mesos features)
• see src/examples/ in mesos repository
write a new framework!
• as a “kernel”, mesos provides a lot of primitives that make writing a new framework relatively easy
• primitives: extracted commonality across existing distributed systems/frameworks (launching tasks, doing failure detection, etc.) … why re-implement them each time!?
case study: chronos
• distributed cron with dependencies
• developed at airbnb
• ~3k lines of Scala!
• distributed, highly available, and fault tolerant without any network programming!
• http://github.com/airbnb/chronos
Hmm … if Mesos gives me a datacenter computer … can I run stuff other than analytics?
case study: aurora
• run N instances of my server, somewhere, forever (where server == arbitrary command line)
• developed at Twitter
• runs hundreds of production services, including ads!
• recently proposed for Apache Incubator!
aurora
But what about resource isolation!? I don’t want my end users to have to wait for our website to load because of resource contention!
resource isolation
• Linux control groups (cgroups)
• CPU (upper and lower bounds)
• memory
• network I/O (traffic controller)
• filesystem (lvm, in progress)
conclusions
• datacenter management is a pain
• mesos makes running frameworks on your datacenter easier as well as increasing utilization and performance while reducing CapEx and OpEx!
• rather than build your next distributed system from scratch, consider using mesos
• you can share your datacenter between analytics and online services!
Questions? mesos.apache.org @ApacheMesos
framework commonality
• run processes simultaneously (distributed)
• handle process failures (fault-tolerance)
• optimize execution (elasticity, scheduling)
primitives
• scheduler – distributed system “master” or “coordinator”
• (executor – lower-level control of task execution, optional)
• requests/offers – resource allocations
• tasks – “threads” of the distributed system
• …
scheduler Apache Hadoop Chronos
scheduler
(1) brokers for resources
(2) launches tasks
(3) handles task termination
brokering for resources
(1) make resource requests: 2 CPUs, 1 GB RAM, slave *
(2) respond to resource offers: 4 CPUs, 4 GB RAM, slave foo.bar.com
offers: non-blocking resource allocation
exist to answer the question: “what should mesos do if it can’t satisfy a request?”
(1) wait until it can
(2) offer the best allocation it can immediately
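What "respond to resource offers" could look like from the framework side is sketched below. This is a toy model, not the Mesos scheduler API: the `Resources` type, the `tasks_for_offer` helper, and the task names are all hypothetical, and the greedy first-fit policy is just one choice a scheduler might make.

```python
from dataclasses import dataclass

@dataclass
class Resources:
    cpus: float
    mem_mb: int

def tasks_for_offer(offer, pending):
    """Greedily pick pending tasks that fit inside a single offer.

    pending is a list of (task_name, Resources) pairs.
    Returns (launched_names, still_pending_pairs).
    """
    cpus, mem = offer.cpus, offer.mem_mb
    launched, remaining = [], []
    for name, need in pending:
        if need.cpus <= cpus and need.mem_mb <= mem:
            cpus -= need.cpus          # consume part of the offer
            mem -= need.mem_mb
            launched.append(name)
        else:
            remaining.append((name, need))  # wait for a later offer
    return launched, remaining

# An offer shaped like the "4 CPUs 4 GB RAM" example on the slide.
offer = Resources(cpus=4, mem_mb=4096)
pending = [
    ("t1", Resources(2, 1024)),
    ("t2", Resources(2, 1024)),
    ("t3", Resources(2, 1024)),
]
launched, remaining = tasks_for_offer(offer, pending)
```

Here `t1` and `t2` use up the offered CPUs, and `t3` stays pending until another offer arrives, which is the non-blocking behaviour the slide describes: the framework takes what it can use now instead of waiting for an exact match.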
resource allocation
• framework schedulers (Apache Hadoop, Chronos) send a request to the allocator; the allocator answers with an offer
• allocator: dominant resource fairness, resource reservations
• pessimistic: no overlapping offers
• optimistic: all overlapping offers
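The slides name dominant resource fairness (DRF) as an allocator policy without spelling it out. The sketch below follows the commonly published description of DRF in simplified form: repeatedly give one task to the user with the smallest dominant share, and stop when that user's next task no longer fits. It is not the Mesos allocator's code; the function name, the users "A" and "B", and the cluster size are illustrative assumptions, and every per-task demand is assumed to be positive.

```python
from fractions import Fraction

def drf_allocate(capacity, demands):
    """Simplified dominant resource fairness.

    capacity: total amount of each resource, e.g. (cpus, mem_gb).
    demands:  user -> per-task demand, in the same order as capacity.
    Returns user -> number of tasks allocated.
    """
    used = [0] * len(capacity)
    held = {user: [0] * len(capacity) for user in demands}
    tasks = {user: 0 for user in demands}

    def dominant_share(user):
        # Largest fraction of any single resource this user currently holds.
        return max(Fraction(h, c) for h, c in zip(held[user], capacity))

    while True:
        user = min(demands, key=dominant_share)
        need = demands[user]
        if any(u + n > c for u, n, c in zip(used, need, capacity)):
            return tasks  # the chosen user's next task no longer fits
        for i, n in enumerate(need):
            used[i] += n
            held[user][i] += n
        tasks[user] += 1

# 9 CPUs and 18 GB; A's tasks are memory-heavy, B's tasks are CPU-heavy.
result = drf_allocate(capacity=(9, 18), demands={"A": (1, 4), "B": (3, 1)})
```

With these inputs A ends up with three tasks and B with two, so each user holds two thirds of its own dominant resource (memory for A, CPU for B).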
“two-level scheduling”
• mesos: controls resource allocations to framework schedulers
• framework schedulers: make decisions about what to run given allocated resources
end-to-end principle “application-specific functions ought to reside in the end hosts of a network rather than intermediary nodes”
tasks
• either a concrete command line or an opaque description (which requires a framework executor to execute)
• a consumer of resources
task operations
• launching/killing
• health monitoring/reporting (failure detection)
• resource usage monitoring (statistics)
resource isolation
• cgroup per executor or task (if no executor)
• resource controls adjusted dynamically as tasks come and go!
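To make "adjusted dynamically as tasks come and go" concrete, here is a minimal sketch that recomputes the limits for one executor's cgroup from the tasks it currently runs. `cpu.shares` and `memory.limit_in_bytes` are real cgroup v1 control files, but the helper name, the task dictionaries, and the 1024-shares-per-CPU scaling are assumptions made for illustration. The sketch only computes values; it does not write anything under /sys/fs/cgroup.

```python
SHARES_PER_CPU = 1024  # assumed scaling factor, chosen for illustration

def cgroup_limits(tasks):
    """Return cgroup v1 control values covering all tasks of one executor."""
    cpus = sum(t["cpus"] for t in tasks)
    mem_mb = sum(t["mem_mb"] for t in tasks)
    return {
        "cpu.shares": int(cpus * SHARES_PER_CPU),
        "memory.limit_in_bytes": mem_mb * 1024 * 1024,
    }

tasks = [{"cpus": 1, "mem_mb": 512}, {"cpus": 0.5, "mem_mb": 256}]
with_two_tasks = cgroup_limits(tasks)

tasks.pop()                      # the second task finishes
with_one_task = cgroup_limits(tasks)
```

Each time a task starts or finishes, the limits are recomputed, so the container shrinks or grows with the work it is actually hosting.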
case study: chronos distributed cron with dependencies built at airbnb by @flo
before chronos
• single point of failure (and AWS was unreliable)
• resource starved (not scalable)
chronos requirements
• fault tolerance
• distributed (elastically take advantage of resources)
• retries (make sure a command eventually finishes)
• dependencies
chronos
• leverages the primitives of mesos
• ~3k lines of scala
• highly available (uses Mesos state)
• distributed / elastic
• no actual network programming!
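Two of the requirements listed for chronos, retries and dependencies, can be sketched in a few lines. Chronos itself is written in Scala and runs its jobs as Mesos tasks; the Python below is only an illustration of the idea, with a hypothetical `run_with_dependencies` helper and made-up job names, and it runs everything in one local process.

```python
from graphlib import TopologicalSorter

def run_with_dependencies(jobs, deps, max_retries=3):
    """Run jobs in dependency order, retrying failures.

    jobs: name -> zero-argument callable returning True on success.
    deps: name -> set of job names that must finish first.
    Returns the names in the order they completed.
    """
    graph = {name: deps.get(name, set()) for name in jobs}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        for _attempt in range(1 + max_retries):
            if jobs[name]():
                completed.append(name)
                break
        else:
            raise RuntimeError(f"{name} failed after {max_retries} retries")
    return completed

attempts = {"transform": 0}

def flaky_transform():
    # Fails on the first attempt, succeeds on the second.
    attempts["transform"] += 1
    return attempts["transform"] >= 2

jobs = {"extract": lambda: True, "transform": flaky_transform, "load": lambda: True}
deps = {"transform": {"extract"}, "load": {"transform"}}
order = run_with_dependencies(jobs, deps)
```

The flaky job is retried once and the chain still completes in dependency order: extract, transform, load.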
after chronos
after chronos + hadoop
case study: aurora “run 200 of these, somewhere, forever” built at Twitter
before aurora
• static partitioning of machines to services
• hardware outages caused site outages
• puppet + monit
• ops couldn’t scale as fast as engineers
aurora
• highly available (uses mesos replicated log)
• uses a python DSL to describe services
• leverages service discovery and proxying (see Twitter commons)
after aurora
• power loss to 19 racks, no lost services!
• more than 400 engineers running services
• largest cluster has >2500 machines
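The “run 200 of these, somewhere, forever” idea can be pictured as a reconciliation step: compare the desired instance count with what is actually running and relaunch whatever is missing. The sketch below is a toy model under that assumption; `instances_to_launch` is a hypothetical helper and does not reflect Aurora's actual implementation or its DSL.

```python
def instances_to_launch(desired, running):
    """Return the instance ids that must be (re)launched to reach `desired`."""
    return [i for i in range(desired) if i not in running]

running = {0, 1, 2, 3, 4}
running.discard(3)                       # pretend instance 3 was lost with its machine
to_launch = instances_to_launch(5, running)
running.update(to_launch)                # relaunched somewhere else in the cluster
```

Running this step after every failure is what turns "N instances" into "N instances, forever": the service owner states the count once and never picks machines by hand.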
[Diagram: Hadoop, Spark, MPI, Storm, Chronos, … running on Mesos, which runs on the nodes (Node, Node, Node)]