Dynamic AWS Server Usage Using Nagios Core, or How to Pay Only for What You Need
Eric Loyd, eric@bitnetix.com, 877.33.VOICE, @Bitnetix @SmartVox

About Bitnetix

About Eric Loyd and Bitnetix
  • Founder and CEO of Bitnetix Incorporated
    • VoIP services and IT/network consulting
  • Over 25 years in IT and management at places like
    • Eastman Kodak
    • Frontier Communications / Global Crossing
    • Rochester Institute of Technology
  • Bitnetix started its eighth year in July 2013
  • Digital Rochester GREAT Award Finalist in:
    • 2012 for Communications Technology
    • 2013 for Rising Star
  • Using Nagios since 2004
© 2013 Bitnetix Incorporated

History of SmartVox: Bitnetix’s VoIP Platform

History of SmartVox, our VoIP Platform
Pre-2012: not yet called SmartVox
  • Bitnetix primarily focused on IT consulting
  • VoIP service was ~10% of business, with servers located primarily at client sites
  • Custom Asterisk-based servers running FreePBX
  • We ran the customer’s network, so we had control over VoIP
2012: focus switched to VoIP
  • Focused now on hosted VoIP solutions
  • Made use of Amazon Web Services EC2 VPS
  • One per customer with no proxies* or media servers
  • Network/bandwidth was the only customer responsibility

History of SmartVox, our VoIP Platform
2013: SmartVox name born
  • Copyright, trademark, domain name, biz cards, etc.
  • Third generation born with multiple proxies, registrars, configuration servers, and media servers
  • June: started Mission Matrix program & sales
AWS architecture leveraged for geography
  • Each customer gets their own EC2 server
  • Proxies to closest zone, secondary “to the west”
  • Media servers located in zones based on number of simultaneous calls, conferences, etc.
  • VMs and CDRs stored in database

Brief Overview of AWS

AWS EC2 Concepts
AWS: Amazon Web Services
  • Collection of cloud-based services: Storage (S3), DNS (Route 53), CDN, Server (EC2)
EC2: Elastic Compute Cloud
  • Virtual servers in AWS datacenters (zones)
  • US (3 = VA, CA, OR), EU (1), Asia (3), SA (1)
  • Persistent storage & flexible IP address assignment
  • Pay by the hour that it’s up, plus storage and bandwidth
Spot instances: “temporary” EC2 servers
  • Bring online as needed, terminated when shut down

AWS EC2 Costs
LOTS of variables, but reasonable potential costs:
  • Reserved servers cost about $2.00 per day
    • Reserved instance pricing is contractual and static, based on size
  • Spot servers cost between $0.50 and $2.50 per day
    • Spot instance pricing is dynamic; we assume ~$0.10 per hour
We quantize concurrent calls into 50-call blocks
  • One media server = 50 calls = 1 spot instance
  • Two media servers = 100 calls = 2 spot instances
  • Bandwidth and storage will add ~10%
Reducing AWS usage reduces cost
  • We keep these savings for ourselves. Shhhh!!!
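
The cost figures above can be turned into a rough estimator. This is only a sketch in Python: the three constants come from the slide, while the function names are made up for illustration and rounding partial blocks up to a whole server is an assumption (the slide only says calls are quantized into 50-call blocks).

```python
import math

CALLS_PER_SERVER = 50       # from the slide: one media server = 50 calls
SPOT_RATE_PER_HOUR = 0.10   # from the slide: assumed ~$0.10 per hour
OVERHEAD = 0.10             # from the slide: bandwidth and storage add ~10%


def media_servers_needed(concurrent_calls: int) -> int:
    """Round concurrent calls up to whole 50-call blocks (assumed rounding)."""
    if concurrent_calls < 0:
        raise ValueError("concurrent_calls must be non-negative")
    return math.ceil(concurrent_calls / CALLS_PER_SERVER)


def hourly_cost(concurrent_calls: int) -> float:
    """Estimated hourly spend in dollars, including the ~10% overhead."""
    servers = media_servers_needed(concurrent_calls)
    return servers * SPOT_RATE_PER_HOUR * (1 + OVERHEAD)


# Show how the estimate steps up at each 50-call boundary.
for calls in (50, 51, 100):
    print(calls, media_servers_needed(calls), round(hourly_cost(calls), 2))
```

Because the cost is a step function of call volume, one call past a block boundary costs a full extra spot instance, which is why the quantization matters.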

Why Nagios?

Why Nagios?
  • Extensive experience using it for clients
  • Bitnetix is a Nagios reseller
  • Needed centralized monitoring software
    • Integrate with Twitter for notifications
    • Integrate with Eventum via email for trouble tickets
  • Zero cost
  • Framework
    • Leverage SSH, HTTP, check_mk and livestatus!!
    • Custom checks and notifications (very important)
    • Ability to “cookie cutter” installs for AWS

Initial Hurdles
  • Customer Premise Equipment
    • No real control over CPE choices
    • Routers block some traffic, “help” other traffic incorrectly
    • Need to be able to remotely [re-]configure phones
  • Figure out how to “cookie-cutter” EC2 servers
    • Customer boxes and SIP endpoints
    • Proxies and media servers
  • Wanted to monitor upstream providers as well
  • How to separate apparent from actual failure
    • Something’s broken, but overall service functional

SmartVox Provisioning Process and Automation

SmartVox Network
  • DNS SRV records are key to redundant servers
  • Provider: sends incoming calls to one/more border proxies
  • Border Proxy: figures out what customer should receive the calls
  • Customer Proxy: sends the call on to the correct phone/media server (VM, etc.)

Provisioning Process
SmartVox AWS EC2 Provisioning Database
  • Customer information
  • Account (location/division/etc.) information
  • Number of phones*, VM boxes, etc.
Computes how many proxies customer needs
  • DNS SRV records created for batch updates
  • Media server/VM entries created automatically
  • Phone provisioning info created automatically
Automatically places order for phones* (+some)
  • Phones drop-shipped to customer in about 3 days

AWS EC2 Automation: Spot Instance API
  • Create spot instance -> gives request ID
    • Instance created with SmartVox-created base image
  • Wait a bit -> query request ID -> get instance ID
  • Query instance -> get IP address
  • Update DNS with server information and IP
  • Update Nagios with server information and IP
  • When spot instances shut down, they terminate
    • No more expense for “burstable resources”
  • This sounds like a Nagios event handler…
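
The request / wait / query sequence above might be sketched as follows in Python. Everything here is illustrative: the five callables passed in (request_spot, lookup_instance_id, lookup_public_ip, update_dns, update_nagios) are hypothetical stand-ins for whatever actually talks to EC2, DNS, and Nagios, and the retry count and delay are assumptions, not values from the talk.

```python
import time
from typing import Callable, Optional, Tuple


def poll(fetch: Callable[[], Optional[str]], attempts: int, delay: float,
         sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it returns a non-empty value or attempts run out."""
    for attempt in range(attempts):
        value = fetch()
        if value:
            return value
        if attempt < attempts - 1:
            sleep(delay)  # "wait a bit" before asking again
    raise TimeoutError("gave up waiting for a value")


def bring_up_spot_instance(
    request_spot: Callable[[], str],                       # hypothetical: returns a request ID
    lookup_instance_id: Callable[[str], Optional[str]],    # hypothetical: request ID -> instance ID
    lookup_public_ip: Callable[[str], Optional[str]],      # hypothetical: instance ID -> IP
    update_dns: Callable[[str, str], None],                # hypothetical
    update_nagios: Callable[[str, str], None],             # hypothetical
    attempts: int = 20,                                    # assumed retry budget
    delay: float = 30.0,                                   # assumed seconds between polls
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str, str]:
    """Request a spot instance, wait for its IDs and IP, then publish it."""
    request_id = request_spot()
    instance_id = poll(lambda: lookup_instance_id(request_id), attempts, delay, sleep)
    ip_address = poll(lambda: lookup_public_ip(instance_id), attempts, delay, sleep)
    update_dns(instance_id, ip_address)
    update_nagios(instance_id, ip_address)
    return request_id, instance_id, ip_address
```

Passing the EC2, DNS, and Nagios steps in as functions keeps the polling logic testable without an AWS account; in practice each callable would wrap an API or command-line call.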

AWS EC2 Automation: Our Custom Image
  • SmartVox media server image includes
    • Asterisk told to exit after waiting for calls to terminate
    • Startup script shuts down system after Asterisk exits
  • Instant “spot instance”
    • Bring it online when needed, and terminate as required
  • Same basic idea for starting/stopping proxies
    • These tend to be more static than media servers
  • Platform can be adjusted automatically
    • COGS adjusts appropriately
  • Hey, let’s hook this up to Nagios!!

AWS EC2 Automation: More ideas
Quick aside about spot instances. Useful for:
  • Database dumps
    • Spot instance turned up to do MySQL copies
    • Run reports, dump, compress, purge, etc., then terminate
  • Distributing web server load
    • Pop up another server and add to DNS
    • Instant on-demand capacity
  • Anything that you only want to do repeatedly but not for a long time, and only when you want to (or maybe if you have to)

Use Nagios for: Provisioning, Monitoring, Capacity Planning

Provisioning
  • Rather than create EC2s, we just update Nagios
    • Automatically regenerate SIP proxy and media server dynamic_hosts.cfg file as part of provisioning process
    • Nagios looks for host up, doesn’t find it, fires off handler
    • Event handler queries EC2 to see if it’s being turned up (~10 min) or just not running. If it’s not running, it starts it.
  • DNS is batch updated every hour, with 59-minute TTLs
  • Phone provisioning handled via automatic extract from database to create HTTP-served configuration files
  • Master/slave “config servers” (also in AWS) send all this to customers, with a URL to activate phones
  • Entire process from signature to functional: < 1 week
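
The event handler's decision could look something like this Python sketch. The host state and state type strings are what Nagios supplies through its $HOSTSTATE$ and $HOSTSTATETYPE$ macros; the three ec2_status values ("starting", "running", "absent") and the returned action names are hypothetical labels chosen for illustration, and acting only on HARD states is an assumption rather than something the slide states.

```python
def decide(host_state: str, state_type: str, ec2_status: str) -> str:
    """Pick an action for a Nagios host event.

    host_state and state_type mirror $HOSTSTATE$ / $HOSTSTATETYPE$.
    ec2_status is a hypothetical summary of what EC2 reports for the host:
    "starting" (being turned up), "running", or "absent" (not running).
    """
    if host_state != "DOWN" or state_type != "HARD":
        return "ignore"      # assumed: only act once Nagios has confirmed the outage
    if ec2_status == "starting":
        return "wait"        # still being turned up (~10 min per the slide)
    if ec2_status == "absent":
        return "start"       # not running, so start it
    return "escalate"        # assumed: running but unreachable needs a human


# Walk through the cases the slide describes.
for args in [("UP", "HARD", "running"),
             ("DOWN", "SOFT", "absent"),
             ("DOWN", "HARD", "starting"),
             ("DOWN", "HARD", "absent")]:
    print(args, "->", decide(*args))
```

Keeping the decision separate from the EC2 query makes it easy to check each branch by hand before wiring it into a handler command.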

Monitoring
  • Nagios looks for hosts (see previous slide)
    • Automatically creates them if needed
    • Note that SIP proxies are not spot instances
    • Dedicated to lifespan of customer/account, so they are only terminated as part of the de-provisioning process
  • Nagios looks at health of services
    • Determine if we have faults, outages, etc.
    • Can potentially reroute automatically (DNS SRV!)
    • Store performance info for capacity calculations
  • Notifications via Twitter and email
    • Come back tomorrow at 10:30 for how this works

Capacity Planning
  • Quantize by 50 simultaneous calls per server
  • Perf data used to calculate historical usage
    • Can use cron to automatically add/remove servers
  • Nagios figures out “deltac” in current usage
    • If deltac = 0, we are just right (OK)
    • If deltac < 0, we have too much capacity (WARN)
    • If deltac > 0, we need more capacity (CRITICAL)
  • Event handler looks at state and either does nothing, tells least-used box to stop Asterisk, or adds another box to the mix (see provisioning)
  • Capacity (and costs) dynamically adjust with usage
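
As a rough illustration of the mapping above, the Python sketch below turns a deltac value into a Nagios plugin result and then into a handler action. The return codes 0/1/2 are the standard Nagios plugin codes for OK/WARNING/CRITICAL; the output wording, the perfdata label, and the action names are assumptions made for this example.

```python
from typing import Tuple

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin return codes


def check_result(deltac: int) -> Tuple[int, str]:
    """Map deltac to (return code, plugin output with perfdata)."""
    if deltac == 0:
        code, label, note = OK, "OK", "capacity matches usage"
    elif deltac < 0:
        code, label, note = WARNING, "WARNING", "too much capacity"
    else:
        code, label, note = CRITICAL, "CRITICAL", "need more capacity"
    # Text after the pipe is perfdata, kept for historical usage calculations.
    return code, f"DELTAC {label} - {note} | deltac={deltac}"


def handler_action(service_state: str) -> str:
    """Map the service state to a (hypothetically named) handler action."""
    actions = {
        "OK": "nothing",
        "WARNING": "stop_asterisk_on_least_used",
        "CRITICAL": "add_media_server",
    }
    return actions.get(service_state, "nothing")  # e.g. UNKNOWN: do nothing


for value in (0, -1, 2):
    print(check_result(value))
```

Using WARNING for surplus and CRITICAL for shortfall means the usual Nagios severity ordering lines up with urgency: too much capacity only costs money, too little drops calls.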

Capacity Planning: DeltaC
  • deltac: custom Nagios module
    • Looks at the last three times it ran on a particular host
    • Quantized by 50 calls = change in 50-call volumes
  • If deltac = 0, then we return an OK state
  • If deltac < 0, then we are dropping call volumes and can SSH to a box and tell Asterisk to stop
    • This will then stop the spot instance and reduce cost
  • If deltac > 0, then we are gaining call volumes and trigger the provisioning process
    • This will start a spot instance and increase cost
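
The slide does not spell out how deltac is computed, so the following Python sketch is one plausible reading, not the actual module: it quantizes each call-count sample into 50-call blocks (rounding up, an assumption) and returns the change in blocks between the oldest and newest of the last three samples.

```python
import math
from typing import Sequence

CALLS_PER_BLOCK = 50  # from the slide: quantized by 50 calls


def blocks(calls: int) -> int:
    """Number of 50-call blocks needed for a call count (assumed round-up)."""
    return math.ceil(calls / CALLS_PER_BLOCK)


def deltac(samples: Sequence[int]) -> int:
    """Change in 50-call blocks across the last three samples.

    Assumption: compare the newest sample with the oldest of the last three.
    With fewer than two samples there is no trend yet, so report 0.
    """
    recent = list(samples)[-3:]
    if len(recent) < 2:
        return 0
    return blocks(recent[-1]) - blocks(recent[0])


print(deltac([40, 60, 120]))   # rising call volume
print(deltac([120, 90, 45]))   # falling call volume
print(deltac([10, 20, 30]))    # changes within one block
```

Working in blocks rather than raw call counts keeps small fluctuations inside a single 50-call block from triggering any server changes.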

Event Handler: DeltaC

How DeltaC Works
Let’s assume we’re creating a new host
  • ec2-request-spot-instances ami-58296831 -p 0.04 --key "BTC EC2" --group Asterisk --instance-type m1.medium -n 1 --type one-time
    • Get back a “spotInstanceRequestId” (sir-722f4e34)
  • ec2-describe-spot-instance-requests sir-722f4e34
    • Get back an “instanceId” (i-6488e31f)
  • ec2-describe-instances i-6488e31f
    • Get back public IP address (ipAddress) of this machine
  • Now we have IP address and (internal) name
    • Populate DNS batch update queue
    • Regenerate /usr/local/nagios/etc/objects/dynamic_hosts.cfg
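
Regenerating dynamic_hosts.cfg from a list of names and addresses might look like the Python sketch below. The "define host" syntax with use, host_name, and address directives is standard Nagios object configuration; the "generic-host" template name, the function names, and the temp-file-then-rename write are assumptions for illustration, not the process described in the talk.

```python
import os
from typing import Iterable, Tuple

HOST_TEMPLATE = "generic-host"  # assumption: a host template with this name exists


def render_host(name: str, address: str, template: str = HOST_TEMPLATE) -> str:
    """Render one Nagios host object definition."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def render_dynamic_hosts(hosts: Iterable[Tuple[str, str]]) -> str:
    """Render all hosts, sorted by name so regenerated files diff cleanly."""
    return "\n".join(render_host(name, address) for name, address in sorted(hosts))


def write_atomically(path: str, text: str) -> None:
    """Write to a temp file and rename, so a half-written file is never read."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp_path, path)


# Example addresses are from a documentation-only range, not real servers.
print(render_dynamic_hosts([("media-01", "203.0.113.10"),
                            ("media-02", "203.0.113.11")]))
```

After the file is rewritten, Nagios would still need to reload its configuration before the new hosts are checked.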

DeltaC Saves Lives Money
Small percentage changes in usage result in large changes in Cost Of Goods
For example:
  • 100 calls: 2 boxes, $0.20/hour, ~$75/year
  • 500 calls: 10 boxes, $1.00/hour, ~$375/year
  • 2000 calls: 20 boxes, $2.00/hour, ~$750/year
  • 5000 calls: 50 boxes, $5.00/hour, ~$2000/year

Questions?
Eric Loyd, eric@bitnetix.com, 877.33.VOICE, @Bitnetix @SmartVox