Wigner Datacenter’s New Software Defined Datacenter Architecture (HEPiX)

Wigner Datacenter’s New Software Defined Datacenter Architecture
HEPiX 2017 Fall, Tsukuba, 2017.10.19.
Zoltan Szeleczky, IT Engineer, Wigner Datacenter

Introduction
• Wigner Datacenter is part of Wigner Research Center for Physics (Wigner RCP), which belongs to the Hungarian Academy of Sciences (MTA)
• Tier-0 hosting site for CERN
• Academic Cloud for the scientific community:
  – 4000 vCPUs
  – 1.6 PB storage
  – 1.6 PB tape backup

Current Production State
• Legacy cloud: OpenStack Kilo, installed manually
• Instead of a manual upgrade, we designed a new architecture with an automated cloud deployment

New Architecture

Toolset
• DevOps using GitLab, Gerrit and Jenkins
• oVirt virtualized HA infrastructure
  – FreeIPA: identity management, Kerberos, LDAP, DNS
  – Katello (+ Puppet master): configuration & lifecycle management
  – Undercloud
• OpenStack deployment using TripleO
• OpsTools (integrated with TripleO)
  – Availability monitoring: Sensu, Redis, Uchiwa
  – Log collection: Fluentd, Elasticsearch, Kibana
  – Performance monitoring: Collectd, Graphite, Grafana
• Network automation + IaC (Infrastructure as Code) with Puppet (planned)
• Infrastructure monitoring: Morpheus (planned)
• Management / user platform: ManageIQ (planned)

TripleO
• OpenStack on OpenStack
• Uses a deployment cloud (Undercloud) to create and manage a workload cloud (Overcloud)

TripleO network layout

Automated way of adding servers to the Overcloud

Instackenv
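The slide does not show the file's contents, so the following is only a rough sketch of how an instackenv-style node list could be generated from a hardware inventory. The addresses, credentials and MAC addresses are made-up placeholders, the inventory keys are hypothetical, and the `pm_type` value is an assumption that depends on the OpenStack release and power driver in use.

```python
import json


def build_instackenv(servers, pm_type="pxe_ipmitool"):
    """Return an instackenv-style dict for a list of server records.

    Each record is a dict with the hypothetical keys ipmi_addr,
    ipmi_user, ipmi_password and mac (MAC of the provisioning NIC).
    """
    nodes = []
    for server in servers:
        nodes.append({
            "pm_type": pm_type,                      # power management driver (assumed)
            "pm_addr": server["ipmi_addr"],          # BMC / IPMI address
            "pm_user": server["ipmi_user"],
            "pm_password": server["ipmi_password"],
            "mac": [server["mac"]],                  # list of provisioning MACs
        })
    return {"nodes": nodes}


# Placeholder inventory, not real Wigner hardware.
inventory = [
    {"ipmi_addr": "192.0.2.11", "ipmi_user": "admin",
     "ipmi_password": "changeme", "mac": "52:54:00:00:00:01"},
    {"ipmi_addr": "192.0.2.12", "ipmi_user": "admin",
     "ipmi_password": "changeme", "mac": "52:54:00:00:00:02"},
]

instackenv = build_instackenv(inventory)
print(json.dumps(instackenv, indent=2))
```

Generating the file from an inventory source, rather than editing it by hand, is one way to keep the node list reproducible when servers are added.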

Available nodes

YAML files describing the environment
• The Overcloud is deployed using Heat
• The environment is described using YAML parameters
nodes.yaml: defines the compute & storage node counts
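As an illustration of the idea, the sketch below renders a minimal Heat environment file that sets node counts. `ControllerCount`, `ComputeCount` and `CephStorageCount` are TripleO parameter names, but whether the actual nodes.yaml uses exactly these roles is an assumption, and the counts are placeholders.

```python
def render_nodes_yaml(counts):
    """Render a Heat environment file body with a parameter_defaults section."""
    lines = ["parameter_defaults:"]
    for name, value in counts.items():
        # Two-space indentation keeps each parameter under parameter_defaults.
        lines.append(f"  {name}: {value}")
    return "\n".join(lines) + "\n"


# Placeholder counts; the storage role is assumed to be Ceph.
node_counts = {"ControllerCount": 3, "ComputeCount": 4, "CephStorageCount": 3}

nodes_yaml = render_nodes_yaml(node_counts)
print(nodes_yaml, end="")
```

A file like this would typically be passed to the deployment with `-e nodes.yaml`, so scaling the Overcloud becomes a matter of changing a count and redeploying.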

New features and upgrade plan
• 3-step process
  – Development environment (small, 3 nodes)
  – Test environment (medium, 9 nodes)
  – Production environment

Firewall
• OPNsense is a FreeBSD-based open source firewall
  – Integrated Suricata
  – Integrated OpenVPN
  – Integrated time server
• Problems:
  – Lacks API support for automation
  – Port configuration turns off all ports
  – pfSense code could use a rework
• We are still looking for an alternative solution that can be automated more easily. Any suggestions?

2FA / YubiKey
• Two-factor authentication to increase security
• Supports NFC
• Integrated with FreeIPA
  – FreeRADIUS
• Used for secure VPN connections
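The slides do not say which one-time-password mode the tokens use. Assuming a counter-based OTP (HOTP, RFC 4226), which is one of the token types FreeIPA can manage, the following sketch shows how such a code is derived from a shared secret and a counter. The secret is the RFC's published test secret, not a real credential.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HOTP value as specified in RFC 4226."""
    # The counter is encoded as an 8-byte big-endian integer.
    message = struct.pack(">Q", counter)
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test secret from RFC 4226, Appendix D.
test_secret = b"12345678901234567890"
first_code = hotp(test_secret, 0)
print(first_code)  # 755224, the RFC's first test value
```

Both the token and the server advance the counter, so a captured code cannot be replayed once it has been used.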

Progress so far
• Fully virtualized infrastructure
  – oVirt on 3 hosts
  – Katello
  – FreeIPA
• Working Dev and Test environments, with new features added and tested continuously
• FreeIPA integration with Overcloud nodes & Keystone
  – Tried novajoin, but it didn’t register IP addresses correctly
  – Wrote a custom script instead
• Still a lot of work left to do…

Problems we currently face
• Overcloud metadata VIP not working
• Power outage in the test environment
  – Failure of the UPS dedicated to our test system
  – FreeIPA database corruption -> reinstall
  – FreeIPA replicas
• Overcloud cert resubmit loop
• Frequent bugs in the TripleO stable repository

Thank you! Questions?
If you have any ideas or suggestions, we would be happy to hear them.
Email: szeleczky.zoltan@wigner.mta.hu

Extra / Backup Slides

Availability Monitoring
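Sensu checks report their result through an exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of output. The sketch below is a hypothetical TCP reachability check written in that style; the check name, host and port are illustrative and not taken from the Wigner setup. It is exercised here against a throwaway listener on localhost.

```python
import socket

# Exit statuses in the Nagios-style convention that Sensu checks follow.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=3.0):
    """Return (status, message) for a simple TCP connect check."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"CheckTCP OK: {host}:{port} is reachable"
    except OSError as exc:
        return CRITICAL, f"CheckTCP CRITICAL: {host}:{port} unreachable ({exc})"


# Demo: open a temporary listener on an ephemeral localhost port and check it.
with socket.socket() as listener:
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    demo_port = listener.getsockname()[1]
    status, message = check_tcp("127.0.0.1", demo_port)

print(status, message)
# A real check script would end with sys.exit(status).
```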

Performance Monitoring
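In a Collectd, Graphite and Grafana pipeline, metrics reach Graphite as plain-text lines of the form `path value timestamp`. The sketch below only formats such a line; the metric path is a made-up example, and in the setup described here Collectd would normally do the sending rather than custom code.

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    # Use the current time unless a fixed timestamp is supplied.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


# Hypothetical metric path and a fixed placeholder timestamp.
sample = graphite_line("test.compute01.load.shortterm", 0.42, 1508371200)
print(sample, end="")
```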

Logging

oVirt
oVirt is an open source virtual datacenter platform built on the Linux KVM hypervisor. It is the open source equivalent of RHEV. It provides high availability and an easy way to solve the chicken-and-egg problem. We use it to virtualize our infrastructure services, such as:
• Katello: lifecycle management
• FreeIPA: SSO; security information management solution
• TripleO undercloud

Katello / Foreman
Manages servers throughout their lifecycle, from provisioning and configuration to orchestration and monitoring. It is the open source equivalent of RH Satellite.
• Discover new servers, inventory
• Manage physical and virtual servers
• Supports Puppet and Ansible
• Local yum repo
• OpenSCAP security audits
A starting point for developing automated processes; it also has a GUI for convenience.

FreeIPA
• SSO for users, systems, services
• LDAP / Kerberos authentication
• Has replication functions to ensure HA
• We use it to manage our users, hosts and services securely with certificates