OpenStack: NUMA Technology and Its Application in Nova

Who?
1. Name: Rui Chen (陈锐)
2. Employer: Huawei
3. Weibo: @kiwik
4. Blog: http://kiwik.github.io

Contents
A. NUMA
B. Nova & NUMA
C. Nova & NUMA & ?

[1] NUMA: Non-Uniform Memory Access

Architecture: SMP, NUMA and MPP
1. SMP: Symmetric Multi-Processor
2. NUMA: Non-Uniform Memory Access
3. MPP: Massive Parallel Processing

SMP: sharing RAM

NUMA: sharing RAM, local access is faster

MPP: sharing nothing

NUMA Topology (diagram)

[2] Nova & NUMA: guest NUMA node placement & topology

Use Case #1: Allocate the guest's vCPUs and RAM from a single host NUMA node, avoiding cross-node RAM access, improving guest performance, and reducing unpredictable latency. Example 1: hw:numa_nodes=1
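A minimal sketch of setting this extra spec on a flavor; the flavor name chenrui_f1 is a hypothetical example, not from the slides:
$ nova flavor-key chenrui_f1 set hw:numa_nodes=1
# equivalent with the unified client:
$ openstack flavor set chenrui_f1 --property hw:numa_nodes=1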

Use Case #2: When the guest's vCPUs and RAM exceed a single host NUMA node, split the guest into multiple guest NUMA nodes and map them onto the host NUMA topology. This lets the guest OS see the guest NUMA layout and optimize application resource scheduling, e.g. for a database. Example 2: hw:numa_nodes=N
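A minimal sketch (hypothetical flavor name); to my understanding, when only hw:numa_nodes is set Nova splits the vCPUs and RAM evenly across the guest nodes, and the per-node extra specs on the configuration slide below allow uneven layouts:
$ nova flavor-key chenrui_f2 set hw:numa_nodes=2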

What is a guest in QEMU/KVM?
1. Guest: a Linux process on the compute node
2. vCPU: a special thread inside the guest process, scheduled by the host OS
How to place a guest into a NUMA node? (see the sketch below)
1. Schedule the vCPU threads onto the pCPU set of a NUMA node
2. Allocate RAM from the local node
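As a conceptual illustration only (Nova/libvirt do this through per-vCPU pinning in the domain XML, not through numactl), binding an arbitrary process, such as a QEMU guest, to one NUMA node looks like this:
$ numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 <qemu args omitted>
# --cpunodebind=0: run only on the CPUs of node 0; --membind=0: allocate RAM only from node 0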

Related configuration (flavor & image extra specs); example commands follow below:
• hw:numa_nodes=NN – number of guest NUMA nodes
• hw:numa_mempolicy=preferred|strict – RAM allocation policy (not implemented in the Juno code)
• hw:numa_cpus.0=<cpu-list> – vCPUs in guest NUMA node 0
• hw:numa_cpus.1=<cpu-list> – vCPUs in guest NUMA node 1
• hw:numa_mem.0=<ram-size> – RAM size allocated in guest NUMA node 0
• hw:numa_mem.1=<ram-size> – RAM size allocated in guest NUMA node 1
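A sketch of how the chenrui_f flavor shown two slides below could have been created; the names and values come from that slide, while the exact commands are a reconstruction:
$ nova flavor-create chenrui_f 11 512 1 8        # name, ID, RAM (MB), disk (GB), vCPUs
$ nova flavor-key chenrui_f set hw:numa_nodes=2
$ nova flavor-key chenrui_f set hw:numa_cpus.0=0,1,2,3,4,5 hw:numa_cpus.1=6,7
$ nova flavor-key chenrui_f set hw:numa_mem.0=384 hw:numa_mem.1=128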

Nova implementation workflow
1. nova-api: generates the guest NUMA topology and saves it in instance_extra
2. nova-scheduler: decides whether a host can satisfy the guest NUMA topology
3. nova-compute: instance_claim checks whether the host's NUMA resources are sufficient and builds the mapping from guest NUMA nodes to host NUMA nodes
4. libvirt driver: generates the corresponding libvirt.xml from that mapping
5. resource tracker: refreshes the host's NUMA resource usage

Host & Guest NUMA topology

Host NUMA: 2 NUMA nodes, ~40 GB RAM per node, CPU & RAM distances:

stack@openstack:/home/devstack [master]$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 4 5 6 7 12 13 14 15
node 0 size: 40232 MB
node 0 free: 36833 MB
node 1 cpus: 0 1 2 3 8 9 10 11
node 1 size: 40318 MB
node 1 free: 37822 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Guest NUMA (Nova flavor chenrui_f): 8 vCPUs, 2 NUMA nodes, 512 MB RAM:

+-----------------------------+--------------------------------------------------------------+
| Property                    | Value                                                        |
+-----------------------------+--------------------------------------------------------------+
| OS-FLV-DISABLED: disabled   | False                                                        |
| OS-FLV-EXT-DATA: ephemeral  | 0                                                            |
| disk                        | 1                                                            |
| extra_specs                 | {"hw:numa_cpus.0": "0,1,2,3,4,5", "hw:numa_cpus.1": "6,7",   |
|                             |  "hw:numa_nodes": "2", "hw:numa_mem.1": "128",               |
|                             |  "hw:numa_mem.0": "384"}                                     |
| id                          | 11                                                           |
| name                        | chenrui_f                                                    |
| os-flavor-access: is_public | True                                                         |
| ram                         | 512                                                          |
| rxtx_factor                 | 1.0                                                          |
| swap                        |                                                              |
| vcpus                       | 8                                                            |
+-----------------------------+--------------------------------------------------------------+
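For reference, a guest using this flavor would be booted in the usual way; the image placeholder and the instance name chenrui_vm are illustrative, not from the slides:
$ nova boot --flavor chenrui_f --image <image-id> chenrui_vm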

libvirt.xml
• pin each vCPU to a pCPU set in <cputune>
• export the guest NUMA topology in <cpu>/<numa>

<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
<vcpu placement='static'>8</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='4-7,12-15'/>
  <vcpupin vcpu='1' cpuset='4-7,12-15'/>
  <vcpupin vcpu='2' cpuset='4-7,12-15'/>
  <vcpupin vcpu='3' cpuset='4-7,12-15'/>
  <vcpupin vcpu='4' cpuset='4-7,12-15'/>
  <vcpupin vcpu='5' cpuset='4-7,12-15'/>
  <vcpupin vcpu='6' cpuset='0-3,8-11'/>
  <vcpupin vcpu='7' cpuset='0-3,8-11'/>
</cputune>
<cpu>
  <topology sockets='8' cores='1' threads='1'/>
  <numa>
    <cell cpus='0-5' memory='393216'/>
    <cell cpus='6-7' memory='131072'/>
  </numa>
</cpu>
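One way to inspect the XML libvirt is actually using for the running guest; domain ID 10 is the one queried with virsh vcpuinfo on the next slide, and Nova also keeps a copy named libvirt.xml in the instance directory:
$ sudo virsh dumpxml 10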

Double check: numastat & virsh vcpuinfo

stack@openstack:~/logs$ sudo numastat -c qemu-system-x86_64

Per-node process memory usage (in MBs)
PID              Node 0 Node 1 Total
---------------  ------ ------ -----
7161 (qemu-syste    703     12   714
11839 (qemu-syst    700     13   712
12774 (qemu-syst    162    550   713
15115 (qemu-syst     12    698   711
16379 (qemu-syst    111      5   117
17493 (sudo)                       2
---------------  ------ ------ -----
Total              1689   1279  2969

stack@openstack:~/data/nova/instances/3722ba51-cefa-48e1-b11e-fd6ec14865ee$ virsh vcpuinfo 10
VCPU:           0
CPU:            4
State:          running
CPU time:       1.9s
CPU Affinity:   ----yyyy----yyyy

VCPU:           1
CPU:            15
State:          running
CPU time:       0.4s
CPU Affinity:   ----yyyy----yyyy

VCPU:           2
CPU:            5
State:          running
CPU time:       0.0s
CPU Affinity:   ----yyyy----yyyy

VCPU:           3
CPU:            12
State:          running
CPU time:       0.0s
CPU Affinity:   ----yyyy----yyyy

VCPU:           4
CPU:            13
State:          running
CPU time:       0.0s
CPU Affinity:   ----yyyy----yyyy

VCPU:           5
CPU:            14
State:          running
CPU time:       0.1s
CPU Affinity:   ----yyyy----yyyy

VCPU:           6
CPU:            2
State:          running
CPU time:       0.0s
CPU Affinity:   yyyy----yyyy----

VCPU:           7
CPU:            0
State:          running
CPU time:       0.1s
CPU Affinity:   yyyy----yyyy----
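If the guest image includes numactl (an assumption about the image, not from the slides), the exported topology can also be confirmed from inside the guest:
$ numactl --hardware   # run inside the guest; should report 2 nodes: 6 vCPUs / ~384 MB and 2 vCPUs / ~128 MB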

Where is RAM?

<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
<vcpu placement='static'>8</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='4-7,12-15'/>
  <vcpupin vcpu='1' cpuset='4-7,12-15'/>
  <vcpupin vcpu='2' cpuset='4-7,12-15'/>
  <vcpupin vcpu='3' cpuset='4-7,12-15'/>
  <vcpupin vcpu='4' cpuset='4-7,12-15'/>
  <vcpupin vcpu='5' cpuset='4-7,12-15'/>
  <vcpupin vcpu='6' cpuset='0-3,8-11'/>
  <vcpupin vcpu='7' cpuset='0-3,8-11'/>
</cputune>
<cpu>
  <topology sockets='8' cores='1' threads='1'/>
  <numa>
    <cell cpus='0-5' memory='393216'/>
    <cell cpus='6-7' memory='131072'/>
  </numa>
</cpu>

Why? The kernel will always try to allocate memory from the NUMA node that matches the one the guest CPUs are executing on.
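The XML above contains no explicit memory binding; RAM placement follows from the vCPU pinning plus the kernel behaviour described above. For comparison, libvirt can also pin guest memory per cell explicitly with <numatune>; a sketch consistent with the mapping above (standard libvirt syntax, not what this Nova/Juno version generates):
<numatune>
  <!-- overall policy: allocate strictly from host nodes 0-1 -->
  <memory mode='strict' nodeset='0-1'/>
  <!-- guest cell 0 (vCPUs 0-5) gets its RAM from host node 0 -->
  <memnode cellid='0' mode='strict' nodeset='0'/>
  <!-- guest cell 1 (vCPUs 6-7) gets its RAM from host node 1 -->
  <memnode cellid='1' mode='strict' nodeset='1'/>
</numatune>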

[3] Nova & NUMA & ? (I/O NUMA scheduling, vCPU pinning, large pages, etc.)

NFV
1. NUMA placement: https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
2. vCPU topology: https://blueprints.launchpad.net/nova/+spec/virt-driver-vcpu-topology
3. I/O based NUMA scheduling: https://blueprints.launchpad.net/nova/+spec/input-output-based-numa-scheduling
4. CPU pinning: https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-pinning
5. Large pages: https://blueprints.launchpad.net/nova/+spec/virt-driver-large-pages
6. SR-IOV: https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
7. vhost-user: https://blueprints.launchpad.net/nova/+spec/vif-vhostuser

Thanks!