Cyborg Project Update
Sundar Nadathur (Intel IAGS), Jinghua Gao (Lenovo Research), Yumeng Bao (ZTE Corporation)
Agenda
Cyborg is an OpenStack project for lifecycle management of accelerators.
• Motivation for Cyborg
• Stein Release Accomplishments
• Train Release Plans
Accelerators: An Emerging Trend
• Traditional data center workloads are well suited to the Intel® Xeon® processor
  • Databases, CRM, mail servers, …
• New and emerging workloads are compute-intensive
  • Machine learning, NFV, image/video transcoding, …
• Data centers are increasingly using accelerators
  • GPU, FPGA, Google* TPU, Amazon* Nitro, …
*Other names and brands may be claimed as the property of others.
Why Cyborg?
• PCI whitelists: difficult to use
• One PCI ID may correspond to different functions/profiles
• Device properties not exposed: limits use cases
Cyborg: discovers properties automatically
• Enables the use cases of interest
• No need for host aggregates to direct scheduling
Cyborg Architecture
[Diagram labels: OpenStack Controller; Cyborg, Placement, Glance, Neutron; Nova Compute, Cyborg Agent, Cyborg Driver (Intel OPAE); user space, hypervisor, kernel space; VM; kernel stack (Intel OPAE); device with PF, VF0, VF1, Mgmt, Comp-1, Comp-2.]
• Operator can enable or disable drivers
Stein Release Accomplishments
Major Features
• New device model and database schema
• Large parts of device discovery are now in place:
  • Drivers report devices to agent: https://review.openstack.org/#/c/624138/
  • Conductor uses agent reports to update the database
• Programming support with v1 API
  • Added REST API for FPGA programming
  • Added Glance client for image downloading
  • Download bitstream when programming
• Driver support: added GPU driver, enhanced Intel FPGA driver
Pilot for Nova Integration
• Goal: clarify Nova interaction with working code
• Created a pilot branch for Nova integration
  • https://review.opendev.org/#/q/project:openstack/cyborg+topic:cyborg-nova-pilot
  • Pilot code can be folded into master branch in Train
• With corresponding Nova patches
  • https://review.opendev.org/#/q/status:open+project:openstack/nova+bp/nova-cyborg-interaction
• Shown at the Intel Booth, with a real-world workload
Train Release Plans
Plans
• Hope to get agreement with Nova on VM creation flow
• Deliver major use cases
  • Device as a Service: for power users
  • Accelerated Function as a Service: for casual users
• Add support for more devices and drivers
Proposed Model for Train Release
Discover Devices Automatically
Step 1: Operator can enable or disable drivers.
[Diagram: same components as the Cyborg Architecture slide, with step 1 marked.]
Upload Bitstreams
Step 2: Operator or 3rd party may upload bitstreams.
• Bitstream: image with metadata
• Function: algorithm or VNF (ID/name)
[Diagram: same components as the Cyborg Architecture slide, with step 2 marked.]
Define Device Profiles
Step 3: Operator defines device profiles.
• Device profile: flavor for devices
{"name": "genomics-dp", "groups": [ {"resources:CUSTOM_ACCELERATOR_FPGA": "1", "trait:CUSTOM_FPGA_INTEL_PAC_ARRIA10": "required", "trait:CUSTOM_FUNCTION_ID_6512": "required"} ]}
[Diagram: same components as the Cyborg Architecture slide, with step 3 marked.]
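A device profile group mixes two kinds of keys: `resources:` entries (how many of a resource class) and `trait:` entries (properties the device must have). The following is a minimal Python sketch of how such a group could be split, using the slide's profile with straight quotes. The helper `split_group` is hypothetical and is not part of Cyborg.

```python
import json

# Device profile from the slide, written as valid JSON.
PROFILE_JSON = """
{"name": "genomics-dp",
 "groups": [
   {"resources:CUSTOM_ACCELERATOR_FPGA": "1",
    "trait:CUSTOM_FPGA_INTEL_PAC_ARRIA10": "required",
    "trait:CUSTOM_FUNCTION_ID_6512": "required"}
 ]}
"""


def split_group(group):
    """Split one device-profile group into resources and required traits.

    Hypothetical helper for illustration only; not a Cyborg API.
    """
    resources, required_traits = {}, []
    for key, value in group.items():
        prefix, _, name = key.partition(":")
        if prefix == "resources":
            resources[name] = int(value)
        elif prefix == "trait" and value == "required":
            required_traits.append(name)
    return resources, sorted(required_traits)


profile = json.loads(PROFILE_JSON)
resources, traits = split_group(profile["groups"][0])
print(resources)
print(traits)
```

Here the group asks for one `CUSTOM_ACCELERATOR_FPGA` and requires both the device trait and the function trait.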
Device Profiles: Use Cases
FPGA as a Service:
{"name": "nlb-0", "groups": [ {"resources:CUSTOM_ACCELERATOR_FPGA": "1", "trait:CUSTOM_REGION_TYPE_9926AB6D6C925A68AABCA7D84C545738": "required", "accel:bitstream_id": "65127d8f-66d6-4767-b996-273656eb0d5f"} ]}
Accelerated Function as a Service, Orch-programmed:
{"name": "nlb-0", "groups": [ {"resources:CUSTOM_ACCELERATOR_FPGA": "1", "trait:CUSTOM_FPGA_INTEL_PAC_ARRIA10": "required", "accel:function_id": "65127d8f-66d6-4767-b996-273656eb0d5f"} ]}
Slide annotations: "Glance UUID"; "Future: use names".
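The two use cases differ in which `accel:` key the group carries: a bitstream ID for FPGA as a Service, a function ID for Accelerated Function as a Service. As a rough illustration, a hypothetical Python helper (not part of Cyborg) could tell them apart like this. Checking that the bitstream ID parses as a UUID is an assumption based on the slide's "Glance UUID" annotation.

```python
import uuid


def classify_group(group):
    """Return which use case a device-profile group describes.

    Hypothetical helper; the labels mirror the two examples on the slide.
    """
    if "accel:bitstream_id" in group:
        # Assumed check: raises ValueError if the value is not a well-formed UUID.
        uuid.UUID(group["accel:bitstream_id"])
        return "FPGA as a Service"
    if "accel:function_id" in group:
        return "Accelerated Function as a Service"
    return "unspecified"


fpga_aas = {
    "resources:CUSTOM_ACCELERATOR_FPGA": "1",
    "trait:CUSTOM_REGION_TYPE_9926AB6D6C925A68AABCA7D84C545738": "required",
    "accel:bitstream_id": "65127d8f-66d6-4767-b996-273656eb0d5f",
}
afaas = {
    "resources:CUSTOM_ACCELERATOR_FPGA": "1",
    "trait:CUSTOM_FPGA_INTEL_PAC_ARRIA10": "required",
    "accel:function_id": "65127d8f-66d6-4767-b996-273656eb0d5f",
}

print(classify_group(fpga_aas))
print(classify_group(afaas))
```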
Define Flavors
Step 4: Operator sets device profile in a flavor.
$ openstack flavor set --property "accel:device_profile_name=mydp" myflavor
[Diagram: same components as the Cyborg Architecture slide, with step 4 marked.]
VM Creation
End user requests VM with a flavor.
[Diagram: same components as the Cyborg Architecture slide.]
Welcome to Cyborg!
• https://wiki.openstack.org/wiki/Cyborg
• https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting
• IRC: #openstack-cyborg
• Email: openstack-discuss ML, with [cyborg] in subject
You are welcome to participate and contribute!
Thank you
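Nova Cyborg Flow: Part 1 of 2
Participants: Placement, Nova Controller, Cyborg
• Operator creates a device profile with Cyborg API
• Operator sets DP name in flavor, as an extra spec
Sequence, starting from openstack server create:
• Nova Controller to Cyborg: GET /device_profiles?name=mydp
• Cyborg to Nova Controller: {"device_profiles": {…}}
• Nova Controller: merge request groups in request spec
• Nova Controller to Placement: GET /alloc-cand?resources=…
• Placement to Nova Controller: {"allocations": {…}}
• Nova Controller: select a host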
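The first half of the flow can be sketched in Python with in-memory stand-ins. Everything here is illustrative: `DEVICE_PROFILES` replaces the Cyborg lookup, the `request_groups` key is an assumed shape for the request spec, and no Nova or Cyborg code is called.

```python
from urllib.parse import urlencode

# In-memory stand-in for Cyborg's device profiles (hypothetical data).
DEVICE_PROFILES = {
    "mydp": {
        "name": "mydp",
        "groups": [
            {"resources:CUSTOM_ACCELERATOR_FPGA": "1",
             "trait:CUSTOM_FPGA_INTEL_PAC_ARRIA10": "required"}
        ],
    },
}


def device_profile_query(name):
    """Build the path for the device profile lookup shown on the slide."""
    return "/device_profiles?" + urlencode({"name": name})


def merge_request_groups(request_spec, flavor_extra_specs):
    """Return a copy of the request spec with the profile's groups appended.

    The request spec layout is an assumption made for this sketch.
    """
    dp_name = flavor_extra_specs.get("accel:device_profile_name")
    if dp_name is None:
        return dict(request_spec)  # no accelerators requested
    profile = DEVICE_PROFILES[dp_name]
    merged = dict(request_spec)
    merged["request_groups"] = (
        list(request_spec.get("request_groups", [])) + profile["groups"]
    )
    return merged


extra_specs = {"accel:device_profile_name": "mydp"}
spec = {"request_groups": [{"resources:VCPU": "2"}]}
merged_spec = merge_request_groups(spec, extra_specs)

print(device_profile_query("mydp"))
print(len(merged_spec["request_groups"]))
```

The merged groups would then feed the allocation candidates query to Placement, which is left out of the sketch.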
Nova Cyborg Flow: Part 2 of 2
Participants: Nova Controller, Nova Virt Driver, Cyborg
• POST /accelerator_requests with {"device_profile_name": "mydp"}
• Cyborg creates unbound ARQs (accelerator requests) and returns {"arqs": {…}}
• Match ARQs to RGs (request groups) to RPs (resource providers)
• PATCH /accelerator_requests with {$arq_uuid: [$patch1, …$patchN]}
• Cyborg asynchronously binds ARQs to host/device-RP/instance; FPGA programming may happen here
• build_and_run_instance:
  • GET /accelerator_requests?instance=uuid&bind_state=resolved
  • Response: {"arqs": {"attach_handle": …}}
  • Merge into list of PCI devices
  • Compose XML (unchanged)
  • Spawn VM (unchanged)
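The ARQ lifecycle in this half of the flow (create unbound, bind, then query the resolved ones for an instance) can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the `ARQ` fields, the one-ARQ-per-group rule, and the state names `unbound` and `resolved` are taken from or inferred from the slide, binding is done synchronously for simplicity, and the host, resource provider, and attach handle values are made up.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ARQ:
    """Illustrative accelerator request record; field names are assumptions."""
    arq_uuid: str
    device_profile_name: str
    bind_state: str = "unbound"
    hostname: Optional[str] = None
    device_rp_uuid: Optional[str] = None
    instance_uuid: Optional[str] = None
    attach_handle: Optional[str] = None


def create_arqs(device_profile_name: str, num_groups: int) -> List[ARQ]:
    """Create one unbound ARQ per request group (assumed one-to-one)."""
    return [ARQ(str(uuid.uuid4()), device_profile_name) for _ in range(num_groups)]


def bind_arq(arq: ARQ, hostname: str, device_rp_uuid: str,
             instance_uuid: str, attach_handle: str) -> None:
    """Bind an ARQ to host/device-RP/instance; the real flow is asynchronous."""
    arq.hostname = hostname
    arq.device_rp_uuid = device_rp_uuid
    arq.instance_uuid = instance_uuid
    arq.attach_handle = attach_handle
    arq.bind_state = "resolved"


def resolved_arqs(arqs: List[ARQ], instance_uuid: str) -> List[ARQ]:
    """Mimic the query with instance=uuid and bind_state=resolved."""
    return [a for a in arqs
            if a.instance_uuid == instance_uuid and a.bind_state == "resolved"]


arqs = create_arqs("mydp", 2)
bind_arq(arqs[0], "host-1", "rp-1", "instance-1", "0000:5e:00.1")
ready = resolved_arqs(arqs, "instance-1")
print(len(ready), ready[0].attach_handle)
```

Only the bound ARQ is returned, and its attach handle is what would be merged into the instance's PCI device list.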
Device Model
[Diagram labels: Device; Deployables (Resource Providers); Accelerators; ControlPath ID; Attach Handles.]
Term | Meaning | Placement Equiv
Deployable | A logical structure in a device that provides a resource. A resource can be an accelerator, local memory, etc. | Resource Provider
Accelerator | A logical resource to offload computation, etc. Unit of assignment to an instance. | Resource Class Inventory
Device | Physical hardware, e.g. a PCI card. Includes board (Flash/BMC). | --
ControlPath ID | Control plane interface ID to access the device, e.g. PCI PF. | --
Attach Handle | An ID of the handle to attach to an instance, e.g. PCI VF, mdev UUID. | --
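One way to picture the device model is as nested records. The Python dataclasses below are a sketch, not Cyborg's schema: the field names are invented, attaching handles to the deployable rather than to the device is an assumption, and the PCI addresses are placeholders. The `placement_view` helper shows the mapping from the table, where a deployable corresponds to a resource provider and its accelerators to resource class inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttachHandle:
    handle_type: str  # e.g. "PCI" or "MDEV"
    handle_id: str    # e.g. a PCI VF address or an mdev UUID


@dataclass
class Deployable:
    name: str
    resource_class: str
    num_accelerators: int
    # Assumption: attach handles hang off the deployable.
    attach_handles: List[AttachHandle] = field(default_factory=list)


@dataclass
class Device:
    vendor: str
    control_path_id: str  # e.g. the PCI PF address
    deployables: List[Deployable] = field(default_factory=list)


def placement_view(device: Device) -> Dict[str, Dict[str, int]]:
    """Map each deployable (resource provider) to its inventory."""
    return {d.name: {d.resource_class: d.num_accelerators}
            for d in device.deployables}


fpga = Device(
    vendor="intel",
    control_path_id="0000:5e:00.0",
    deployables=[
        Deployable(
            name="region-0",
            resource_class="CUSTOM_ACCELERATOR_FPGA",
            num_accelerators=1,
            attach_handles=[AttachHandle("PCI", "0000:5e:00.1")],
        )
    ],
)

print(placement_view(fpga))
```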