I am looking into how to use MIG with multiple virtual machines (VMs). When we have multiple GPUs, we can attach different GPUs to different instances via PCI passthrough (OpenStack Docs: Attaching physical PCI devices to guests) in an OpenStack/KVM environment. It seems that MIG works fine with Docker, but it is not clear whether it works with VMs too.
From what I have been told by support, you cannot pass through single MIG instances to VMs. From what I have seen, though, you can create a VM with a full A100 and then partition that GPU inside the VM using MIG, if that is what you are after.
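To make that second approach concrete, here is a rough sketch of the partitioning inside the guest, assuming the full A100 has already been passed through to the VM and the NVIDIA driver is installed there (the profile ID below is illustrative for an A100-40GB):

```shell
# Inside the VM that owns the whole A100:
# enable MIG mode on GPU 0 (requires a GPU reset and no running workloads)
sudo nvidia-smi -i 0 -mig 1

# list the MIG GPU instance profiles the driver offers for this card
nvidia-smi mig -lgip

# create, for example, two 3g.20gb GPU instances plus their default
# compute instances (profile ID 9 is 3g.20gb on an A100-40GB; check -lgip)
sudo nvidia-smi mig -cgi 9,9 -C

# verify the resulting MIG devices
nvidia-smi -L
```

The MIG devices listed by `nvidia-smi -L` can then be handed to containers or CUDA processes inside the VM as usual.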
Standard vGPUs seem to work fine in community OpenStack, so hopefully MIG-backed vGPU will eventually be supported. That would be great, but I still need to get my hands on some A100s to test that it is actually possible (the NVIDIA docs mention support for RHEL/Red Hat OpenStack, not more generic setups).
Do ping back if you manage to get something going with MIG :)
Thank you for your reply. Yes, it would be great if MIG-backed vGPU were also supported through PCI passthrough or vGPU. We are still investigating this option.
We have recently obtained some A100s and are working on integrating them with our OpenStack deployment. In contrast to legacy GPUs, where type-PCI is used in the Nova configuration (on the controller and compute nodes), with the A100 only type-PF (Physical Function) or type-VF (Virtual Function) with SR-IOV should be enabled.
I will post our findings here regarding MIG technology and its integration with a cloud environment (OpenStack).
Thanks for replying back and wanting to keep posting your findings here :)
You mention that:
In contrast to legacy GPUs, where type-PCI is used in the Nova configuration (on the controller and compute nodes), with the A100 only type-PF (Physical Function) or type-VF (Virtual Function) with SR-IOV should be enabled.
In our OpenStack cloud we have GPUs like the V100 running with type-PCI to enable PCI passthrough, but I am not sure I fully understand what you mean. It seems to me that you are saying you are not able to do PCI passthrough with the A100? Is that it? Could you please elaborate?
No, we have managed to do PCI passthrough of the A100 on our OpenStack platform. In the Nova configuration on the compute and API nodes, as for any PCI card, you have to enable PCI passthrough in filter_scheduler, whitelist the card, and define the PCI alias. You also have to enable the IOMMU in the kernel and in the BIOS to allow PCI address translation between the host and the guest. The difference from previous cards such as the P100 and V100 is that for those you do not have to specify device_type in the configuration: on those legacy cards, device_type takes the value “type-PCI” by default. For the A100, since it is an SR-IOV enabled device, you have to specify either “type-PF” or “type-VF”. Otherwise the card will be registered as type-PCI and the Nova scheduler will not find any valid hosts in the aggregate. Please take a look into the following links:
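For anyone following along, a minimal sketch of what the relevant nova.conf fragments might look like for this setup (the product ID and alias name are illustrative, not from a real deployment; 10de is NVIDIA’s PCI vendor ID and 20f1 is one A100 PCIe device ID, so check yours with lspci):

```ini
# On the compute node: whitelist the A100 and alias it as a physical function
[pci]
passthrough_whitelist = { "vendor_id": "10de", "product_id": "20f1" }
alias = { "vendor_id": "10de", "product_id": "20f1", "device_type": "type-PF", "name": "a100" }

# On the controller/API node: the same alias, plus the scheduler filter
[filter_scheduler]
enabled_filters = AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,PciPassthroughFilter
```

A flavor then requests the device via the alias, e.g. `openstack flavor set --property "pci_passthrough:alias"="a100:1" my-flavor`. Without the explicit `"device_type": "type-PF"`, the scheduler looks for a type-PCI device and fails to find a valid host, as described above.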
It is interesting that you say that. The T4 cards are also SR-IOV enabled devices (as far as I can tell from the product brief, page 4, table 3) and yet we did not have to change the device_type to either “type-PF” or “type-VF” for these cards to work properly in PCI-passthrough mode. I will definitely be paying attention to this when our A100s arrive, though. Thanks for sharing.
You are welcome. Good luck with your A100 configuration. Please keep us updated if you manage to assign MIG slices, or assign the whole card to the VM without specifying the card type. It could be related to the Nova version.
I’m also looking to expose MIG slices (A30 cards) to OpenStack instances. @Antonio_Paulo, did you have any success? Time-sliced vGPU is working just fine with mdevs, but it looks like our use cases would benefit from MIG slicing.
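In case it helps anyone testing this: with the NVIDIA vGPU host driver, MIG-backed vGPU profiles are exposed as mdev types on SR-IOV virtual functions, so a rough sketch of checking for them on an A30 host might look like the following (the PCI addresses and profile name are illustrative examples, not from a real A30 deployment):

```shell
# Enable the A30's SR-IOV virtual functions (vGPU host driver installed;
# the PCI address is an example, use lspci to find yours)
sudo /usr/lib/nvidia/sriov-manage -e 0000:41:00.0

# Put the GPU in MIG mode so MIG-backed vGPU profiles become available
sudo nvidia-smi -i 0 -mig 1

# List the mdev types offered on the first VF; with MIG mode on, the
# MIG-backed profiles appear with names like "GRID A30-1-6C"
ls /sys/bus/pci/devices/0000:41:00.4/mdev_supported_types
cat /sys/bus/pci/devices/0000:41:00.4/mdev_supported_types/*/name
```

Whether Nova can then schedule those MIG-backed mdev types the way it schedules time-sliced ones is exactly the open question in this thread, so I would also be interested in any results.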