K80 P2P memory in VM

ar1stotl3 · May 7, 2019, 12:57am

I’m still trying to get my K80s working inside of a VM for some deep learning. I was able to get 3 K80s passed through to a VM, but I noticed that performance took a big hit when working with multiple GPUs. I figured out that P2P memory isn’t working.

nvidia-smi topo -p2p r

The output of this is all CNS (Chipset not supported). So far, I’ve only managed to get all of the GPUs passed through using q35 and SeaBIOS. Is there any special configuration that I need to do to be able to get GPU P2P working properly?
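For anyone who wants to check the same thing in a script, here is a minimal Python sketch that picks the non-OK GPU pairs out of the matrix printed by the command above. The `SAMPLE` text is an illustrative assumption about the layout (a header row of GPU names, one row per GPU, then a legend), not output captured from this machine, and the helper names `parse_p2p_matrix` and `blocked_pairs` are hypothetical.

```python
# Illustrative sample only: assumed layout of `nvidia-smi topo -p2p r` output.
SAMPLE = """\
        GPU0    GPU1    GPU2
 GPU0   X       CNS     CNS
 GPU1   CNS     X       CNS
 GPU2   CNS     CNS     X

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
"""


def parse_p2p_matrix(text):
    """Return {(row_gpu, col_gpu): status} for every off-diagonal cell."""
    header = None
    cells = {}
    for line in text.splitlines():
        fields = line.split()
        if header is None:
            # Skip everything until the header row of GPU names.
            if fields and all(f.startswith("GPU") for f in fields):
                header = fields
            continue
        # A blank line or a non-GPU row (e.g. the legend) ends the matrix.
        if not fields or not fields[0].startswith("GPU"):
            break
        row, statuses = fields[0], fields[1:]
        for col, status in zip(header, statuses):
            if status != "X":  # "X" marks the GPU paired with itself
                cells[(row, col)] = status
    return cells


def blocked_pairs(cells):
    """List the GPU pairs whose status is anything other than OK."""
    return sorted(pair for pair, status in cells.items() if status != "OK")
```

To run it against a real machine, the text could come from `subprocess.run(["nvidia-smi", "topo", "-p2p", "r"], capture_output=True, text=True).stdout` in place of `SAMPLE`.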

Robert_Crovella · May 7, 2019, 3:40am

You won’t be able to. The enablement of P2P is done (in part) by inspecting the machine that it is in. If the GPU driver cannot identify the machine it is in, P2P won’t be enabled. The generic machine type that is created by most hypervisors won’t be recognized as P2P capable.
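The driver's decision can also be queried from application code: the CUDA runtime exposes it as `cudaDeviceCanAccessPeer`, and PyTorch wraps it as `torch.cuda.can_device_access_peer`. The sketch below is only an illustration of that check, not something from this thread: it assumes PyTorch is installed (the thread itself uses MXNet), and the helper names `peer_access_report` and `report_with_torch` are hypothetical. In a VM where `nvidia-smi` reports CNS, every pair would be expected to come back `False`.

```python
from itertools import permutations


def peer_access_report(device_count, can_access):
    """Map every ordered (src, dst) device pair to whether P2P is allowed.

    `can_access` is any callable taking (src, dst) device indices, so the
    logic can be exercised without a GPU present.
    """
    return {
        (src, dst): bool(can_access(src, dst))
        for src, dst in permutations(range(device_count), 2)
    }


def report_with_torch():
    """Query the real driver through PyTorch (assumes PyTorch with CUDA)."""
    import torch  # imported lazily so the module loads without PyTorch

    return peer_access_report(
        torch.cuda.device_count(),
        torch.cuda.can_device_access_peer,
    )
```

Calling `report_with_torch()` inside the guest and again on the bare-metal host would show whether the hypervisor is what changes the answer.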

ar1stotl3 · May 7, 2019, 8:05pm

Robert, is there no way around this, or am I incorrect in assuming that this could be causing a slowdown when using multiple GPUs? I thought that multi-GPU instances in AWS were doing something similar?

I’m noticing a slowdown when putting multiple GPUs on problems like training ImageNet with MXNet, and MXNet was complaining that P2P pairs weren’t enabled so I thought it may be the culprit. What I’m trying to achieve is a machine that allows multiple groups to use the GPUs (not simultaneously) without trashing each other’s environment. So have a VM for 1 group with all 3 K80s, and when they’re done, shut down that VM and pass the K80s through to another VM.

Do you have a suggestion on a workaround or another way to achieve this?

Thanks!

Robert_Crovella · May 8, 2019, 12:37am

I didn’t say that my statements applied to AWS. And similar does not mean identical or exactly the same.

I’m not aware of any generic way to get the NVIDIA GPU driver to recognize and allow P2P in an arbitrary hypervisor/VM. I don’t know of any workarounds. Yes, I would expect that the lack of P2P could make certain multi-GPU applications run slower.

I wouldn’t attempt to do this in VMs of my own creation. If you want maximum performance, run on bare metal, or in containers on an otherwise bare-metal host. Or use AWS or one of the other public cloud providers that offer GPUs.

There are various job schedulers and container dispatch systems that can do what you want, generally speaking. It’s not necessary to go to full HW virtualization unless you consider your user base to be aggressively rogue.