How do I run two deep learning models in parallel on a single GPU?

I need to run two convolutional neural network (CNN) models in parallel on a single NVIDIA 2080 Ti, but I don’t know how to do it. Can anyone help me with this problem?


You’ve purchased a consumer-grade GPU, which cannot be virtualised. Sharing that card between multiple virtual machines via NVIDIA vGPU is therefore not possible.

To get the best choice of GPUs for this, you’ll need a rack-mount server chassis. That lets you use a Tesla V100, Quadro RTX 6000 / RTX 8000 or Tesla T4, all of which can be virtualised. Previous-generation Pascal cards such as the Tesla P100 or Tesla P40 are also available. The earlier Maxwell generation (M60) supports virtualisation as well, but it is unsuitable for your workload.

The V100, T4, P100 and P40 are passively cooled, meaning you can’t run them in a typical “tower” workstation chassis because they have no fans. The RTX 6000 / 8000 are actively cooled (they do have fans), so you could run them in either a tower workstation or a rack-mount server.

Whichever chassis you choose, you’ll need a hypervisor (VMware is best) with its Enterprise Plus licensing (to allow GPU virtualisation), plus the vGPU software from NVIDIA with Quadro vDWS licensing for each VM, so that the GPU can be shared. You could use a cheaper hypervisor such as XenServer or KVM, but you’ll lose features, functionality and usability. Basically, you get what you pay for …
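For a feel of what the sharing looks like once the NVIDIA vGPU host driver is installed: on a KVM host, each vGPU profile is exposed through the kernel’s mediated-device (mdev) interface, and you create one instance per VM. This is a rough sketch only — the PCI address and profile name below are placeholders and vary by card, driver version and licence:

```shell
# List the vGPU profiles offered by the GPU at (placeholder) address 0000:3b:00.0
ls /sys/bus/pci/devices/0000:3b:00.0/mdev_supported_types/

# Create one vGPU instance of a (placeholder) profile for a VM to attach
UUID=$(uuidgen)
echo "$UUID" > /sys/bus/pci/devices/0000:3b:00.0/mdev_supported_types/nvidia-256/create
```

None of this is available on a GeForce card, which is the crux of the answer above.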

For future reference, here’s how NVIDIA lists their product lines:

GeForce - Consumer
Titan - Prosumer
Quadro - Professional
Tesla - Datacenter

Neither the Consumer nor the Prosumer line supports virtualisation. Of the Quadro line, only the RTX 6000 / 8000 support it. All Tesla cards support virtualisation.

Honestly, it’ll be cheaper and easier for you to buy another identical workstation with a 2080 Ti and run the two models independently.
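Running the models independently needs no virtualisation at all: each training script runs as an ordinary process, and the `CUDA_VISIBLE_DEVICES` environment variable controls which device it sees. A minimal sketch, assuming hypothetical script names (on two separate workstations each process would simply be given GPU 0; on a future multi-GPU box you’d pin each to its own device):

```python
import os
import subprocess

def pinned_env(gpu_id):
    """Copy the current environment, exposing only the given GPU to CUDA."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env

def launch_pinned(script, gpu_id):
    """Start `script` as an independent process restricted to one GPU."""
    return subprocess.Popen(["python", script], env=pinned_env(gpu_id))

if __name__ == "__main__":
    # Hypothetical entry points -- substitute your own training scripts.
    jobs = [("train_model_a.py", 0), ("train_model_b.py", 1)]
    procs = [launch_pinned(script, gpu) for script, gpu in jobs
             if os.path.exists(script)]
    for p in procs:
        p.wait()
```

Each child process only ever sees the one device it was given, so the two trainings cannot contend for each other’s GPU.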