How are CUDA cores distributed among users with individual 4/8 GB GPUs from a Grid M10 card?

v-2shanu · September 6, 2017, 8:53am

I have Windows Server 2016 running on a physical server, and I am purchasing a Grid M10 card. I went through the sheet NVIDIA published on how 2/4/8 GB GPUs can be provided to virtual machines.

What I want to know is how the CUDA cores are distributed.

For example, if I have 2 VMs with a 2 GB GPU each, 1 VM with a 4 GB GPU and 1 VM with an 8 GB GPU, how are the cores on the M10 card distributed among these 4 VMs?

Regards
Anurag

Robert_Crovella · September 6, 2017, 7:30pm

The cores are not distributed.

Each M10 has 4 GPU devices on it. For all the VMs on a particular device, the CUDA cores associated with that device are given, as a whole, to each user (with a VM on that device) as needed, in a time-sliced fashion.
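To make the distinction concrete, here is a minimal Python sketch (an illustration only, not NVIDIA code) of one physical GPU on an M10. It assumes the datasheet figures of 640 CUDA cores and 8 GB of frame buffer per GPU, and the class and method names are hypothetical. The model treats frame buffer as carved up between VMs, while the number of cores a VM sees during its turn is always the full count.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalGpu:
    """Hypothetical model of one of the four GPUs on an M10 board."""
    cores: int = 640      # assumed from the M10 datasheet
    memory_gb: int = 8    # assumed from the M10 datasheet
    vms: dict = field(default_factory=dict)  # VM name -> frame buffer in GB

    def attach(self, vm: str, framebuffer_gb: int) -> None:
        # Frame buffer is partitioned: the total cannot exceed the GPU's memory.
        if sum(self.vms.values()) + framebuffer_gb > self.memory_gb:
            raise ValueError("not enough frame buffer left on this GPU")
        self.vms[vm] = framebuffer_gb

    def cores_available_to(self, vm: str) -> int:
        # Cores are not partitioned: the VM whose turn it is gets all of them.
        if vm not in self.vms:
            raise KeyError(vm)
        return self.cores


gpu = PhysicalGpu()
gpu.attach("VM1", 4)
gpu.attach("VM2", 4)
print(gpu.cores_available_to("VM1"), gpu.cores_available_to("VM2"))
```

In this sketch both VMs report 640 cores, while a third `attach` call would fail because the 8 GB of frame buffer is already used up.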

v-2shanu · September 7, 2017, 12:24pm

Thanks for the response, Bob. As per the specification, the M10 has 4 GPUs, with 640 cores and 8 GB of memory per GPU.
(https://images.nvidia.com/content/tesla/pdf/188359-Tesla-M10-DS-NV-Aug19-A4-fnl-Web.pdf)

If the cores are not distributed, then suppose I slice the first 8 GB GPU on the M10 card in two (M10-4Q) and assign one M10-4Q to each of 2 virtual machines. Since the specification gives 640 cores for this 8 GB GPU, do you mean that both virtual machines will use the 640 cores in a time-sliced fashion?

My second question: if the user on VM1 runs an application that exhausts the CUDA core resources, what is the impact on the user on VM2 who wants to run another graphics-intensive application?

Also, I intend to use the M10 card with Hyper-V on Windows Server 2016. Are there any limitations on assigning a GPU/vGPU to multiple virtual machines when using the M10 on Windows Server 2016?

Regards
Anurag

Robert_Crovella · September 7, 2017, 2:30pm

You should ask these questions on the GRID forums.

Yes, any user application, on any VM on that GPU, will get 640 cores in a time-sliced fashion.

You are probably misunderstanding the concept of time-slicing. First the user on VM1 gets to use the 640 cores for a period of time, then the user on VM2 gets to use the 640 cores for a period of time, then the user on VM1 gets to use the 640 cores for a period of time, etc.
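The alternation described above can be sketched as a plain round-robin schedule. This is a simplified Python model, not the actual vGPU scheduler: the function name `time_slices` is hypothetical, the slices are assumed to be equal, and the 640-core figure is the per-GPU number quoted earlier in the thread.

```python
from itertools import cycle, islice

CORES_PER_GPU = 640  # per-GPU figure quoted earlier in the thread


def time_slices(vms, n_slices):
    """Return (slice_index, vm, cores) tuples for a plain round-robin schedule."""
    turns = islice(cycle(vms), n_slices)
    # Every slice hands the whole GPU (all cores) to exactly one VM.
    return [(i, vm, CORES_PER_GPU) for i, vm in enumerate(turns)]


schedule = time_slices(["VM1", "VM2"], 6)
for i, vm, cores in schedule:
    print(f"slice {i}: {vm} uses {cores} cores")
```

In this model a heavy workload on VM1 never reduces the number of cores VM2 sees. VM2 still gets all 640 cores on its turns, but only on every other slice. How the real scheduler sizes slices or handles idle VMs is not covered in this thread.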