vGPU and TensorFlow

morphews · August 10, 2018, 6:15pm

Hello guys,

I’ve got some questions that you guys may know the answers to. I read the docs, but I’m still not sure.

Some people on our dev team are looking for GPUs for TensorFlow (an AI project). We ran some tests on a Quadro GPU in a workstation with Docker, but the process exhausts the GPU and makes it slow for other containers that need the GPU as well.

1 - Can I run TensorFlow on vGPU profiles? The idea is to have a V100 (or another card that you may recommend) shared between 2 VMs, so that one VM cannot exhaust the GPU's resources, because it would only have "half" of the GPU. Is that possible?

2 - If not, the P40 card has 4 (four) GPUs. If I install this card on ESXi using passthrough, can I have one GPU per VM? That is, 4 VMs, each with a 1:1 GPU from the P40.
2.1 - Do I need an NVIDIA license for this scenario?

Thinking outside the box, is there any other approach for this situation?

sschaber · August 11, 2018, 8:08am

Sure, you can share the GPU using the fixed share or equal share scheduler so that each VM gets 1/2 of the GPU.
The P40 is a single GPU as well. I assume you meant the M10, which is not the right GPU for DL.

2.1) Yes, you need QvDWS licensing, as you are using CUDA on Linux VMs.

Regards

Simon

morphews · August 14, 2018, 2:11pm

Thanks for the reply.

So, with QvDWS and a V100 (for instance), I could have several VMs (at least four) running math-intensive (TensorFlow / AI) workloads.

Sounds great.

sschaber · August 15, 2018, 2:18pm

Correct. The scheduler makes sure that each VM gets its assigned resources, depending on your vGPU profile size. So you can certainly also use 1/4 of the GPU with a V100.
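
To make the 1/2 and 1/4 figures concrete, here is a minimal sketch of the arithmetic. It is only an illustration: the 16 GB frame buffer for the V100 is an assumption (the card also ships with 32 GB), the function names are made up for this example, and the two policies are simplified models of the fixed share and equal share schedulers, not NVIDIA's implementation.

```python
from fractions import Fraction


def frame_buffer_per_vm(total_gb, vm_count):
    """Frame buffer each VM gets when the card is split into equal vGPU profiles."""
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    return Fraction(total_gb, vm_count)


def compute_share(policy, max_vgpus, running_vgpus):
    """Approximate fraction of GPU time that one running vGPU receives.

    "fixed": the share depends only on how many vGPUs of this profile fit
             on the card, whether or not the other VMs are running.
    "equal": the GPU time is split among the vGPUs that are currently running.
    """
    if not 1 <= running_vgpus <= max_vgpus:
        raise ValueError("running_vgpus must be between 1 and max_vgpus")
    if policy == "fixed":
        return Fraction(1, max_vgpus)
    if policy == "equal":
        return Fraction(1, running_vgpus)
    raise ValueError("policy must be 'fixed' or 'equal'")


# Assumed example: a 16 GB V100 split four ways, with two VMs powered on.
print(frame_buffer_per_vm(16, 4))    # 4 (GB per VM)
print(compute_share("fixed", 4, 2))  # 1/4
print(compute_share("equal", 4, 2))  # 1/2
```

With fixed share, a VM never gets more than its slice even when the others are idle, which is the behaviour that stops one TensorFlow job from starving the rest.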

morphews · August 17, 2018, 1:03pm

I’m trying to understand a little bit more about QvDWS.

Does each Virtual Edition (Grid App, Grid PC or QvDWS) have its own drivers?

Thanks.

morphews · August 19, 2018, 10:25pm

Sorry about the insistence.

Here:

We have the following statement: Note: Unified Memory and CUDA tools are not supported on NVIDIA vGPU.

Here:

System Requirements

The GPU-enabled version of TensorFlow has the following requirements:

64-bit Linux
Python 2.7
CUDA 7.5 (CUDA 8.0 required for Pascal GPUs)

cuDNN v5.1 (cuDNN v6 if on TF v1.3)

Isn’t cuDNN a CUDA tool? If so, that wouldn’t work on a virtual GPU, including QvDWS.
https://developer.nvidia.com/cudnn

morphews · August 19, 2018, 11:27pm

As TensorFlow does require the CUDA SDK, I don’t think that would work.

The M60-8Q 1:1 profile is configured, so CUDA apps are enabled, as per the docs below:

"1.6. NVIDIA vGPU Software Features

OpenCL and CUDA applications without Unified Memory are supported on these virtual GPUs:

The 8Q vGPU type on Tesla M6, Tesla M10, and Tesla M60 GPUs.
All Q-series vGPU types on the following GPUs: "

Am I right?

Thanks.

You are welcome to check the screenshot:
https://ibb.co/kHGGoK

sschaber · August 24, 2018, 9:35am

Hi,

You are not correct. Neither the specific tools from the CUDA toolkit, such as the profiler, nor Unified Memory is required for TensorFlow with vGPU. I’m running several VMs with TensorFlow and other frameworks using vGPU profiles.

Regards

Simon
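
For anyone who wants to confirm from inside a guest VM that the vGPU is visible before installing TensorFlow, the following is a rough sketch that wraps `nvidia-smi`. It assumes the NVIDIA guest driver is installed and `nvidia-smi` is on the PATH; the helper names and the sample line `GRID V100-4Q, 4096` are hypothetical, and the exact device name reported depends on your profile.

```python
import shutil
import subprocess

# Query only the GPU name and total memory (MiB), as plain CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_lines(text):
    """Turn 'name, memory' CSV lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, memory = line.rpartition(",")
        try:
            memory_mib = int(memory.strip())
        except ValueError:
            memory_mib = None  # e.g. the driver reported a non-numeric value
        gpus.append((name.strip(), memory_mib))
    return gpus


def visible_gpus():
    """Return the GPUs the driver reports, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


# Hypothetical sample line, only to show the parsed shape.
print(parse_gpu_lines("GRID V100-4Q, 4096\n"))
```

If `visible_gpus()` returns an empty list inside the VM, the driver or the vGPU assignment is the first thing to check, before looking at CUDA, cuDNN or TensorFlow.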
