vGPU and TensorFlow

Hello guys,

I’ve got some questions that you guys may know the answers to. I read the docs, but I’m still not sure.

Some guys from the dev team are looking for a GPU for TensorFlow (an AI project). We did some tests on a Quadro GPU running on a workstation with Docker, but the process exhausts the GPU and slows down the other containers that need the GPU as well.

1 - Can I run TensorFlow on vGPU profiles? The idea is to have a V100 (or another card that you may recommend) shared between 2 VMs, so that one VM cannot exhaust the GPU's resources, because it would only have "half" of the GPU. Is that possible?

2 - If not, the P40 card has 4 (four) GPUs. If I install this card on ESXi using passthrough, can I have one GPU per VM? That is, 4 VMs, each with a 1:1 GPU on the P40.
2.1 - Do I need an NVIDIA license for this scenario?

Thinking outside the box, is there any other approach for this situation?

  1. Sure, you can share the GPU with fixed share or equal share scheduling so that each VM gets 1/2 of the GPU.

  2. The P40 is a single-GPU card as well. I assume you meant the M10, which is not the right GPU for DL.

  2.1. Yes, you need QvDWS licensing, as you are using CUDA on Linux VMs.

Regards

Simon

Thanks for the reply.

So, with QvDWS and a V100 (for instance), I could have several VMs (at least four) running math-intensive (TensorFlow / AI) workloads.

Sounds great.

Correct. The scheduler makes sure that each VM gets its assigned resources, depending on your vGPU profile size. So you can certainly also use 1/4 of the GPU per VM with a V100.
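
To make the sizing concrete, here is a rough sketch of the arithmetic. It assumes a 16 GB board and that each vGPU profile reserves a fixed slice of framebuffer; the function name and the figures are only illustrative, so check the profile list for your actual card.

```python
def vms_per_gpu(total_fb_gb: int, profile_fb_gb: int) -> int:
    """Number of VMs that fit on one physical GPU when every VM
    gets a fixed-size framebuffer slice (the vGPU profile size)."""
    if total_fb_gb <= 0 or profile_fb_gb <= 0:
        raise ValueError("framebuffer sizes must be positive")
    return total_fb_gb // profile_fb_gb


# Assumed figures: a 16 GB board split into 8 GB or 4 GB profiles.
for profile_gb in (8, 4):
    print(f"{profile_gb} GB profile -> {vms_per_gpu(16, profile_gb)} VMs")
```

With these assumed numbers, an 8 GB profile gives the "half GPU" case for two VMs and a 4 GB profile gives the 1/4 case for four VMs.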

I’m trying to understand a little bit more about QvDWS.

Does each Virtual Edition (Grid App, Grid PC or QvDWS) have its own drivers?

Thanks.

Sorry about the insistence.

Here:

We have the following statement: Note: Unified Memory and CUDA tools are not supported on NVIDIA vGPU.

Here:

System Requirements

The GPU-enabled version of TensorFlow has the following requirements:

64-bit Linux
Python 2.7
CUDA 7.5 (CUDA 8.0 required for Pascal GPUs)

cuDNN v5.1 (cuDNN v6 if on TF v1.3)

Isn’t cuDNN a CUDA tool? If so, that wouldn’t work on a virtual GPU, including QvDWS.
https://developer.nvidia.com/cudnn

Since TensorFlow requires the CUDA SDK, I don’t think that would work.

The M60-8Q profile is configured 1:1, so CUDA apps are enabled, as per the docs below:

"1.6. NVIDIA vGPU Software Features

OpenCL and CUDA applications without Unified Memory are supported on these virtual GPUs:

The 8Q vGPU type on Tesla M6, Tesla M10, and Tesla M60 GPUs.
All Q-series vGPU types on the following GPUs:"

Am I right?

Thanks.

You are welcome to check the screenshot:
https://ibb.co/kHGGoK

Hi,

You are not correct. Neither the specific tools from the CUDA toolkit (like the profiler) nor Unified Memory is required for TensorFlow with vGPU. I’m running several VMs with TensorFlow and other frameworks using vGPU profiles.

Regards

Simon
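
For anyone who wants to confirm what a VM actually sees before installing TensorFlow, here is a minimal sketch that asks nvidia-smi for the GPU name and total memory and parses the CSV output. It assumes the guest driver is installed and nvidia-smi is on the PATH; the helper names are made up for illustration, and the name reported inside your VM depends on your profile.

```python
import csv
import io
import subprocess

# Standard nvidia-smi query flags; output is one "name, memory_mib" row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Turn 'name, memory_mib' rows into (name, mib) tuples.
    mib is None when the driver reports a non-numeric value."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, mem = row[0].strip(), row[1].strip()
        gpus.append((name, int(mem) if mem.isdigit() else None))
    return gpus


def list_gpus():
    """Run nvidia-smi in the guest and return the parsed GPU list."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        for name, mib in list_gpus():
            print(f"{name}: {mib} MiB")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi not usable here: {exc}")
```

If the reported memory matches the framebuffer of the profile you assigned, the vGPU is attached as expected and the CUDA libraries should see the same device.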