cuda:10.0-cudnn7-devel-ubuntu16.04 - Facing issues

HOST Environment:
Host OS: Ubuntu 16.04
Docker image: cuda:10.0-cudnn7-devel-ubuntu16.04
Total GPUs: 4 × Tesla V100 (16.2 GB GPU memory each)
CUDA: 10.2
Driver: 440.65
tensorflow-gpu: 1.15
keras: 2.1.3

I am running a container from the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04 on a third-party cloud host.
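
For reference, the container is started roughly as sketched below (using the full Docker Hub image name nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04); which GPU flag applies depends on the Docker and NVIDIA runtime versions installed on the cloud host, so both variants are listed here as assumptions:

    # Docker 19.03+ with the NVIDIA Container Toolkit installed on the host:
    docker run --gpus all -it nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 bash

    # Older Docker with the nvidia-docker2 runtime:
    docker run --runtime=nvidia -it nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 bash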

When I invoke nvidia-smi inside the container, the output cycles through three states (a screenshot was attached for each):

First, utilization on all four GPUs is above 90% for a few seconds.

Then utilization on all GPUs drops to 0% for a few seconds.

Then the utilization percentage fluctuates randomly across the GPUs for a few seconds.

I have noticed that training performance is poor and takes much longer in this Docker container. The same code works fine with 2 × Tesla K80 GPUs and CUDA 10.0 on a dedicated server, as shown in the attached screenshot.


But in the Docker container, why does the GPU utilization keep changing so erratically?

Questions:

Question 1: nvidia-smi reports CUDA 10.2 with driver version 440.65.00 instead of CUDA 10.0. Why?

Question 2: No processes are listed even though GPU utilization is above 90% on all 4 GPUs. Why?

Question 3: Why does GPU utilization keep changing so erratically?

Question 4: My code only works with tensorflow-gpu==1.15.0 and keras==2.1.3, so I cannot change to a CUDA 10.2 based Ubuntu image. How can I solve this issue with a container based on the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04?

Question 5: Can I explicitly install CUDA 10.0 inside the container running the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04?

Question 6: Can I explicitly install a driver version in the range >= 384.111 and < 385.00 inside the container running the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04?

Hi there,

I’m providing some answers here:

  1. The CUDA version printed in the nvidia-smi output is the maximum CUDA version supported by the driver. In this case, R440 was released alongside CUDA 10.2, and R440 is backward compatible with all lower CUDA releases. (The first sketch after this list shows how to confirm the toolkit version actually installed in your container.)
  2. What were you running when you observed the 90%+ utilization output in nvidia-smi? The utilization figure reported by nvidia-smi is not cycle accurate; sampling it over time (see the sketch after this list) gives a better picture than one-off snapshots.
  3. This is related to Q2.
  4. I’m not sure what you mean; can you please rephrase? As I mentioned, R440 will support all older CUDA versions, including CUDA 10.0. (The second sketch after this list is a quick way to confirm that your pinned TensorFlow 1.15 build sees the GPUs.)
  5. Yes, you can, but what is the use case and what do you want to achieve?
  6. No, please don’t install drivers inside containers. This makes the container non-portable and defeats the purpose of containerization in the first place. Why do you want to do this?
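
A minimal command sketch for checking the points in answers 1–3 from inside the container (standard nvidia-smi and CUDA toolkit commands; the version.txt path assumes the stock layout of the CUDA 10.0 image):

    # Driver-side view: the "CUDA Version" field here is the highest CUDA release the R440 driver
    # supports (10.2), not the toolkit installed in the container.
    nvidia-smi

    # Toolkit-side view inside the container: this is what your code compiles and links against
    # (it should report 10.0 for this image).
    nvcc --version
    cat /usr/local/cuda/version.txt

    # Sample utilization and memory once per second instead of taking one-off snapshots.
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 1

    # Per-process view; processes started in other containers or PID namespaces may not show up here.
    nvidia-smi pmon -c 10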
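
For answer 4, a quick sanity check that the pinned tensorflow-gpu==1.15.0 build finds the GPUs under the R440 driver (a sketch using standard TF 1.x test helpers; run it inside the container):

    # Prints the TF version, whether it was built with CUDA, and the name of a visible GPU device
    # (an empty string means no GPU was found).
    python -c "import tensorflow as tf; print(tf.version.VERSION, tf.test.is_built_with_cuda()); print(tf.test.gpu_device_name())"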