Docker image cuda:10.0-cudnn7-devel-ubuntu16.04: low training performance, GPU utilization keeps changing, process list section empty

ssuresh.mca · October 10, 2020, 3:48pm

Environment:
Docker image: cuda:10.0-cudnn7-devel-ubuntu16.04
Total GPUs: 4 x Tesla V100, GPU memory 16.2 GB
CUDA: 10.2 (as reported by nvidia-smi)
tensorflow-gpu: 1.15
Keras: 2.1.3

Current behavior

nvidia-smi first shows utilization above 90% on all GPUs for a few seconds.

Then nvidia-smi shows 0% utilization on all GPUs for a few seconds.

Then nvidia-smi shows utilization that varies randomly across all GPUs for a few seconds.
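To put numbers on the fluctuation described above instead of watching the nvidia-smi window, the utilization can be sampled at a fixed interval and averaged. The following is only a minimal sketch: it assumes nvidia-smi is on the PATH inside the container, and the function names (`parse_utilization`, `sample_utilization`, `mean_utilization`) are hypothetical helpers, not part of any NVIDIA tool.

```python
import subprocess
import time

# Standard nvidia-smi query options: one "index, utilization" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Parse 'index, util' lines into {gpu_index: utilization_percent}."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        # Skip GPUs that report a non-numeric value such as "[N/A]".
        if util.isdigit():
            result[int(index)] = int(util)
    return result


def sample_utilization(samples=30, interval_s=1.0):
    """Poll nvidia-smi and return a list of per-GPU utilization dicts."""
    history = []
    for _ in range(samples):
        out = subprocess.check_output(QUERY, universal_newlines=True)
        history.append(parse_utilization(out))
        time.sleep(interval_s)
    return history


def mean_utilization(history):
    """Average utilization per GPU across all collected samples."""
    per_gpu = {}
    for sample in history:
        for gpu, util in sample.items():
            per_gpu.setdefault(gpu, []).append(util)
    return {gpu: sum(vals) / len(vals) for gpu, vals in per_gpu.items()}
```

Calling `mean_utilization(sample_utilization())` while training is running gives one average per GPU over roughly 30 seconds, which makes it easier to compare the container against the dedicated server.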

I have noticed that training performance is low and training takes much longer in the Docker container described here. The same code works fine on a dedicated server with 2 Tesla K80 GPUs and CUDA 10.0, as shown below.

Screen Shot 2020-10-10 at 9.34.06 PM

But in the Docker container, why does GPU utilization keep changing so unusually, and why is the process list section empty? Why is the CUDA version shown as 10.2 in the cuda:10.0-cudnn7-devel-ubuntu16.04 image (see the first three nvidia-smi screenshots)? I have not installed the CUDA 10.2 toolkit or cuDNN libraries in the image. How can I solve this issue?
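On the version question, the "CUDA Version" field in the nvidia-smi header generally reflects the highest CUDA version the installed driver supports, not the toolkit inside the image. One way to check which toolkit the image actually contains is to read the release from `nvcc --version`. This is a rough sketch that assumes nvcc is on the PATH (as it normally is in a devel image); `parse_nvcc_release` and `toolkit_release` are hypothetical helper names.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the 'release X.Y' toolkit version from nvcc --version output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def toolkit_release():
    """Run nvcc inside the container and return its toolkit release string."""
    out = subprocess.check_output(["nvcc", "--version"], universal_newlines=True)
    return parse_nvcc_release(out)
```

If `toolkit_release()` returns `"10.0"` while nvidia-smi reports 10.2, the image itself would still be on the 10.0 toolkit and the 10.2 would be coming from the driver.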

The ps aux command shows the process IDs, but nvidia-smi does not.
