Host environment:
Host OS: Ubuntu 16.04
Docker image: cuda:10.0-cudnn7-devel-ubuntu16.04
GPUs: 4 × Tesla V100, 16.2 GB GPU memory each
CUDA: 10.2
Driver: 440.65
tensorflow-gpu: 1.15
keras: 2.1.3
I am running a container from the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04 on a third-party cloud host.
When I invoke nvidia-smi inside the container, it first shows utilisation above 90% on all GPUs for a few seconds, as below.
A few seconds later it shows 0% utilisation on all GPUs, as follows.
Then it shows random utilisation values across all GPUs for a few seconds, as follows.
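If it helps, the fluctuation can also be sampled programmatically instead of watching the nvidia-smi window; below is a minimal sketch assuming the pynvml bindings are installed in the container (pip install pynvml — not part of my current setup):

```python
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

# Sample per-GPU utilisation once per second for half a minute
for _ in range(30):
    gpu_util = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
    print(gpu_util)  # e.g. [92, 95, 90, 93] or [0, 0, 0, 0]
    time.sleep(1)

pynvml.nvmlShutdown()
```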
I have noticed that training performance is low and training takes much longer in this Docker container. The same code works fine on a dedicated server with two Tesla K80 GPUs and CUDA 10.0, as shown below.
But inside the Docker container, why does the GPU utilisation keep changing so unusually?
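In case it is relevant, a quick check like the following (only a sketch, not my actual training code) should confirm whether the tensorflow-gpu 1.15 build inside the container is CUDA-enabled, sees all four V100s, and actually places ops on them:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", [d.name for d in device_lib.list_local_devices()
                        if d.device_type == "GPU"])

# Log device placement; if ops silently fall back to the CPU,
# training slows down and GPU utilisation looks erratic
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0, 3.0])
    b = tf.constant([4.0, 5.0, 6.0])
    print(sess.run(a + b))
```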
Questions:
Question 1: Why does nvidia-smi report CUDA 10.2 with driver version 440.65.00 instead of CUDA 10.0?
Question 2: Why does nvidia-smi show an empty process list even though GPU utilisation is above 90% on all 4 GPUs?
Question 3: Why does the GPU utilisation keep changing so unusually?
Question 4: My code requires tensorflow-gpu==1.15.0 and keras==2.1.3, so I cannot move to a CUDA 10.2 based Ubuntu image.
How can I solve this while keeping the container based on the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04?
Question 5: Could I explicitly install CUDA 10.0 inside the container running the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04?
Question 6: Could I explicitly install a CUDA driver version in the range >= 384.111 and < 385.00 inside the container running the Docker image cuda:10.0-cudnn7-devel-ubuntu16.04?
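Regarding Question 1 and Question 5, here is a small sketch (assuming libcudart.so.10.0 from the cuda:10.0 image is on the loader path inside the container) that prints both the toolkit version shipped in the container and the maximum CUDA version the host driver supports, which, as far as I understand, is what the CUDA Version field of nvidia-smi reports:

```python
import ctypes

# Assumption: libcudart.so.10.0 from the cuda:10.0-cudnn7-devel image is on
# the loader path (it normally lives in /usr/local/cuda-10.0/lib64).
cudart = ctypes.CDLL("libcudart.so.10.0")

runtime_version = ctypes.c_int()
driver_version = ctypes.c_int()
cudart.cudaRuntimeGetVersion(ctypes.byref(runtime_version))  # toolkit inside the container
cudart.cudaDriverGetVersion(ctypes.byref(driver_version))    # max CUDA version the host driver supports

# Values are encoded as 1000*major + 10*minor, e.g. 10000 -> 10.0, 10020 -> 10.2
print("CUDA runtime (container toolkit):", runtime_version.value)
print("CUDA driver support (host):      ", driver_version.value)
```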