Docker container can't use GPU

user113637 · June 24, 2022, 10:18pm

Hello, I have a PC with a working GPU, but I can't get a Docker container on it to use the GPU. How can I use the GPU from inside a container?

Dockerfile:
FROM nvcr.io/nvidia/tensorflow:21.12-tf2-py3

Commands to build the image and run the container:
sudo docker build --tag=solonvidiatensorflow:latest .
sudo docker run --tty --detach --name container_nvidiatensorflow solonvidiatensorflow:latest

If I run:

a)
a.1)
nvidia-smi:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1705      G   /usr/lib/xorg/Xorg                175MiB |
|    0   N/A  N/A      1838      G   /usr/bin/gnome-shell               27MiB |
|    0   N/A  N/A      2435      G   …AAAAAAAAA= --shared-files         71MiB |
+-----------------------------------------------------------------------------+

a.2)
sudo docker exec container_nvidiatensorflow nvidia-smi:
OCI runtime exec failed: exec failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown

b)
b.1)
python3 -c "import tensorflow as tf ; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))":

2022-06-24 17:11:33.019941: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:11:33.038839: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:11:33.038983: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Num GPUs Available: 1

b.2)
sudo docker exec container_nvidiatensorflow python3 -c "import tensorflow as tf ; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))":

2022-06-24 21:21:30.820790: W tensorflow/stream_executor/platform/default/dso_loader.cc:65] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-06-24 21:21:30.820806: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-06-24 21:21:30.820816: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
Num GPUs Available: 0

c)
c.1)
python3 -c "import tensorflow as tf ; print(tf.test.gpu_device_name())":

2022-06-24 17:14:26.986965: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-24 17:14:27.016277: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:14:27.035709: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:14:27.035852: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:14:27.307803: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:14:27.307963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:14:27.308069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-24 17:14:27.308173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 4948 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
/device:GPU:0

c.2)
sudo docker exec container_nvidiatensorflow python3 -c "import tensorflow as tf ; print(tf.test.gpu_device_name())":

2022-06-24 21:22:29.321447: W tensorflow/stream_executor/platform/default/dso_loader.cc:65] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-06-24 21:22:29.321463: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-06-24 21:22:29.321477: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist

If you need any further information, just ask.
Thanks in advance!

spolisetty · July 1, 2022, 9:32am

Hi,

It looks like you're missing the --gpus all option in the docker run command. We also recommend using the latest container.
This forum mainly covers updates and issues related to cuDNN. For better help, we recommend reaching out on an NVIDIA container-related forum.
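
As a rough sketch of the --gpus all suggestion, the run command from the question could be repeated with that one flag added. This assumes the NVIDIA Container Toolkit is installed on the host and Docker is 19.03 or newer. The image and container names are taken from the question; DRY_RUN is a hypothetical switch added only for this sketch, so the commands can be reviewed before anything is executed.

```shell
#!/bin/sh
# Sketch: recreate the container from the question with GPU access.
# Assumption: the NVIDIA Container Toolkit is installed on the host and
# Docker is version 19.03 or newer (both are needed for --gpus).
set -eu

IMAGE="solonvidiatensorflow:latest"
NAME="container_nvidiatensorflow"

# Same run command as in the question, with --gpus all added.
RUN_CMD="docker run --gpus all --tty --detach --name $NAME $IMAGE"

# DRY_RUN is a hypothetical switch for this sketch: 1 (the default) only
# prints the commands, 0 actually runs them.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "sudo docker rm --force $NAME"
  echo "sudo $RUN_CMD"
  echo "sudo docker exec $NAME nvidia-smi"
else
  # The old container was created without GPU access, so it is replaced.
  sudo docker rm --force "$NAME"
  # Word splitting of $RUN_CMD is intentional here.
  sudo $RUN_CMD
  # Should now list the GPU instead of failing with "executable file not found".
  sudo docker exec "$NAME" nvidia-smi
fi
```

With the default DRY_RUN=1 the script only prints the three commands. Setting DRY_RUN=0 runs them, after which the checks from b.2) and c.2) can be repeated inside the new container.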

Thank you.
