I am trying to run the tensorflow:20.12-tf2-py3 container with my RTX 3070. If I run nvidia-smi in the nvidia/cuda container:
docker run --privileged --gpus all --rm nvidia/cuda:11.1-base nvidia-smi
it works fine, with the following output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01 Driver Version: 455.45.01 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3070 Off | 00000000:42:00.0 On | N/A |
| 0% 45C P8 23W / 220W | 695MiB / 7981MiB | 26% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
However, when I run the TensorFlow container:
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:20.12-tf2-py3
I get the message:
================
== TensorFlow ==
================
NVIDIA Release 20.12-tf2 (build 18110405)
TensorFlow Version 2.3.1
Container image Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2020 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: No supported GPU(s) detected to run this container
NOTE: MOFED driver for multi-node communication was not detected.
Multi-node communication performance may be reduced.
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
and if I run nvidia-smi inside this container, I get:
Failed to initialize NVML: Unknown Error
and running
nvcc --version
yields:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
so the container clearly ships CUDA 11.1 and should be compatible with my setup.
What could be the reason this container does not detect my GPU?
P.S.: I am using Fedora 33.
EDIT:
I solved it. The problem was some kind of permission issue that produced no error message; it was fixed by adding the --privileged option when running the container.
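For reference, the working command should look something like the one below. It is just my original run command with --privileged added; the --shm-size and --ulimit flags are the ones suggested in the container's startup note, and local_dir:container_dir is still a placeholder for your own paths:
docker run --privileged --gpus all -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v local_dir:container_dir nvcr.io/nvidia/tensorflow:20.12-tf2-py3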