I am trying to run the tensorflow:20.12-tf2-py3 container with my RTX 3070. If I run nvidia-smi in the nvidia/cuda container:
docker run --privileged --gpus all --rm nvidia/cuda:11.1-base nvidia-smi
it works fine, with the following output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01 Driver Version: 455.45.01 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3070 Off | 00000000:42:00.0 On | N/A |
| 0% 45C P8 23W / 220W | 695MiB / 7981MiB | 26% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
However, when I run the TensorFlow container:
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:20.12-tf2-py3
I get the message:
================
== TensorFlow ==
================
NVIDIA Release 20.12-tf2 (build 18110405)
TensorFlow Version 2.3.1
Container image Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
Copyright 2017-2020 The TensorFlow Authors. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying project or file.
ERROR: No supported GPU(s) detected to run this container
NOTE: MOFED driver for multi-node communication was not detected.
Multi-node communication performance may be reduced.
NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for TensorFlow. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
and if I run nvidia-smi inside this container, I get:
Failed to initialize NVML: Unknown Error
and running
nvcc --version
yields:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
so the container clearly ships CUDA 11.1 and should be compatible with my setup.
What could be the reason this container does not detect my GPU?
P.S.: I am using Fedora 33.
EDIT:
I solved it. The problem was some kind of permission issue that produced no error message; it was fixed by adding the --privileged option when running the container.
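For reference, the working command should look something like the one below. It is just my original run command with --privileged added; the --shm-size and --ulimit flags are the ones suggested in the container's startup note, and local_dir:container_dir is still a placeholder for your own paths:
docker run --privileged --gpus all -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v local_dir:container_dir nvcr.io/nvidia/tensorflow:20.12-tf2-py3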