I am using the NVIDIA container "nvcr.io/nvidia/tensorflow:20.12-tf2-py3".
I start the container with:
docker run --gpus all -it --rm ${USER} -v $HOME:/home -w /home -p 8888:8888 -p 5000:5000 nvcr.io/nvidia/tensorflow:20.12-tf2-py3 bash
Running nvidia-smi gives this:
Mon Dec 28 14:57:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.04    Driver Version: 455.23.04    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   19C    P8    11W / 350W |      5MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Starting Python and importing TensorFlow works without errors:
import tensorflow as tf
But listing the GPU devices fails like this:
tf.config.list_physical_devices('GPU')
2020-12-28 14:59:21.742965: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-12-28 14:59:21.780068: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2020-12-28 14:59:21.780107: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: a23697126247
2020-12-28 14:59:21.780116: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: a23697126247
2020-12-28 14:59:21.780238: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 455.23.4
2020-12-28 14:59:21.780266: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 455.23.4
2020-12-28 14:59:21.780277: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 455.23.4
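Since the log shows the failure inside cuInit itself, one way to check whether the problem is in the driver layer rather than in TensorFlow might be to call cuInit directly through ctypes. This is only a minimal sketch: the helper name probe_cuinit is made up for illustration, and it assumes libcuda.so.1 is on the loader path, as the log above suggests.

```python
import ctypes


def probe_cuinit(lib_name="libcuda.so.1"):
    """Call cuInit(0) from the CUDA driver library.

    Returns the integer CUresult code, or None if the library
    cannot be loaded. probe_cuinit is a hypothetical helper name.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        # The driver library is not visible in this environment.
        return None
    # CUresult cuInit(unsigned int Flags); Flags must be 0.
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


if __name__ == "__main__":
    code = probe_cuinit()
    if code is None:
        print("libcuda.so.1 could not be loaded")
    elif code == 0:
        print("cuInit succeeded (CUDA_SUCCESS)")
    else:
        print(f"cuInit failed with CUresult {code}")
```

If this also returns 3 (CUDA_ERROR_NOT_INITIALIZED) without TensorFlow involved, the issue would appear to sit between the container and the driver rather than in TensorFlow.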
Question: How can I get TensorFlow in this NVIDIA container to see the RTX 3090 GPU?