Docker TensorFlow GPU image can't find the device, and nvidia-smi reports "No devices were found"

Hi, I’ve been given access to an AWS machine with a Tesla T4 GPU for machine learning. After installing the drivers required by TensorFlow, I ran into the following issue when trying to launch the GPU-ready TensorFlow Docker image:

docker: Error response from daemon: OCI runtime create failed: 
container_linux.go:346: starting container process caused "process_linux.go:449: container init caused
\"process_linux.go:432: running prestart hook 0 caused 
\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\n\\"\"": unknown.

The installed NVIDIA driver is version 418, and the Docker version on the server is 19.03.4.
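For reference, the standard way to test GPU access from a container with Docker 19.03 is something along these lines (the CUDA image tag is only an example, not the exact image I used):

$ docker run --gpus all --rm nvidia/cuda:10.0-base nvidia-smi

On a healthy setup this prints the same table as nvidia-smi on the host.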

Running the nvidia-smi command yields

$ nvidia-smi
No devices were found

There were no errors while downloading or installing the drivers, and the GPU shows up in lspci. I’ve tried many of the solutions suggested on these forums by people with similar problems, with no results.
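By the lspci check I mean simply:

$ lspci | grep -i nvidia

and the Tesla T4 is listed in its output.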

I ran nvidia-bug-report.sh (log attached to this post), and from what I can see the relevant error is

Oct 22 17:35:21 kernel: NVRM: GPU 0000:00:1e.0: RmInitAdapter failed! (0x26:0xffff:1155)

Could this be a hardware issue? Is that even possible on a freshly provisioned AWS machine?

Thank you
nvidia-bug-report.log.gz (508 KB)

Solved the issue. Somehow the headers and the default driver configuration on the AWS machine were faulty or incompatible, so I had to do a complete, clean reinstall.

Here are the steps I took:

  1. Completely purge everything NVIDIA and CUDA related.

    To list all your NVIDIA packages:

    dpkg -l | grep -i nvidia

    To list all your CUDA packages:

    dpkg -l | grep -i cuda

    To purge all NVIDIA and CUDA packages:

    sudo apt-get remove --purge '^nvidia-.*'
    sudo apt-get remove --purge '^cuda.*'

    IMPORTANT: if you are in a desktop environment you also need to reinstall nvidia-common and ubuntu-desktop, and reset nouveau, since those are necessary to drive your monitor and log in through the UI. On a headless machine this is not required.
    For more details, see https://askubuntu.com/questions/206283/how-can-i-uninstall-a-nvidia-driver-completely

  2. Reinstall the NVIDIA drivers from the graphics-drivers PPA:

    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update
    sudo apt upgrade
    sudo apt install nvidia-driver-<VERSION THAT YOU NEED>
    sudo reboot

    In my case, 418 was the version I needed for my GPU. That said, the drivers PPA generally knows which version fits your hardware, and the version number nvidia-smi reports after installation may differ from the one you asked for. (A quick sanity check after the reboot is shown below.)
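A quick sanity check after the reboot (not part of my original notes, just the standard commands) is to confirm the driver loads and the GPU is visible again:

$ nvidia-smi
$ dpkg -l | grep -i nvidia-driver

This time nvidia-smi should print the usual table listing the Tesla T4 instead of "No devices were found".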

After that, I reinstalled all the tools I needed for work and everything ran perfectly.
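For anyone hitting the same problem, a quick way to confirm the container side afterwards is something along these lines (the image tag and test snippet are just an example, not the exact command I used):

$ docker run --gpus all --rm tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"

It should print True once the driver and the NVIDIA container runtime are working.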