TAO Toolkit 5.5.0 - cuInit failed: no CUDA-capable device is detected

Please provide the following information when requesting support.

• Hardware: NVIDIA GeForce RTX 4090
• Network Type: Classification
• TLT Version: TAO 5.5.0
• Training spec file: Default from Classification_tf1

Hi,
I’m using TAO Toolkit version 5.5.0, installed following the Quick Start Guide. However, I’m encountering the following error when running this command:

!tao model classification_tf1 train -e $SPECS_DIR/classification_spec.cfg -r $USER_EXPERIMENT_DIR/output -k $KEY

The error message is:

cuda.init()  
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected  
2025-01-09 08:50:30,564 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

I’ve reviewed similar issues discussed in these threads:

  1. No CUDA-capable device is detected - yolov4
  2. No CUDA-capable device is detected on tao detectnet_v2 dataset convert

Despite applying the suggestions there, the error persists.

Here is my system setup:
nvidia-smi

At first, my driver version was 550.120 with CUDA version 12.4. As suggested in those threads, I downgraded to driver version 535.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
|  0%   39C    P8              24W / 450W |    731MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2245      G   /usr/lib/xorg/Xorg                          240MiB |
|    0   N/A  N/A      2390      G   /usr/bin/gnome-shell                         80MiB |
|    0   N/A  N/A      4322      G   ...erProcess --variations-seed-version       49MiB |
|    0   N/A  N/A      4526      G   ...irefox/5437/usr/lib/firefox/firefox      338MiB |
+---------------------------------------------------------------------------------------+

CUDA packages

dpkg -l | grep cuda
ii  libcudart11.0:amd64                        11.5.117~11.5.1-1ubuntu1                amd64        NVIDIA CUDA Runtime Library
ii  nvidia-cuda-dev:amd64                      11.5.1-1ubuntu1                         amd64        NVIDIA CUDA development files
ii  nvidia-cuda-gdb                            11.5.114~11.5.1-1ubuntu1                amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                        11.5.1-1ubuntu1                         amd64        NVIDIA CUDA development toolkit
ii  nvidia-cuda-toolkit-doc                    11.5.1-1ubuntu1                         all          NVIDIA CUDA and OpenCL documentation

Can someone guide me on resolving this issue?
Thanks in advance!

Please run a quick check to narrow this down.
The command is in the thread "pycuda._driver.LogicError: cuInit failed: system not yet initialized - #8 by Morganh".

Here’s the result within the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 container:

python
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycuda
>>> import pycuda.driver as cuda
>>> cuda.init()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected

# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

I also installed nvidia-modprobe, but the issue persists.
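A further check that can help in this situation is to look at whether the GPU device nodes are exposed inside the container at all. The sketch below is not the command from the linked thread; it is a minimal, assumption-laden helper (the function name `gpu_visibility_report` is made up for illustration) that lists `/dev/nvidia*` entries and prints the `NVIDIA_VISIBLE_DEVICES` and `CUDA_VISIBLE_DEVICES` environment variables. An empty device list would suggest the container runtime is not passing the GPU through, but that is only one possible cause.

```python
import os


def gpu_visibility_report(dev_dir="/dev", environ=None):
    """Collect hints about whether NVIDIA devices are exposed to this process.

    Hypothetical helper for illustration; it only inspects device node names
    and environment variables and never calls into CUDA itself.
    """
    env = os.environ if environ is None else environ
    try:
        names = os.listdir(dev_dir)
    except FileNotFoundError:
        names = []
    # Device nodes such as nvidia0, nvidiactl, nvidia-uvm
    nodes = sorted(name for name in names if name.startswith("nvidia"))
    return {
        "device_nodes": nodes,
        "has_nvidiactl": "nvidiactl" in nodes,
        "NVIDIA_VISIBLE_DEVICES": env.get("NVIDIA_VISIBLE_DEVICES"),
        "CUDA_VISIBLE_DEVICES": env.get("CUDA_VISIBLE_DEVICES"),
    }


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

Running it both on the host and inside the container and comparing the two reports shows whether the difference lies in the container setup rather than in pycuda.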

Just to let you know, I restored the snapshot. Here’s the current output of nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
|  0%   39C    P8             25W /  450W |     453MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2244      G   /usr/lib/xorg/Xorg                            158MiB |
|    0   N/A  N/A      2390      G   /usr/bin/gnome-shell                           79MiB |
|    0   N/A  N/A      3606      G   ...irefox/5437/usr/lib/firefox/firefox        158MiB |
|    0   N/A  N/A      5781      G   ...erProcess --variations-seed-version         33MiB |
+-----------------------------------------------------------------------------------------+

Please reinstall the driver, then reboot and retry.
sudo apt purge nvidia-driver-550
sudo apt autoremove
sudo apt autoclean
sudo apt install nvidia-driver-550

Reboot and retry.

I redid the entire installation process and the cuInit failed error went away, but other errors appeared:

   raise DockerException(
docker.errors.DockerException: Error while fetching server API version: Not supported URL scheme http+docker
   raise DockerException(
docker.errors.DockerException: Error while fetching server API version: HTTPConnection.request() got an unexpected keyword argument 'chunked'

I fixed these errors by forcing the versions of two packages, requests and urllib3:

# in the same virtual environment that I install the TAO Launcher

pip install --force-reinstall 'requests<2.29.0' 'urllib3<2.0'
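To confirm that the pins actually took effect in the virtual environment, something like the following sketch can be run with the same interpreter. The helper names (`version_tuple`, `satisfies_upper_bound`, `check_pins`) are hypothetical, and the version comparison is deliberately simplified: it only looks at the leading numeric parts and ignores pre-release suffixes, so it is not a full PEP 440 implementation.

```python
from importlib import metadata

# Upper bounds taken from the pip command above
UPPER_BOUNDS = {"requests": "2.29.0", "urllib3": "2.0"}


def version_tuple(version):
    """Turn '2.28.2' into (2, 28, 2); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies_upper_bound(installed, upper_bound):
    """Return True if installed < upper_bound, padding with zeros so 2 == 2.0."""
    a, b = version_tuple(installed), version_tuple(upper_bound)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_pins(get_version=metadata.version):
    """Map each pinned package to (installed_version, within_bound)."""
    report = {}
    for name, bound in UPPER_BOUNDS.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, satisfies_upper_bound(installed, bound))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins().items():
        print(f"{name}: installed={installed} within_bound={ok}")
```

The `get_version` parameter exists only so the check can be exercised without touching the real environment; by default it reads the installed package metadata.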

But pip then reported this error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-server 2.14.1 requires websocket-client>=1.7, but you have websocket-client 0.57.0 which is incompatible.
jupyterlab-server 2.27.3 requires requests>=2.31, but you have requests 2.28.2 which is incompatible.
nvidia-tao 5.5.1 requires idna==2.10, but you have idna 3.10 which is incompatible.
nvidia-tao 5.5.1 requires requests==2.31.0, but you have requests 2.28.2 which is incompatible.
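When the list of conflicts gets long, it can help to pull out only the lines that concern a given package. The sketch below parses conflict lines in the format shown above into dictionaries; the regular expression is inferred from this output alone, so treat it as an assumption that may not match every pip version. The names `CONFLICT_RE` and `parse_conflicts` are illustrative.

```python
import re

# Pattern inferred from the pip output above; other pip versions may word it differently.
CONFLICT_RE = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>\S+), "
    r"but you have (?P<installed_name>\S+) (?P<installed_version>\S+) "
    r"which is incompatible\.$"
)


def parse_conflicts(pip_output):
    """Return one dict per dependency-conflict line found in pip's output."""
    conflicts = []
    for line in pip_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


if __name__ == "__main__":
    sample = (
        "jupyterlab-server 2.27.3 requires requests>=2.31, "
        "but you have requests 2.28.2 which is incompatible.\n"
        "nvidia-tao 5.5.1 requires requests==2.31.0, "
        "but you have requests 2.28.2 which is incompatible.\n"
    )
    for conflict in parse_conflicts(sample):
        if conflict["package"] == "nvidia-tao":
            print(conflict["requirement"], "vs installed", conflict["installed_version"])
```

Filtering on `package == "nvidia-tao"` separates the conflicts that touch the TAO launcher from those that only concern Jupyter.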

Despite the error message, I can still run TAO. But does forcing the versions of requests & urllib3 packages affect the performance of TAO?

Here are the versions of related software dependencies:

  • Ubuntu 22.04
  • nvidia-driver version 550
  • Docker version 27.4.1
  • NVIDIA Container Runtime Hook version 1.17.3
  • conda 24.11.1
  • python 3.10
  • nvidia-tao 5.5.1

No, it will not affect TAO. You can find a similar topic and solution in "Tao toolkit observations - #63 by foreverneilyoung".