• Hardware: NVIDIA GeForce RTX 4090
• Network Type: Classification
• TLT Version: TAO 5.5.0
• Training spec file: Default from Classification_tf1
Hi,
I’m using TAO Toolkit version 5.5.0, installed following the Quick Start Guide. However, I’m encountering the following error when running this command:
!tao model classification_tf1 train -e $SPECS_DIR/classification_spec.cfg -r $USER_EXPERIMENT_DIR/output -k $KEY
The error message is:
cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected
2025-01-09 08:50:30,564 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.
I’ve reviewed similar issues discussed in these threads:
- No CUDA-capable device is detected - yolov4
- No CUDA-capable device is detected on tao detectnet_v2 dataset convert
Despite applying the suggestions there, the error persists.
Here is my system setup. My driver was originally version 550.120 (CUDA version 12.4); as suggested in the threads above, I downgraded to driver version 535.
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
|  0%   39C    P8              24W / 450W |    731MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2245      G   /usr/lib/xorg/Xorg                          240MiB |
|    0   N/A  N/A      2390      G   /usr/bin/gnome-shell                         80MiB |
|    0   N/A  N/A      4322      G   ...erProcess --variations-seed-version       49MiB |
|    0   N/A  N/A      4526      G   ...irefox/5437/usr/lib/firefox/firefox      338MiB |
+---------------------------------------------------------------------------------------+
CUDA packages
dpkg -l | grep cuda
ii  libcudart11.0:amd64      11.5.117~11.5.1-1ubuntu1  amd64  NVIDIA CUDA Runtime Library
ii  nvidia-cuda-dev:amd64    11.5.1-1ubuntu1           amd64  NVIDIA CUDA development files
ii  nvidia-cuda-gdb          11.5.114~11.5.1-1ubuntu1  amd64  NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit      11.5.1-1ubuntu1           amd64  NVIDIA CUDA development toolkit
ii  nvidia-cuda-toolkit-doc  11.5.1-1ubuntu1           all    NVIDIA CUDA and OpenCL documentation
Can someone guide me on resolving this issue?
Thanks in advance!