No CUDA-capable device is detected

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type: Yolo_v4
• TLT Version
Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024

• Training spec file (if available, please share here)
• How to reproduce the issue?
Hi, to add some context: I'm editing the yolov4 notebook from the tao_launcher_starter_kit with a custom dataset. I'm having trouble with the cell before 2.3: # If you use your own dataset, you will need to run the code below to generate the best anchor shape

!tao model yolo_v4 kmeans \
    -l $DATA_DOWNLOAD_DIR/kitti_split/training/label \
    -i $DATA_DOWNLOAD_DIR/kitti_split/training/image \
    -n 9 \
    -x 960 \
    -y 544 \
    -e nvcr.io/nvidia/tao/tao-toolkit:v5.5.0

# The anchor shape generated by this script is sorted. Write the first 3 into small_anchor_shape in the config
# file. Write the middle 3 into mid_anchor_shape. Write the last 3 into big_anchor_shape.
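To make that grouping concrete, here is a small sketch of splitting the nine sorted anchors into the three spec fields (the anchor values below are made up for illustration; the real ones come from the kmeans output):

```python
# Hypothetical kmeans output: 9 (width, height) anchor shapes, already sorted,
# as produced by `tao model yolo_v4 kmeans -n 9` (values are placeholders).
anchors = [(11, 14), (23, 29), (35, 57), (52, 98), (87, 72),
           (102, 145), (158, 112), (201, 263), (388, 341)]

small_anchor_shape = anchors[:3]   # first 3 -> small_anchor_shape
mid_anchor_shape   = anchors[3:6]  # middle 3 -> mid_anchor_shape
big_anchor_shape   = anchors[6:]   # last 3 -> big_anchor_shape

# Format in the style the training spec file expects, e.g. "[(11.00, 14.00), ...]"
def fmt(shapes):
    return "[" + ", ".join(f"({w:.2f}, {h:.2f})" for w, h in shapes) + "]"

print("small_anchor_shape:", fmt(small_anchor_shape))
print("mid_anchor_shape:  ", fmt(mid_anchor_shape))
print("big_anchor_shape:  ", fmt(big_anchor_shape))
```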

I get this error:
2025-02-14 09:32:17,859 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-02-14 09:32:17,952 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2025-02-14 09:32:18,066 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 292:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/pcia2/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-02-14 09:32:18,066 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 305: Printing tty value True
Using TensorFlow backend.
2025-02-14 08:32:19.391648: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2025-02-14 08:32:19,497 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2025-02-14 08:32:20,541 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.
2025-02-14 08:32:20,573 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.
2025-02-14 08:32:20,580 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.
2025-02-14 08:32:21,874 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
2025-02-14 08:32:22,247 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.keras_exporter 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Traceback (most recent call last):
  File "/usr/local/bin/yolo_v4", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/entrypoint/yolo_v4.py", line 12, in main
    launch_job(nvidia_tao_tf1.cv.yolo_v4.scripts, "yolo_v4", sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 276, in launch_job
    modules = get_modules(package)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 47, in get_modules
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 21, in <module>
    from nvidia_tao_tf1.cv.yolo_v4.export.yolov4_exporter import YOLOv4Exporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/export/yolov4_exporter.py", line 42, in <module>
    from nvidia_tao_tf1.cv.common.export.keras_exporter import KerasExporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/keras_exporter.py", line 46, in <module>
    from nvidia_tao_tf1.core.export.app import get_model_input_dtype
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/app.py", line 40, in <module>
    from nvidia_tao_tf1.core.export._tensorrt import keras_to_tensorrt
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/_tensorrt.py", line 39, in <module>
    import pycuda.autoinit  # noqa pylint: disable=W0611
  File "/usr/local/lib/python3.8/dist-packages/pycuda/autoinit.py", line 5, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected
2025-02-14 09:32:22,769 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 367: Stopping container.
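(As an aside: the "Docker will run the commands as root" warning in the log above can be addressed exactly as it suggests. A minimal ~/.tao_mounts.json sketch follows; the mount paths and the 1000:1000 UID:GID are placeholders, use `id -u` and `id -g` to get yours.)

```
{
    "Mounts": [
        {
            "source": "/home/<user>/tao-experiments",
            "destination": "/workspace/tao-experiments"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```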

nvidia-smi
NVIDIA-SMI 535.230.02 Driver Version: 535.230.02 CUDA Version: 12.2

I also tried to follow this topic, since the author seemed to have similar issues; however, I still have errors:
No CUDA-capable device is detected - yolov4 - #8 by ahaselhan

I tried:

> Could you open a terminal in the VM and run below?

edited the link since I'm limited as a new user, look at the post above to get it

> Then, run python.
>
> # python
> >>> import pycuda
> >>> import pycuda.driver as cuda
> >>> cuda.init()

and:

> Seems that no gpu is found.
> Can you reboot it and retry?
> More, can you try another docker?

edited the link since I'm limited as a new user, look at the post above to get it

And I'm getting the exact same result as he did.

I would greatly appreciate any advice on how to proceed with this issue. I apologize in advance, but I'm quite new here, so what may seem simple or obvious to others might not be clear to me. If you need any additional information, please feel free to ask. I am currently running Ubuntu 22.04.5 LTS.

Thanks,
Valentin

Please reinstall the driver and reboot. Thanks.
Refer to TAO v3.21.08 - pycuda._driver.LogicError: cuInit failed: system not yet initialized - #3 by Morganh.

I tried reinstalling the driver and rebooting as mentioned, but I still have the same error. I also tried what was said in the post:

$ python
>>> import pycuda
>>> import pycuda.driver as cuda
>>> cuda.init()

which showed me nothing.

and
$ nvidia-smi
$ dpkg -l |grep cuda

ii  cuda-cccl-12-1                             12.1.109-1                                        amd64        CUDA CCCL
ii  cuda-cccl-12-2                             12.2.140-1                                        amd64        CUDA CCCL
ii  cuda-command-line-tools-12-2               12.2.2-1                                          amd64        CUDA command-line tools
ii  cuda-compiler-12-2                         12.2.2-1                                          amd64        CUDA compiler
ii  cuda-crt-12-2                              12.2.140-1                                        amd64        CUDA crt
ii  cuda-cudart-12-1                           12.1.105-1                                        amd64        CUDA Runtime native Libraries
ii  cuda-cudart-12-2                           12.2.140-1                                        amd64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-12-1                       12.1.105-1                                        amd64        CUDA Runtime native dev links, headers
ii  cuda-cudart-dev-12-2                       12.2.140-1                                        amd64        CUDA Runtime native dev links, headers
ii  cuda-cuobjdump-12-2                        12.2.140-1                                        amd64        CUDA cuobjdump
ii  cuda-cupti-12-2                            12.2.142-1                                        amd64        CUDA profiling tools runtime libs.
ii  cuda-cupti-dev-12-2                        12.2.142-1                                        amd64        CUDA profiling tools interface.
ii  cuda-cuxxfilt-12-2                         12.2.140-1                                        amd64        CUDA cuxxfilt
ii  cuda-documentation-12-2                    12.2.140-1                                        amd64        CUDA documentation
ii  cuda-driver-dev-12-1                       12.1.105-1                                        amd64        CUDA Driver native dev stub library
ii  cuda-driver-dev-12-2                       12.2.140-1                                        amd64        CUDA Driver native dev stub library
ii  cuda-gdb-12-2                              12.2.140-1                                        amd64        CUDA-GDB
ii  cuda-libraries-12-2                        12.2.2-1                                          amd64        CUDA Libraries 12.2 meta-package
ii  cuda-libraries-dev-12-2                    12.2.2-1                                          amd64        CUDA Libraries 12.2 development meta-package
ii  cuda-nsight-12-2                           12.2.144-1                                        amd64        CUDA nsight
ii  cuda-nsight-compute-12-2                   12.2.2-1                                          amd64        NVIDIA Nsight Compute
ii  cuda-nsight-systems-12-2                   12.2.2-1                                          amd64        NVIDIA Nsight Systems
ii  cuda-nvcc-12-1                             12.1.105-1                                        amd64        CUDA nvcc
ii  cuda-nvcc-12-2                             12.2.140-1                                        amd64        CUDA nvcc
ii  cuda-nvdisasm-12-2                         12.2.140-1                                        amd64        CUDA disassembler
ii  cuda-nvml-dev-12-2                         12.2.140-1                                        amd64        NVML native dev links, headers
ii  cuda-nvprof-12-2                           12.2.142-1                                        amd64        CUDA Profiler tools
ii  cuda-nvprune-12-2                          12.2.140-1                                        amd64        CUDA nvprune
ii  cuda-nvrtc-12-2                            12.2.140-1                                        amd64        NVRTC native runtime libraries
ii  cuda-nvrtc-dev-12-2                        12.2.140-1                                        amd64        NVRTC native dev links, headers
ii  cuda-nvtx-12-2                             12.2.140-1                                        amd64        NVIDIA Tools Extension
ii  cuda-nvvm-12-2                             12.2.140-1                                        amd64        CUDA nvvm
ii  cuda-nvvp-12-2                             12.2.142-1                                        amd64        CUDA Profiler tools
ii  cuda-opencl-12-2                           12.2.140-1                                        amd64        CUDA OpenCL native Libraries
ii  cuda-opencl-dev-12-2                       12.2.140-1                                        amd64        CUDA OpenCL native dev links, headers
ii  cuda-profiler-api-12-2                     12.2.140-1                                        amd64        CUDA Profiler API
ii  cuda-sanitizer-12-2                        12.2.140-1                                        amd64        CUDA Sanitizer
ii  cuda-toolkit-12-1-config-common            12.1.105-1                                        all          Common config package for CUDA Toolkit 12.1.
ii  cuda-toolkit-12-2                          12.2.2-1                                          amd64        CUDA Toolkit 12.2 meta-package
ii  cuda-toolkit-12-2-config-common            12.2.140-1                                        all          Common config package for CUDA Toolkit 12.2.
ii  cuda-toolkit-12-config-common              12.8.57-1                                         all          Common config package for CUDA Toolkit 12.
ii  cuda-toolkit-config-common                 12.8.57-1                                         all          Common config package for CUDA Toolkit.
ii  cuda-tools-12-2                            12.2.2-1                                          amd64        CUDA Tools meta-package
ii  cuda-visual-tools-12-2                     12.2.2-1                                          amd64        CUDA visual tools
ii  graphsurgeon-tf                            8.6.1.6-1+cuda12.0                                amd64        GraphSurgeon for TensorRT package
ii  libcudnn8                                  8.9.7.29-1+cuda12.2                               amd64        cuDNN runtime libraries
ii  libcudnn8-dev                              8.9.7.29-1+cuda12.2                               amd64        cuDNN development libraries and headers
ii  libnvinfer-bin                             8.6.1.6-1+cuda12.0                                amd64        TensorRT binaries
ii  libnvinfer-dev                             8.6.1.6-1+cuda12.0                                amd64        TensorRT development libraries
ii  libnvinfer-dispatch-dev                    8.6.1.6-1+cuda12.0                                amd64        TensorRT development dispatch runtime libraries
ii  libnvinfer-dispatch8                       8.6.1.6-1+cuda12.0                                amd64        TensorRT dispatch runtime library
ii  libnvinfer-headers-dev                     8.6.1.6-1+cuda12.0                                amd64        TensorRT development headers
ii  libnvinfer-headers-plugin-dev              8.6.1.6-1+cuda12.0                                amd64        TensorRT plugin headers
ii  libnvinfer-lean-dev                        8.6.1.6-1+cuda12.0                                amd64        TensorRT lean runtime libraries
ii  libnvinfer-lean8                           8.6.1.6-1+cuda12.0                                amd64        TensorRT lean runtime library
ii  libnvinfer-plugin-dev                      8.6.1.6-1+cuda12.0                                amd64        TensorRT plugin libraries
ii  libnvinfer-plugin8                         8.6.1.6-1+cuda12.0                                amd64        TensorRT plugin libraries
ii  libnvinfer-samples                         8.6.1.6-1+cuda12.0                                all          TensorRT samples
ii  libnvinfer-vc-plugin-dev                   8.6.1.6-1+cuda12.0                                amd64        TensorRT vc-plugin library
ii  libnvinfer-vc-plugin8                      8.6.1.6-1+cuda12.0                                amd64        TensorRT vc-plugin library
ii  libnvinfer8                                8.6.1.6-1+cuda12.0                                amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                       8.6.1.6-1+cuda12.0                                amd64        TensorRT ONNX libraries
ii  libnvonnxparsers8                          8.6.1.6-1+cuda12.0                                amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                           8.6.1.6-1+cuda12.0                                amd64        TensorRT parsers libraries
ii  libnvparsers8                              8.6.1.6-1+cuda12.0                                amd64        TensorRT parsers libraries
ii  onnx-graphsurgeon                          8.6.1.6-1+cuda12.0                                amd64        ONNX GraphSurgeon for TensorRT package
ii  python3-libnvinfer-dispatch                8.6.1.6-1+cuda12.0                                amd64        Python 3 bindings for TensorRT dispatch runtime
ii  python3-libnvinfer-lean                    8.6.1.6-1+cuda12.0                                amd64        Python 3 bindings for TensorRT lean runtime
ii  uff-converter-tf                           8.6.1.6-1+cuda12.0                                amd64        UFF converter for TensorRT package

Showing nothing is the correct result.

Do you mean there is no error now when you run

>>> import pycuda.driver as cuda
>>> cuda.init()

At first I had ModuleNotFoundError: No module named 'pycuda',
so I did pip install pycuda.
And now I have no errors when running:

>>> import pycuda.driver as cuda
>>> cuda.init()
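For what it's worth, pycuda's cuda.init() is essentially a thin wrapper around the driver API call cuInit(). A minimal ctypes sketch (assuming the driver library is named libcuda.so.1, as on standard Linux installs) reproduces the same check on the host without needing pycuda at all:

```python
import ctypes

def check_cuda_driver():
    """Load the CUDA driver library and call cuInit(0), which is what
    pycuda's cuda.init() (and TAO's pycuda.autoinit) does internally."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return "driver library not found (NVIDIA driver not installed?)"
    status = libcuda.cuInit(0)  # CUresult: 0 == CUDA_SUCCESS
    if status == 0:
        return "cuInit succeeded: a CUDA-capable device is visible"
    if status == 100:  # 100 == CUDA_ERROR_NO_DEVICE
        return "cuInit failed: no CUDA-capable device is detected"
    return "cuInit failed with CUresult %d" % status

print(check_cuda_driver())
```

If this succeeds on the host but the same check fails inside the TAO container, the problem lies in the container runtime's GPU passthrough rather than in the host driver.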

But I still get errors in the notebook.

What are the errors? Can you share the full logs?

When running:

# If you use your own dataset, you will need to run the code below to generate the best anchor shape

!tao model yolo_v4 kmeans \
    -l $DATA_DOWNLOAD_DIR/kitti_split/training/label \
    -i $DATA_DOWNLOAD_DIR/kitti_split/training/image \
    -n 9 \
    -x 960 \
    -y 544 \
#    -e nvcr.io/nvidia/tao/tao-toolkit:v5.5.0

# The anchor shape generated by this script is sorted. Write the first 3 into small_anchor_shape in the config
# file. Write middle 3 into mid_anchor_shape. Write last 3 into big_anchor_shape.

I get these errors:

2025-02-17 10:08:29,798 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-02-17 10:08:29,993 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2025-02-17 10:08:30,083 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 292: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/pcia2/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-02-17 10:08:30,083 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 305: Printing tty value True
Using TensorFlow backend.
2025-02-17 09:08:31.469551: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2025-02-17 09:08:31,616 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2025-02-17 09:08:32,639 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2025-02-17 09:08:32,667 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2025-02-17 09:08:32,672 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2025-02-17 09:08:33,890 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
2025-02-17 09:08:34,271 [TAO Toolkit] [WARNING] nvidia_tao_tf1.cv.common.export.keras_exporter 36: Failed to import TensorRT package, exporting TLT to a TensorRT engine will not be available.
Traceback (most recent call last):
  File "/usr/local/bin/yolo_v4", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/entrypoint/yolo_v4.py", line 12, in main
    launch_job(nvidia_tao_tf1.cv.yolo_v4.scripts, "yolo_v4", sys.argv[1:])
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 276, in launch_job
    modules = get_modules(package)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/entrypoint/entrypoint.py", line 47, in get_modules
    module = importlib.import_module(module_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/scripts/export.py", line 21, in <module>
    from nvidia_tao_tf1.cv.yolo_v4.export.yolov4_exporter import YOLOv4Exporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/yolo_v4/export/yolov4_exporter.py", line 42, in <module>
    from nvidia_tao_tf1.cv.common.export.keras_exporter import KerasExporter as Exporter
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/export/keras_exporter.py", line 46, in <module>
    from nvidia_tao_tf1.core.export.app import get_model_input_dtype
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/app.py", line 40, in <module>
    from nvidia_tao_tf1.core.export._tensorrt import keras_to_tensorrt
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/core/export/_tensorrt.py", line 39, in <module>
    import pycuda.autoinit  # noqa pylint: disable=W0611
  File "/usr/local/lib/python3.8/dist-packages/pycuda/autoinit.py", line 5, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected
2025-02-17 10:08:34,810 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 367: Stopping container.

Please run
! tao model yolo_v4 run /bin/bash

Then inside the docker, run below.
# nvidia-smi
# python
>>> import pycuda.driver as cuda
>>> cuda.init()

Here is what I get:

2025-02-17 10:35:21,125 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-02-17 10:35:21,198 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2025-02-17 10:35:21,245 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 292: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/pcia2/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-02-17 10:35:21,246 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 305: Printing tty value True
root@536d7a6cef60:/workspace# nvidia-smi
Failed to initialize NVML: Unknown Error
root@536d7a6cef60:/workspace# python
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycuda.driver as cuda
>>> cuda.init()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected

It is not related to TAO.
Please check if the hint from Nvidia Container Toolkit: Failed to initialize NVML: Unknown Error - #3 by SimonBirrell can help you.
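(For context, one workaround commonly reported for "Failed to initialize NVML: Unknown Error" inside containers, an assumption here and not confirmed for this exact setup, is to make sure the NVIDIA Container Toolkit is allowed to manage cgroups, then restart Docker with `sudo systemctl restart docker`:)

```
# /etc/nvidia-container-runtime/config.toml (excerpt)
[nvidia-container-cli]
# must not be set to true, otherwise the GPU device nodes are not
# created inside the container and NVML fails to initialize
no-cgroups = false
```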