Please provide the following information when requesting support.
• Hardware (RTX 2080 Ti)
• Network Type (Detectnet_v2)
• TAO Version (nvidia/tao/tao-toolkit-tf, nvidia/tao/tao-toolkit-pyt, nvidia/tao/tao-toolkit-lm)
• Training spec file (the default from detectnet_v2)
• How to reproduce the issue?
I followed the steps in the TAO Toolkit Quick Start Guide (TAO Toolkit 3.22.05 documentation) and the detectnet_v2/detectnet_v2.ipynb notebook from the TAO Toolkit Computer Vision Sample Workflows on NVIDIA NGC, using the TAO Toolkit for Computer Vision container from NVIDIA NGC.
First, I ran the container with:
docker run --gpus all --privileged -it -v /var/run/docker.sock:/var/run/docker.sock --network host nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
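As a sanity check that the container actually sees the GPU before going further, I can list the devices from Python — a minimal sketch (it assumes nvidia-smi is on the PATH inside the container and falls back gracefully if it is not):

```python
import shutil
import subprocess

# Hypothetical sanity check: list the GPUs visible to this environment
# via nvidia-smi -L. Falls back to a message if nvidia-smi is absent.
smi = shutil.which("nvidia-smi")
if smi:
    result = subprocess.run([smi, "-L"], capture_output=True, text=True).stdout
else:
    result = "nvidia-smi not found on PATH"
print(result)
```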
Second, I followed the steps in the TAO Toolkit Quick Start, got the model (PeopleNet), and started the Jupyter notebook using:
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root
I followed the detectnet_v2 notebook without problems until I reached step 2.C:
tao detectnet_v2 dataset_convert \
-d $SPECS_DIR/detectnet_v2_tfrecords_kitti_trainval.txt \
-o $DATA_DOWNLOAD_DIR/tfrecords/kitti_trainval/kitti_trainval
It shows the following error:
Converting Tfrecords for kitti trainval dataset
2021-12-06 20:24:46,566 [INFO] root: Registry: ['nvcr.io']
2021-12-06 20:24:46,622 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
2021-12-06 20:24:46,690 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
File "/usr/local/bin/detectnet_v2", line 8, in <module>
sys.exit(main())
File "/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py", line 12, in main
File "/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 256, in launch_job
File "/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 47, in get_modules
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/export.py", line 8, in <module>
File "/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/export/exporter.py", line 12, in <module>
File "/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/keras_exporter.py", line 22, in <module>
File "/opt/tlt/.cache/dazel/_dazel_tlt/75913d2aee35770fa76c4a63d877f3aa/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 27, in <module>
File "/usr/local/lib/python3.6/dist-packages/pycuda/autoinit.py", line 5, in <module>
cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected
2021-12-06 20:24:51,865 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
nvidia-smi
Mon Dec 6 22:23:36 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44 Driver Version: 495.44 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:0A:00.0 On | N/A |
| 35% 33C P8 30W / 260W | 1043MiB / 11016MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
dpkg -l | grep cuda
ii cuda-command-line-tools-11-3 11.3.1-1 amd64 CUDA command-line tools
ii cuda-compat-11-3 465.19.01-1 amd64 CUDA Compatibility Platform
ii cuda-compiler-11-3 11.3.1-1 amd64 CUDA compiler
ii cuda-cudart-11-3 11.3.109-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-dev-11-3 11.3.109-1 amd64 CUDA Runtime native dev links, headers
ii cuda-cuobjdump-11-3 11.3.58-1 amd64 CUDA cuobjdump
ii cuda-cupti-11-3 11.3.111-1 amd64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-11-3 11.3.111-1 amd64 CUDA profiling tools interface.
ii cuda-cuxxfilt-11-3 11.3.58-1 amd64 CUDA cuxxfilt
ii cuda-driver-dev-11-3 11.3.109-1 amd64 CUDA Driver native dev stub library
ii cuda-gdb-11-3 11.3.109-1 amd64 CUDA-GDB
ii cuda-libraries-11-3 11.3.1-1 amd64 CUDA Libraries 11.3 meta-package
ii cuda-libraries-dev-11-3 11.3.1-1 amd64 CUDA Libraries 11.3 development meta-package
ii cuda-memcheck-11-3 11.3.109-1 amd64 CUDA-MEMCHECK
ii cuda-minimal-build-11-3 11.3.1-1 amd64 Minimal CUDA 11.3 toolkit build packages.
ii cuda-nvcc-11-3 11.3.109-1 amd64 CUDA nvcc
ii cuda-nvdisasm-11-3 11.3.58-1 amd64 CUDA disassembler
ii cuda-nvml-dev-11-3 11.3.58-1 amd64 NVML native dev links, headers
ii cuda-nvprof-11-3 11.3.111-1 amd64 CUDA Profiler tools
ii cuda-nvprune-11-3 11.3.58-1 amd64 CUDA nvprune
ii cuda-nvrtc-11-1 11.1.74-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-11-3 11.3.109-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-11-1 11.1.74-1 amd64 NVRTC native dev links, headers
ii cuda-nvrtc-dev-11-3 11.3.109-1 amd64 NVRTC native dev links, headers
ii cuda-nvtx-11-3 11.3.109-1 amd64 NVIDIA Tools Extension
ii cuda-sanitizer-11-3 11.3.111-1 amd64 CUDA Sanitizer
ii cuda-thrust-11-3 11.3.109-1 amd64 CUDA Thrust
ii cuda-toolkit-11-3-config-common 11.3.109-1 all Common config package for CUDA Toolkit 11.3.
ii cuda-toolkit-11-config-common 11.4.108-1 all Common config package for CUDA Toolkit 11.
ii cuda-toolkit-config-common 11.4.108-1 all Common config package for CUDA Toolkit.
hi libcudnn8 8.2.1.32-1+cuda11.3 amd64 cuDNN runtime libraries
ii libcudnn8-dev 8.2.1.32-1+cuda11.3 amd64 cuDNN development libraries and headers
hi libnccl-dev 2.9.9-1+cuda11.3 amd64 NVIDIA Collective Communication Library (NCCL) Development Files
hi libnccl2 2.9.9-1+cuda11.3 amd64 NVIDIA Collective Communication Library (NCCL) Runtime
ii libnvinfer-bin 8.0.1-1+cuda11.3 amd64 TensorRT binaries
ii libnvinfer-dev 8.0.1-1+cuda11.3 amd64 TensorRT development libraries and headers
ii libnvinfer-plugin-dev 8.0.1-1+cuda11.3 amd64 TensorRT plugin libraries
ii libnvinfer-plugin8 8.0.1-1+cuda11.3 amd64 TensorRT plugin libraries
ii libnvinfer8 8.0.1-1+cuda11.3 amd64 TensorRT runtime libraries
ii libnvonnxparsers-dev 8.0.1-1+cuda11.3 amd64 TensorRT ONNX libraries
ii libnvonnxparsers8 8.0.1-1+cuda11.3 amd64 TensorRT ONNX libraries
ii libnvparsers-dev 8.0.1-1+cuda11.3 amd64 TensorRT parsers libraries
ii libnvparsers8 8.0.1-1+cuda11.3 amd64 TensorRT parsers libraries
However, running cuda.init() directly from a Python session works fine:
python
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycuda
>>> import pycuda.driver as cuda
>>> cuda.init()
>>>
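Since cuda.init() succeeds in the interactive session, I also checked how many devices the driver reports — a minimal sketch (pycuda assumed installed, as in the TAO container; any failure is caught and printed):

```python
# Hypothetical follow-up check: initialize the CUDA driver and count
# the devices it can see. Catches ImportError / cuInit failures so the
# script still reports something useful on a machine without a GPU.
try:
    import pycuda.driver as cuda
    cuda.init()
    status = f"CUDA devices visible: {cuda.Device.count()}"
except Exception as exc:  # e.g. pycuda missing or cuInit failed
    status = f"CUDA check failed: {exc}"
print(status)
```

If this reports one or more devices on the host but the TAO-launched container still fails with "no CUDA-capable device is detected", the problem would seem to be GPU passthrough into the inner container rather than the driver itself.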
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
I wonder what might be causing this error; the DeepStream container runs fine on this same machine.