RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

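In the environment below, cuDNN 8.9.3 is not recognized by the system. In addition, when I run my code, I get an error saying cuDNN cannot be initialized, and I have not been able to resolve it.
Rebooting did not change anything.
Please advise.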
andolab@ubuntu:~$ jetson_release
Software part of jetson-stats 4.2.12 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson Orin NX Engineering Reference Developer Kit - Jetpack 6.1 [L4T 36.4.0]
NV Power Mode[2]: 15W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
  • P-Number: p3767-0000
  • Module: NVIDIA Jetson Orin NX (16GB ram)
Platform:
  • Distribution: Ubuntu 22.04 Jammy Jellyfish
  • Release: 5.15.148-tegra
jtop:
  • Version: 4.2.12
  • Service: Active
Libraries:
  • CUDA: 12.1.105
  • cuDNN: 1.0
  • TensorRT: Not installed
  • VPI: 3.2.4
  • Vulkan: 1.3.204
  • OpenCV: 4.8.0 - with CUDA: YES

Result of running the Python code:
andolab@ubuntu:~$ python3 test_run_maskrcnn_pipeline2.py
[ WARN:0@4.388] global cap_gstreamer.cpp:1728 open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
camera success!!!
Traceback (most recent call last):
  File "/home/andolab/test_run_maskrcnn_pipeline2.py", line 76, in <module>
    prediction = model(img_tensor)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torchvision-0.16.1-py3.10-linux-aarch64.egg/torchvision/models/detection/generalized_rcnn.py", line 101, in forward
    features = self.backbone(images.tensors)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torchvision-0.16.1-py3.10-linux-aarch64.egg/torchvision/models/detection/backbone_utils.py", line 57, in forward
    x = self.body(x)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torchvision-0.16.1-py3.10-linux-aarch64.egg/torchvision/models/_utils.py", line 69, in forward
    x = module(x)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
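
The traceback ends in the first `F.conv2d` call of the backbone, so the camera pipeline and Mask R-CNN are probably not needed to reproduce it. A minimal sketch, assuming that one small convolution on the GPU goes through cuDNN in the same way (the helper name `cudnn_smoke_test` is made up for illustration):

```python
"""Smallest possible cuDNN check: one convolution on the GPU."""

try:
    import torch
except ImportError:  # lets the script report cleanly on machines without PyTorch
    torch = None


def cudnn_smoke_test() -> str:
    """Return a short status string instead of raising, so the result is easy to paste."""
    if torch is None:
        return "skipped: torch is not installed"
    if not torch.cuda.is_available():
        return "skipped: CUDA is not available"
    try:
        conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
        x = torch.randn(1, 3, 32, 32, device="cuda")
        with torch.no_grad():
            y = conv(x)  # assumption: with cuDNN enabled, its handle is created here
        torch.cuda.synchronize()  # surface asynchronous CUDA errors before returning
        return f"ok: output shape {tuple(y.shape)}"
    except RuntimeError as exc:
        return f"failed: {exc}"


if __name__ == "__main__":
    print(cudnn_smoke_test())
```

If this small script also reports `failed: cuDNN error: ...`, the problem is likely in the CUDA/cuDNN installation rather than in the Mask R-CNN code.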

andolab@ubuntu:~/cudnn_samples_v8/mnistCUDNN$ ls
Makefile error_util.h fp16_dev.h fp16_emu.cpp fp16_emu.o mnistCUDNN mnistCUDNN.o
data fp16_dev.cu fp16_dev.o fp16_emu.h gemv.h mnistCUDNN.cpp readme.txt
andolab@ubuntu:~/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8903 , CUDNN_VERSION from cudnn.h : 8903 (8.9.3)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 4 Capabilities 8.7, SmClock 918.0 Mhz, MemSize (Mb) 15655, MemClock 612.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
ERROR: cudnn failure (CUDNN_STATUS_NOT_INITIALIZED) in mnistCUDNN.cpp:414
Aborting...

I ran the cuDNN 8.9.3 sample code, but it did not end with "Test passed!".
Please help.
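
`jetson_release` reports cuDNN 1.0 while the sample reports 8.9.3, which may mean that more than one copy of the library is present. A rough sketch for listing `libcudnn` files, assuming they live in `/usr/lib/aarch64-linux-gnu` or `/usr/local/cuda-12.1/lib64` (other locations are possible, and `find_cudnn_libs` is only an illustrative name):

```python
import glob
import os

# Assumed search locations; adjust them to the actual install.
DEFAULT_DIRS = ("/usr/lib/aarch64-linux-gnu", "/usr/local/cuda-12.1/lib64")


def find_cudnn_libs(search_dirs=DEFAULT_DIRS):
    """Return (path, resolved_target) pairs for every libcudnn shared object found."""
    found = []
    for directory in search_dirs:
        pattern = os.path.join(directory, "libcudnn*.so*")
        for path in sorted(glob.glob(pattern)):
            # realpath follows symlinks, so libcudnn.so.8 shows its real file
            found.append((path, os.path.realpath(path)))
    return found


if __name__ == "__main__":
    for path, target in find_cudnn_libs():
        print(f"{path} -> {target}")
```

Seeing the same library name in both directories, or files from two different major versions, would be worth mentioning in the thread.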

Hi,

Please run the following two commands and share the output with us.

$ cat /etc/nv_tegra_release
$ apt show nvidia-jetpack

Thanks.

Hi, thanks so much for your reply.

Here is the output of the command:

andolab@ubuntu:~$ cat /etc/nv_tegra_release
# R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
andolab@ubuntu:~$ apt show nvidia-jetpack
Package: nvidia-jetpack
Version: 6.1+b123
Priority: standard
Section: metapackages
Source: nvidia-jetpack (6.1)
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 6.1+b123), nvidia-jetpack-dev (= 6.1+b123)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29.3 kB
APT-Sources: https://repo.download.nvidia.com/jetson/common r36.4/main arm64 Packages
Description: NVIDIA Jetpack Meta Package
thank you
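
The header line of `/etc/nv_tegra_release` carries the same L4T version that `jetson_release` prints as `L4T 36.4.0`. A small sketch of extracting it, assuming the header keeps the `R<major> (release), REVISION: <rev>` shape shown above (`parse_l4t_version` is a hypothetical helper):

```python
import re


def parse_l4t_version(header_line: str) -> str:
    """Turn '# R36 (release), REVISION: 4.0, ...' into '36.4.0'."""
    match = re.search(r"R(\d+) \(release\), REVISION: (\d+(?:\.\d+)*)", header_line)
    if match is None:
        raise ValueError("not an nv_tegra_release header line")
    major, revision = match.groups()
    return f"{major}.{revision}"


# Header line copied from the output in this thread.
sample = (
    "# R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, "
    "EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024"
)
print(parse_l4t_version(sample))  # prints 36.4.0
```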

I’m new to Linux and application development, so I’m not confident about the environment variables and the path to cuDNN.
I’m prepared to share that information if necessary.

Below are the versions and build information for python, opencv, pytorch, and torchvision. It seems that pytorch recognizes cuDNN, but the python file I want to run displays an error.

andolab@ubuntu:~$ python3
Python 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.backends.cudnn.version())
8903
>>> print(torch.__version__)
2.1.0
>>> import torch
>>> import torchvision
>>> print(torchvision.__version__)
0.16.1
>>> import cv2
>>> print(cv2.__version__)
4.8.0
>>> print(cv2.getBuildInformation())

General configuration for OpenCV 4.8.0 =====================================
Version control: 4.8.0

Extra modules:
Location (extra): /home/andolab/opencv_contrib/modules
Version control (extra): 4.8.1

Platform:
Timestamp: 2025-01-09T14:21:08Z
Host: Linux 5.15.148-tegra aarch64
CMake: 3.22.1
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/gmake
Configuration: Release

CPU/HW features:
Baseline: NEON FP16

C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: /usr/bin/c++ (ver 11.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
ccache: NO
Precompiled headers: NO
Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda-12.1/lib64 -L/usr/lib/aarch64-linux-gnu
3rdparty dependencies:

OpenCV modules:
To be built: alphamat aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: world
Disabled by dependency: -
Unavailable: cvv java julia matlab ovis python2 sfm viz
Applications: tests perf_tests apps
Documentation: NO
Non-free algorithms: NO

GUI: GTK3
GTK+: YES (ver 3.24.33)
GThread : YES (ver 2.72.4)
GtkGlExt: NO
VTK support: NO

Media I/O:
ZLib: /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
JPEG: /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
WEBP: build (ver encoder: 0x020f)
PNG: /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.37)
TIFF: /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.3.0)
JPEG 2000: build (ver 2.5.0)
OpenEXR: /usr/lib/aarch64-linux-gnu/libImath-2_5.so /usr/lib/aarch64-linux-gnu/libIlmImf-2_5.so /usr/lib/aarch64-linux-gnu/libIex-2_5.so /usr/lib/aarch64-linux-gnu/libHalf-2_5.so /usr/lib/aarch64-linux-gnu/libIlmThread-2_5.so (ver 2_5)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES

Video I/O:
DC1394: NO
FFMPEG: YES
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: NO
GStreamer: YES (1.20.3)
v4l/v4l2: YES (linux/videodev2.h)

Parallel framework: pthreads

Trace: YES (with Intel ITT)

Other third-party libraries:
Lapack: NO
Eigen: YES (ver 3.4.0)
Custom HAL: YES (carotene (ver 0.0.1))
Protobuf: build (3.19.1)
Flatbuffers: builtin/3rdparty (23.5.9)

NVIDIA CUDA: YES (ver 12.1, CUFFT CUBLAS)
NVIDIA GPU arch: 87
NVIDIA PTX archs:

cuDNN: YES (ver 8.9.3)

OpenCL: YES (no extra features)
Include path: /home/andolab/opencv/3rdparty/include/opencl/1.2
Link libraries: Dynamic load

Python 3:
Interpreter: /usr/bin/python3 (ver 3.10.12)
Libraries: /usr/lib/aarch64-linux-gnu/libpython3.10.so (ver 3.10.12)
numpy: /home/andolab/.local/lib/python3.10/site-packages/numpy/core/include (ver 1.26.1)
install path: lib/python3.10/dist-packages/cv2/python-3.10

Python (for build): /usr/bin/python2.7

Java:
ant: NO
Java: NO
JNI: NO
Java wrappers: NO
Java tests: NO

Install to: /usr/local

I just realized that the CUDA 12.1 currently installed on this Jetson Orin NX (Ubuntu 22.04) came from the arm64-sbsa local installer, not the aarch64-jetson one.

Could this be the cause of the current error?

I used the sbsa installer because there was no CUDA 12.1 aarch64-jetson installer compatible with Ubuntu 22.04.
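
Whether the sbsa installer is the cause is not established here, but a first step is to see which toolkit directories exist and which one the `cuda` symlink selects. A minimal sketch, assuming the conventional `/usr/local/cuda-<version>` layout (`list_cuda_toolkits` is an illustrative helper, not part of any NVIDIA tool):

```python
import glob
import os


def list_cuda_toolkits(root="/usr/local"):
    """Return (versioned_dirs, active), where active is what root/cuda resolves to."""
    versioned = sorted(
        p for p in glob.glob(os.path.join(root, "cuda-*")) if os.path.isdir(p)
    )
    link = os.path.join(root, "cuda")
    # realpath follows the symlink chain; None means there is no root/cuda entry
    active = os.path.realpath(link) if os.path.exists(link) else None
    return versioned, active


if __name__ == "__main__":
    versioned, active = list_cuda_toolkits()
    print("installed:", versioned)
    print("active   :", active)
```

This only shows which toolkits are present. It cannot tell an sbsa build from a Jetson build, so the installer used still has to be stated separately.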

Hi,

We can run mnistCUDNN sample on r36.4.0+JetPack 6.1 components.

$ ./mnistCUDNN 
Executing: mnistCUDNN
cudnnGetVersion() : 90300 , CUDNN_VERSION from cudnn.h : 90300 (9.3.0)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 16  Capabilities 8.7, SmClock 1300.0 Mhz, MemSize (Mb) 62840, MemClock 1300.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.147296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.196864 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.388576 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.688064 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 1.039168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 1.622464 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.216096 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.688512 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.822208 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.838368 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.877408 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 1.666464 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.122048 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.138432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.145312 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.296352 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.364128 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.501216 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.231712 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.398784 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.690144 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.741504 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.826144 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.829024 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.096352 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.099072 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.120544 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.338880 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.363008 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.523200 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.281312 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.289696 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.412064 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.771072 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.873280 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 1.187264 time requiring 64000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.098656 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.111616 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.115360 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.313728 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.313920 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.424320 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.233216 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.244800 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.273760 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.516480 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.737504 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.797696 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!
nvidia@tegra-ubuntu:/usr/src/cudnn_samples_v9/mnistCUDNN$ 

Could you share with us how you set up the environment?
For JetPack 6.1, it’s expected to have cuDNN 9.3 instead of 8.9 in your environment.

Could you try to reflash the device to see if it can fix the issue?
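
The two sample runs in this thread print different integer encodings: 8903 for 8.9.3 and 90300 for 9.3.0. A small sketch of decoding them, assuming cuDNN 9 and later use major*10000 + minor*100 + patch while earlier releases use major*1000 + minor*100 + patch (`decode_cudnn_version` is a made-up helper):

```python
def decode_cudnn_version(value: int) -> tuple:
    """Split the integer from cudnnGetVersion() into (major, minor, patch)."""
    # Assumption: values of 90000 and above use the cuDNN 9 encoding.
    scale = 10000 if value >= 90000 else 1000
    major, rest = divmod(value, scale)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


for raw in (8903, 90300):
    dotted = ".".join(str(part) for part in decode_cudnn_version(raw))
    print(raw, "->", dotted)  # prints "8903 -> 8.9.3", then "90300 -> 9.3.0"
```

The same decoding applies to the value returned by `torch.backends.cudnn.version()`, which makes it easy to compare against the cuDNN version expected for a given JetPack release.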

Thanks.

Hello.

I didn't think I could solve the problem myself, so I used SDK Manager to install JetPack 6.1 onto a new SSD.

And in this environment, I was able to successfully run the mnist tests in the cudnn sample, just like you did.

Software part of jetson-stats 4.3.0 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson Orin NX Engineering Reference Developer Kit - Jetpack 6.1 [L4T 36.4.0]
NV Power Mode[2]: 15W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
  • P-Number: p3767-0000
  • Module: NVIDIA Jetson Orin NX (16GB ram)
Platform:
  • Distribution: Ubuntu 22.04 Jammy Jellyfish
  • Release: 5.15.148-tegra
jtop:
  • Version: 4.3.0
  • Service: Active
Libraries:
  • CUDA: 12.6.68
  • cuDNN: 9.3.0.75
  • TensorRT: 10.3.0.30
  • VPI: 3.2.4
  • Vulkan: 1.3.204
  • OpenCV: Not installed

I still don't know why cuDNN was not recognized in the previous environment, but in the new environment the cuDNN initialization error no longer occurs and the program runs.

However, there is still a problem.

The program I wanted to run is an application that performs inference on real-time video using Mask R-CNN.

I am trying to convert Mask R-CNN to the TensorRT engine to speed up inference.

I am trying to install torch2trt as described on its GitHub page, but I get the following error:

andolab@andolab:~/torch2trt$ sudo python3 setup.py install
Traceback (most recent call last):
  File "/home/andolab/torch2trt/setup.py", line 3, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

However, torch should be installed, as shown below.

Why can't I install torch2trt?

andolab@andolab:~/torch2trt$ pip show torch
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/andolab/.local/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: torchvision

GitHub - NVIDIA-AI-IOT/torch2trt: An easy to use PyTorch to TensorRT converter

Step 1 - Install the torch2trt Python library

To install the torch2trt Python library, call the following

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
python setup.py install
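
`pip show torch` reports the package under `/home/andolab/.local/lib/python3.10/site-packages`, which is a per-user directory. One possible explanation, not confirmed in this thread, is that the interpreter started by `sudo` does not search that directory. A sketch for comparing the two environments, to be run once as `python3 check_env.py` and once as `sudo python3 check_env.py` (the file name and `describe_env` are placeholders):

```python
import importlib.util
import site
import sys


def describe_env(module_name="torch"):
    """Collect the facts that decide whether `import module_name` can succeed."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "user_site": site.getusersitepackages(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # None means this interpreter cannot find the module at all
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

If `user_site` or `module_location` differs between the two runs, installing without `sudo` (for example `python3 setup.py install --user`) may be worth trying. That is a suggestion to test, not a confirmed fix.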

andolab@andolab:/usr/src/cudnn_samples_v9/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 90300 , CUDNN_VERSION from cudnn.h : 90300 (9.3.0)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 4 Capabilities 8.7, SmClock 918.0 Mhz, MemSize (Mb) 15655, MemClock 612.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.075424 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.098560 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.116928 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.190240 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.498432 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.661184 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.193728 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.398368 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.678176 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.861440 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 1.038112 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.915360 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.044320 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.045376 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.067840 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.190368 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.190816 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.481536 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.183328 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.328704 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.499104 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.561440 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.920480 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 2.058144 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.109472 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.118688 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.127392 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.268384 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.299904 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.484608 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.161760 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.217664 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.521952 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.738752 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.863936 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.917696 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.105824 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.109248 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.111296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.254240 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.256128 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.477280 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.213696 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.219776 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.399776 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.492416 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.864800 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.919552 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Hi,

ModuleNotFoundError: No module named 'torch'

The error indicates that the torch library is not installed.
PyTorch can be found at the link below:

https://pypi.jetson-ai-lab.dev/jp6/cu126

Thanks.
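
After installing a wheel from that index, a quick check like the one below can confirm that torch imports and that it sees CUDA and cuDNN. This is a minimal sketch, not an official diagnostic: the helper name `describe_torch` is made up for illustration, and it only reports what PyTorch itself exposes.

```python
import importlib.util


def describe_torch():
    """Report whether torch is importable and, if so, which CUDA/cuDNN it sees."""
    info = {"installed": importlib.util.find_spec("torch") is not None}
    if not info["installed"]:
        return info

    import torch  # imported lazily so the check also runs without PyTorch

    info["torch_version"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    info["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    info["cudnn_available"] = torch.backends.cudnn.is_available()
    info["cudnn_version"] = torch.backends.cudnn.version()  # None if cuDNN could not be loaded
    return info


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `installed` is False, the `ModuleNotFoundError` above is expected. If `cudnn_available` is False while `cuda_available` is True, the installed wheel and the system cuDNN probably do not match.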

Hello.

I gave up on installing torch2trt and am now trying to run Mask R-CNN, but it is not working.

I want to load a Mask R-CNN model trained on my own data in PyTorch and run object detection and segmentation on each frame of a real-time video, but no detections appear in the output.

The camera opens and inference appears to run, but the results contain zero labels, masks, and boxes.

Is my weight loading method wrong?

Below is the Python code I use to run Mask R-CNN. What should I change? The Mask R-CNN backbone is ResNet101, and epoch_13.pth is the Mask R-CNN model that I trained myself.

Thanks.

import torch
import cv2
import numpy as np
from torchvision import transforms
import random
import time
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Build a Mask R-CNN model with a ResNet101 backbone and FPN
backbone = resnet_fpn_backbone(backbone_name='resnet101', weights=None)
model = MaskRCNN(backbone, num_classes=11)  # num_classes includes the background class

# Use the GPU when it is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Workaround for the cuDNN initialization error
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.enabled = True

# Load the model checkpoint
checkpoint = torch.load('epoch_13.pth', map_location=device)
# Note: strict=False silently skips any keys that do not match the model
model.load_state_dict(checkpoint['state_dict'], strict=False)
#model.load_state_dict(checkpoint['state_dict'])
model = model.to(device)
model.eval()

# Custom class labels
labels = ["0", "T", "L", "V", "C", "C2", "N", "N2", "Ca", "V2", "ChT"]

# GStreamer pipeline settings
pipeline = "v4l2src device=/dev/video0 ! videoconvert ! videocrop top=90 left=565 right=410 bottom=250 ! videoconvert ! appsink"
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

if not cap.isOpened():
    print("Failed to open camera")
    exit()

# Image preprocessing
transform = transforms.Compose([transforms.ToTensor()])

# Generate one random color per detection
def get_random_colors(num_colors):
    return [tuple(random.randint(0, 255) for _ in range(3)) for _ in range(num_colors)]

# Draw the inference results on a frame
def draw_predictions(frame, prediction, score_threshold=0.1):
    boxes = prediction['boxes'].cpu().numpy()
    labels_pred = prediction['labels'].cpu().numpy()
    scores = prediction['scores'].cpu().numpy()
    masks = (prediction['masks'] > 0.5).cpu().numpy()  # binarize the soft masks
    colors = get_random_colors(len(boxes))

    # Debug: check the shapes of the predictions
    print(f"Boxes: {boxes.shape}, Labels: {labels_pred.shape}, Scores: {scores.shape}, Masks: {masks.shape}")

    for i in range(len(boxes)):
        if scores[i] < score_threshold:
            continue

        box = boxes[i]
        label = labels_pred[i]
        mask = masks[i][0]  # boolean (H, W) mask, already thresholded above
        color = colors[i]

        # Draw the bounding box
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)

        # Draw the label and score
        label_name = labels[label] if 0 <= label < len(labels) else f"Unknown ({label})"
        cv2.putText(frame, f"{label_name} {scores[i]:.2f}", (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # Paint the mask region in the detection's color
        mask_color = np.zeros_like(frame, dtype=np.uint8)
        mask_color[mask] = color

        # Blend the mask into the frame
        frame = cv2.addWeighted(frame, 1.0, mask_color, 0.5, 0)

    return frame

# Main loop
frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame")
        break

    frame_count += 1
    start_time = time.time()

    # Prepare the input tensor
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img_tensor = transform(frame_rgb).unsqueeze(0).to(device)

    # Run inference
    with torch.no_grad():
        prediction = model(img_tensor)[0]

    # Draw the results
    frame = draw_predictions(frame, prediction)

    # Compute and display the FPS
    end_time = time.time()
    fps = 1 / (end_time - start_time)
    cv2.putText(frame, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Show the frame
    cv2.imshow('Mask R-CNN Inference', frame)

    # Quit when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup
cap.release()
cv2.destroyAllWindows()
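
The script above reads the weights from `checkpoint['state_dict']`. Checkpoints are not always laid out that way, so the sketch below shows one way to normalize a loaded checkpoint before passing it to `load_state_dict`. It makes two assumptions that may not hold for `epoch_13.pth`: that the weights are either under a `state_dict` key or at the top level, and that any `module.` prefix comes from `torch.nn.DataParallel`. The helper name `extract_state_dict` is hypothetical.

```python
def extract_state_dict(checkpoint, prefix="module."):
    """Return a flat {parameter_name: value} mapping from a loaded checkpoint.

    Handles two common layouts: a dict that wraps the weights under
    'state_dict', and a dict that already is the weights. A leading
    `prefix` (as added by torch.nn.DataParallel) is stripped from names.
    """
    state = checkpoint.get("state_dict", checkpoint)
    cleaned = {}
    for name, value in state.items():
        if name.startswith(prefix):
            name = name[len(prefix):]
        cleaned[name] = value
    return cleaned


# Placeholder values stand in for tensors in this illustration.
wrapped = {"state_dict": {"module.a.weight": 1, "b.bias": 2}, "meta": {}}
print(sorted(extract_state_dict(wrapped)))
```

In the script this would be used as `model.load_state_dict(extract_state_dict(checkpoint))`. Leaving `strict` at its default of `True` makes any remaining name mismatch raise an error instead of being skipped.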

I added some debug output to check the inference results, shown below. Every inference result was empty.
Why are the inference results empty?
I'm a beginner at machine learning, so I can't tell.

Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
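
One thing worth checking when every frame comes back empty is whether the checkpoint's parameter names match the model's at all, because `load_state_dict(..., strict=False)` skips mismatched names without raising an error. The sketch below compares two sets of names. The helper `compare_keys` is hypothetical, and the key names in the example are illustrative only; the real names depend on the framework that saved `epoch_13.pth`, which the post does not state.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return parameter names that exist on only one side.

    'missing' are names the model expects but the checkpoint lacks;
    'unexpected' are names in the checkpoint that the model does not have.
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    return {
        "matched": sorted(model_keys & checkpoint_keys),
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Illustrative key names only; the real names depend on how the checkpoint was saved.
model_keys = ["backbone.body.conv1.weight", "roi_heads.box_predictor.cls_score.weight"]
checkpoint_keys = ["backbone.conv1.weight", "roi_head.bbox_head.fc_cls.weight"]

report = compare_keys(model_keys, checkpoint_keys)
print(f"matched={len(report['matched'])}, "
      f"missing={len(report['missing'])}, "
      f"unexpected={len(report['unexpected'])}")
# prints: matched=0, missing=2, unexpected=2
```

With the script posted earlier, the call would be `compare_keys(model.state_dict().keys(), checkpoint['state_dict'].keys())`. The return value of `model.load_state_dict(..., strict=False)` carries the same information in its `missing_keys` and `unexpected_keys` fields. If few or no names match, the model is still running with its initial weights, which is one possible explanation for the empty predictions.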