RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

sh1n0.ytaku · January 10, 2025, 9:04am

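In the environment below, cuDNN 8.9.3 is not recognized by the system. In addition, when I run the code I want to run, I get an error saying that cuDNN cannot be initialized, and I have not been able to resolve it.
I also tried rebooting, but nothing changed.
Please advise.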
andolab@ubuntu:~$ jetson_release
Software part of jetson-stats 4.2.12 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson Orin NX Engineering Reference Developer Kit - Jetpack 6.1 [L4T 36.4.0]
NV Power Mode[2]: 15W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:

P-Number: p3767-0000
Module: NVIDIA Jetson Orin NX (16GB ram)
Platform:
Distribution: Ubuntu 22.04 Jammy Jellyfish
Release: 5.15.148-tegra
jtop:
Version: 4.2.12
Service: Active
Libraries:
CUDA: 12.1.105
cuDNN: 1.0
TensorRT: Not installed
VPI: 3.2.4
Vulkan: 1.3.204
OpenCV: 4.8.0 - with CUDA: YES

Output of running the Python code:
andolab@ubuntu:~$ python3 test_run_maskrcnn_pipeline2.py
[ WARN:0@4.388] global cap_gstreamer.cpp:1728 open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
camera success!!!
Traceback (most recent call last):
  File "/home/andolab/test_run_maskrcnn_pipeline2.py", line 76, in <module>
    prediction = model(img_tensor)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torchvision-0.16.1-py3.10-linux-aarch64.egg/torchvision/models/detection/generalized_rcnn.py", line 101, in forward
    features = self.backbone(images.tensors)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torchvision-0.16.1-py3.10-linux-aarch64.egg/torchvision/models/detection/backbone_utils.py", line 57, in forward
    x = self.body(x)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torchvision-0.16.1-py3.10-linux-aarch64.egg/torchvision/models/_utils.py", line 69, in forward
    x = module(x)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/andolab/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

sh1n0.ytaku · January 10, 2025, 9:07am

andolab@ubuntu:~/cudnn_samples_v8/mnistCUDNN$ ls
Makefile error_util.h fp16_dev.h fp16_emu.cpp fp16_emu.o mnistCUDNN mnistCUDNN.o
data fp16_dev.cu fp16_dev.o fp16_emu.h gemv.h mnistCUDNN.cpp readme.txt
andolab@ubuntu:~/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8903 , CUDNN_VERSION from cudnn.h : 8903 (8.9.3)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 4 Capabilities 8.7, SmClock 918.0 Mhz, MemSize (Mb) 15655, MemClock 612.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
ERROR: cudnn failure (CUDNN_STATUS_NOT_INITIALIZED) in mnistCUDNN.cpp:414
Aborting...

I ran the cuDNN 8.9.3 sample code, but it did not print "Test passed!".
Please help.
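
A rough way to reproduce the sample's failure outside of mnistCUDNN and PyTorch is to load the cuDNN shared library directly and try to create a handle. The sketch below uses Python's `ctypes` with the cuDNN C functions `cudnnGetVersion`, `cudnnCreate`, `cudnnGetErrorString`, and `cudnnDestroy`. The helper name `probe_cudnn` and the report layout are made up for illustration, and the library name that `ctypes.util.find_library` resolves depends on the system.

```python
import ctypes
import ctypes.util


def probe_cudnn(lib_name=None):
    """Load cuDNN and try to create a handle.

    Returns a small report dict, or None if the library cannot be loaded.
    `lib_name` is optional; by default the dynamic linker's search is used.
    """
    name = lib_name or ctypes.util.find_library("cudnn")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None

    # Declare the C signatures we rely on.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    lib.cudnnGetErrorString.restype = ctypes.c_char_p
    lib.cudnnGetErrorString.argtypes = [ctypes.c_int]
    lib.cudnnCreate.restype = ctypes.c_int
    lib.cudnnCreate.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
    lib.cudnnDestroy.restype = ctypes.c_int
    lib.cudnnDestroy.argtypes = [ctypes.c_void_p]

    handle = ctypes.c_void_p()
    status = lib.cudnnCreate(ctypes.byref(handle))
    report = {
        "library": name,
        "version": lib.cudnnGetVersion(),
        "create_status": lib.cudnnGetErrorString(status).decode(),
    }
    if status == 0:  # 0 is CUDNN_STATUS_SUCCESS
        lib.cudnnDestroy(handle)
    return report


if __name__ == "__main__":
    print(probe_cudnn())
```

If `create_status` comes back as `CUDNN_STATUS_NOT_INITIALIZED` here as well, the problem is probably in the CUDA/cuDNN installation itself rather than in the Python script.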

AastaLLL · January 13, 2025, 4:13am

Hi,

Please run the following two commands and share the output with us.

$ cat /etc/nv_tegra_release
$ apt show nvidia-jetpack

Thanks.

sh1n0.ytaku · January 13, 2025, 4:39am

Hi, thanks so much for your reply.

Here is the output of the command:

andolab@ubuntu:~$ cat /etc/nv_tegra_release
# R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
andolab@ubuntu:~$ apt show nvidia-jetpack
Package: nvidia-jetpack
Version: 6.1+b123
Priority: standard
Section: metapackages
Source: nvidia-jetpack (6.1)
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-jetpack-runtime (= 6.1+b123), nvidia-jetpack-dev (= 6.1+b123)
Homepage: Jetson - Embedded AI Computing Platform | NVIDIA Developer
Download-Size: 29.3 kB
APT-Sources: https://repo.download.nvidia.com/jetson/common r36.4/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Thank you.

sh1n0.ytaku · January 13, 2025, 4:44am

I’m new to Linux and application development, so I’m not confident about the environment variables and the path to cuDNN.
I’m prepared to share that information if necessary.

sh1n0.ytaku · January 13, 2025, 7:19am

Below are the versions and build information for python, opencv, pytorch, and torchvision. It seems that pytorch recognizes cuDNN, but the python file I want to run displays an error.

andolab@ubuntu:~$ python3
Python 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> print(torch.backends.cudnn.version())
8903
>>> print(torch.__version__)
2.1.0
>>> import torch
>>> import torchvision
>>> print(torchvision.__version__)
0.16.1
>>> import cv2
>>> print(cv2.__version__)
4.8.0
>>> print(cv2.getBuildInformation())

General configuration for OpenCV 4.8.0 =====================================
Version control: 4.8.0

Extra modules:
Location (extra): /home/andolab/opencv_contrib/modules
Version control (extra): 4.8.1

Platform:
Timestamp: 2025-01-09T14:21:08Z
Host: Linux 5.15.148-tegra aarch64
CMake: 3.22.1
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/gmake
Configuration: Release

CPU/HW features:
Baseline: NEON FP16

C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: /usr/bin/c++ (ver 11.4.0)
C++ flags (Release): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
Linker flags (Debug): -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined
ccache: NO
Precompiled headers: NO
Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda-12.1/lib64 -L/usr/lib/aarch64-linux-gnu
3rdparty dependencies:

OpenCV modules:
To be built: alphamat aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: world
Disabled by dependency: -
Unavailable: cvv java julia matlab ovis python2 sfm viz
Applications: tests perf_tests apps
Documentation: NO
Non-free algorithms: NO

GUI: GTK3
GTK+: YES (ver 3.24.33)
GThread : YES (ver 2.72.4)
GtkGlExt: NO
VTK support: NO

Media I/O:
ZLib: /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
JPEG: /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
WEBP: build (ver encoder: 0x020f)
PNG: /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.37)
TIFF: /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.3.0)
JPEG 2000: build (ver 2.5.0)
OpenEXR: /usr/lib/aarch64-linux-gnu/libImath-2_5.so /usr/lib/aarch64-linux-gnu/libIlmImf-2_5.so /usr/lib/aarch64-linux-gnu/libIex-2_5.so /usr/lib/aarch64-linux-gnu/libHalf-2_5.so /usr/lib/aarch64-linux-gnu/libIlmThread-2_5.so (ver 2_5)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES

Video I/O:
DC1394: NO
FFMPEG: YES
avcodec: YES (58.134.100)
avformat: YES (58.76.100)
avutil: YES (56.70.100)
swscale: YES (5.9.100)
avresample: NO
GStreamer: YES (1.20.3)
v4l/v4l2: YES (linux/videodev2.h)

Parallel framework: pthreads

Trace: YES (with Intel ITT)

Other third-party libraries:
Lapack: NO
Eigen: YES (ver 3.4.0)
Custom HAL: YES (carotene (ver 0.0.1))
Protobuf: build (3.19.1)
Flatbuffers: builtin/3rdparty (23.5.9)

NVIDIA CUDA: YES (ver 12.1, CUFFT CUBLAS)
NVIDIA GPU arch: 87
NVIDIA PTX archs:

cuDNN: YES (ver 8.9.3)

OpenCL: YES (no extra features)
Include path: /home/andolab/opencv/3rdparty/include/opencl/1.2
Link libraries: Dynamic load

Python 3:
Interpreter: /usr/bin/python3 (ver 3.10.12)
Libraries: /usr/lib/aarch64-linux-gnu/libpython3.10.so (ver 3.10.12)
numpy: /home/andolab/.local/lib/python3.10/site-packages/numpy/core/include (ver 1.26.1)
install path: lib/python3.10/dist-packages/cv2/python-3.10

Python (for build): /usr/bin/python2.7

Java:
ant: NO
Java: NO
JNI: NO
Java wrappers: NO
Java tests: NO

Install to: /usr/local
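
The version queries earlier in this post only show that PyTorch can find the cuDNN library. They do not create a cuDNN handle. A small convolution on the GPU follows the same path as the failing `F.conv2d` call in the traceback, so it can serve as a quicker check than the full Mask R-CNN script. This is a minimal sketch; the function name `cudnn_smoke_test` and the tensor shapes are arbitrary choices for illustration.

```python
def cudnn_smoke_test():
    """Run one tiny convolution on the GPU and describe the outcome as a string."""
    try:
        import torch
        import torch.nn.functional as F
    except (ImportError, OSError):
        return "torch is not importable"

    if not torch.cuda.is_available():
        return "CUDA is not available to PyTorch"
    if not torch.backends.cudnn.is_available():
        return "PyTorch reports cuDNN as unavailable"

    try:
        x = torch.randn(1, 3, 8, 8, device="cuda")  # batch of one 3-channel 8x8 image
        w = torch.randn(4, 3, 3, 3, device="cuda")  # four 3x3 filters
        y = F.conv2d(x, w)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as exc:
        return f"convolution failed: {exc}"
    return f"convolution ok, output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(cudnn_smoke_test())
```

If this reports a failed convolution with `CUDNN_STATUS_NOT_INITIALIZED`, the Mask R-CNN code and the camera pipeline are probably not involved.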

sh1n0.ytaku · January 13, 2025, 2:31pm

I just realized that the CUDA 12.1 currently installed on this Jetson Orin NX (Ubuntu 22.04) was installed with the arm64-sbsa local installer, not the aarch64-jetson one.

Could this be the cause of the current error?

I used the arm64-sbsa installer because, at the time, I could not find a CUDA 12.1 aarch64-jetson installer compatible with Ubuntu 22.04.
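
Whether the installer choice is the cause is not established at this point in the thread. One way to gather evidence is to list which CUDA and cuDNN libraries the dynamic linker has cached, since leftovers from more than one install would show up as several entries with different paths. The sketch below parses the text printed by `ldconfig -p`. The helper names and the default prefix list are illustrative assumptions.

```python
import re
import subprocess

# Each cache entry printed by `ldconfig -p` has the form:
#   <tab>libname.so.N (flags) => /path/to/libname.so.N
_ENTRY = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def find_cached_libs(ldconfig_text, prefixes=("libcudnn", "libcudart", "libcublas")):
    """Return (soname, path) pairs whose soname starts with one of `prefixes`."""
    hits = []
    for line in ldconfig_text.splitlines():
        match = _ENTRY.match(line)
        if match and match.group(1).startswith(prefixes):
            hits.append((match.group(1), match.group(3)))
    return hits


def read_ldconfig():
    """Return the output of `ldconfig -p`, or an empty string if it cannot be run."""
    try:
        done = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True)
    except FileNotFoundError:
        return ""
    return done.stdout


if __name__ == "__main__":
    for soname, path in find_cached_libs(read_ldconfig()):
        print(soname, "->", path)
```

Libraries that are only reachable through `LD_LIBRARY_PATH` (for example under `/usr/local/cuda-12.1/lib64`) will not appear in this cache, so an empty result does not prove that nothing is installed.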

AastaLLL · January 15, 2025, 5:07am

Hi,

We can run the mnistCUDNN sample with the r36.4.0 + JetPack 6.1 components.

$ ./mnistCUDNN 
Executing: mnistCUDNN
cudnnGetVersion() : 90300 , CUDNN_VERSION from cudnn.h : 90300 (9.3.0)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 16  Capabilities 8.7, SmClock 1300.0 Mhz, MemSize (Mb) 62840, MemClock 1300.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.147296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.196864 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.388576 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.688064 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 1.039168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 1.622464 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.216096 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.688512 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.822208 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.838368 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.877408 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 1.666464 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.122048 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.138432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.145312 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.296352 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.364128 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.501216 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.231712 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.398784 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.690144 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.741504 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.826144 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.829024 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.096352 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.099072 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.120544 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.338880 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.363008 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.523200 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.281312 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.289696 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.412064 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.771072 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.873280 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 1.187264 time requiring 64000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.098656 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.111616 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.115360 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.313728 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.313920 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.424320 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.233216 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.244800 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.273760 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.516480 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.737504 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.797696 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!
nvidia@tegra-ubuntu:/usr/src/cudnn_samples_v9/mnistCUDNN$

Could you share how you set up the environment with us?
For JetPack 6.1, it’s expected to have cuDNN 9.3 instead of 8.9 in your environment.
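
The two installs report their cuDNN version as a single integer (`8903` in the failing environment, `90300` in the working one). The sketch below decodes both forms. The encoding rule is inferred from the two values printed in this thread together with the `(8.9.3)` and `(9.3.0)` annotations next to them: a 1000-based major field before cuDNN 9 and a 10000-based one from cuDNN 9 on. The function name is hypothetical.

```python
def decode_cudnn_version(value):
    """Split the integer from cudnnGetVersion() into (major, minor, patch)."""
    if value >= 90000:
        # cuDNN 9 and later: major * 10000 + minor * 100 + patch
        major, rest = divmod(value, 10000)
    else:
        # cuDNN 8 and earlier: major * 1000 + minor * 100 + patch
        major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    for raw in (8903, 90300):
        print(raw, "->", ".".join(str(part) for part in decode_cudnn_version(raw)))
```

With the values from the thread, `8903` decodes to 8.9.3 and `90300` to 9.3.0, which makes the mismatch with the cuDNN 9.3 expected for JetPack 6.1 easy to spot in a script.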

Could you try to reflash the device to see if it can fix the issue?

Thanks.

sh1n0.ytaku · January 15, 2025, 9:03am

Hello.

I didn’t think I could solve the problem myself, so I used SDK Manager to install JetPack 6.1 onto a new SSD.

In this new environment, I was able to run the mnistCUDNN test from the cuDNN samples successfully, just as you did.

Software part of jetson-stats 4.3.0 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson Orin NX Engineering Reference Developer Kit - Jetpack 6.1 [L4T 36.4.0]
NV Power Mode[2]: 15W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
- P-Number: p3767-0000
- Module: NVIDIA Jetson Orin NX (16GB ram)
Platform:
- Distribution: Ubuntu 22.04 Jammy Jellyfish
- Release: 5.15.148-tegra
jtop:
- Version: 4.3.0
- Service: Active
Libraries:
- CUDA: 12.6.68
- cuDNN: 9.3.0.75
- TensorRT: 10.3.0.30
- VPI: 3.2.4
- Vulkan: 1.3.204
- OpenCV: Not installed

I still don’t know why cuDNN was not recognized in the previous environment, but in the new environment the cuDNN initialization error no longer occurs and the program runs.

However, there is still a problem.

The program I wanted to run is an application that performs inference on real-time video using Mask R-CNN.

I am trying to convert Mask R-CNN to the TensorRT engine to speed up inference.

I am trying to install it as described on the torch2trt Git page, but I get the following error:

andolab@andolab:~/torch2trt$ sudo python3 setup.py install
Traceback (most recent call last):
  File "/home/andolab/torch2trt/setup.py", line 3, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

However, torch should be installed, as the pip output below shows.

Why can’t I install torch2trt?

andolab@andolab:~/torch2trt$ pip show torch
Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/andolab/.local/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: torchvision

From "GitHub - NVIDIA-AI-IOT/torch2trt: An easy to use PyTorch to TensorRT converter":

Step 1 - Install the torch2trt Python library
To install the torch2trt Python library, call the following

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
python setup.py install
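
One possible explanation, not confirmed in this thread, is that `pip show` reports torch under the user's own `~/.local` site-packages, while `sudo python3` runs as root and may search a different per-user directory. The sketch below prints what a given interpreter invocation actually sees, so the two cases can be compared. The function name `describe_python_env` and the dictionary keys are illustrative only.

```python
import importlib.util
import os
import site
import sys


def describe_python_env(module_name="torch"):
    """Collect the facts that decide whether `module_name` is importable here."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "effective_uid": os.geteuid() if hasattr(os, "geteuid") else None,
        "home": os.path.expanduser("~"),
        "user_site": site.getusersitepackages(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "module_found_at": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_python_env().items():
        print(f"{key}: {value}")
```

Saving this as a script and running it once with `python3` and once with `sudo python3` should show whether `user_site` and `module_found_at` differ between the two. If they do, running the torch2trt install step without `sudo` (for example `python3 setup.py install --user`, assuming a user-level install is acceptable) would be one thing to try.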

andolab@andolab:/usr/src/cudnn_samples_v9/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 90300 , CUDNN_VERSION from cudnn.h : 90300 (9.3.0)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 4 Capabilities 8.7, SmClock 918.0 Mhz, MemSize (Mb) 15655, MemClock 612.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.075424 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.098560 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.116928 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.190240 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.498432 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.661184 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.193728 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.398368 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.678176 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.861440 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 1.038112 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.915360 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.044320 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.045376 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.067840 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.190368 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.190816 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.481536 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.183328 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.328704 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.499104 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.561440 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.920480 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 2.058144 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.109472 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.118688 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.127392 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.268384 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.299904 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.484608 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.161760 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.217664 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.521952 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.738752 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.863936 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.917696 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation …
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.105824 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.109248 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.111296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.254240 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.256128 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.477280 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm …
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.213696 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.219776 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.399776 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.492416 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.864800 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 1.919552 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation …
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

AastaLLL · January 16, 2025, 7:40am

Hi,

ModuleNotFoundError: No module named 'torch'

The error indicates a missing library.
PyTorch can be found in the link below:

https://pypi.jetson-ai-lab.dev/jp6/cu126

Thanks.

sh1n0.ytaku · January 16, 2025, 9:02am

Hello.

I gave up on installing torch2trt and am now trying to run Mask R-CNN directly, but it is not working.

I want to load a Mask R-CNN model trained in PyTorch on my own data and run object detection and segmentation on each frame of a real-time video stream, but no detections appear in the output.

The camera opens and inference appears to run, but the results contain zero labels, masks, and boxes.

Is my method of loading the weights wrong?

Below is the Python code that tries to run Mask R-CNN. What should I improve? The Mask R-CNN backbone is ResNet101, and epoch_13.pth is the Mask R-CNN model that I trained myself.

Thanks.

import torch
import cv2
import numpy as np
from torchvision import transforms
import random
import time
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# Build a Mask R-CNN model with an FPN on top of a ResNet101 backbone

backbone = resnet_fpn_backbone('resnet101', weights=None)
model = MaskRCNN(backbone, num_classes=11)  # num_classes includes the background class

# Use the GPU when it is available

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Attempted workaround for the cuDNN initialization error

torch.backends.cudnn.benchmark = True
torch.backends.cudnn.enabled = True

# Load the model checkpoint
# Note: strict=False silently skips keys whose names do not match the model.

checkpoint = torch.load('epoch_13.pth', map_location=device)
model.load_state_dict(checkpoint['state_dict'], strict=False)
#model.load_state_dict(checkpoint['state_dict'])
model = model.to(device)
model.eval()

# Custom class labels

labels = ["0", "T", "L", "V", "C", "C2", "N", "N2", "Ca", "V2", "ChT"]

# GStreamer pipeline settings

pipeline = "v4l2src device=/dev/video0 ! videoconvert ! videocrop top=90 left=565 right=410 bottom=250 ! videoconvert ! appsink"
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

if not cap.isOpened():
    print("Failed to open camera")
    exit()

# Image transform

transform = transforms.Compose([transforms.ToTensor()])

# Generate one random color per object

def get_random_colors(num_colors):
    return [tuple(random.randint(0, 255) for _ in range(3)) for _ in range(num_colors)]

# Draw the inference results on a frame

def draw_predictions(frame, prediction, score_threshold=0.1):
    boxes = prediction['boxes'].cpu().numpy()
    labels_pred = prediction['labels'].cpu().numpy()
    scores = prediction['scores'].cpu().numpy()
    masks = (prediction['masks'] > 0.5).cpu().numpy()
    colors = get_random_colors(len(boxes))

    # Debug: check the shapes of the predictions
    print(f"Boxes: {boxes.shape}, Labels: {labels_pred.shape}, Scores: {scores.shape}, Masks: {masks.shape}")

    for i in range(len(boxes)):
        if scores[i] < score_threshold:
            continue

        box = boxes[i]
        label = labels_pred[i]
        mask = masks[i][0]  # take the 2-D mask
        color = colors[i]

        # Draw the bounding box
        x1, y1, x2, y2 = map(int, box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)

        # Draw the label
        label_name = labels[label] if 0 <= label < len(labels) else f"Unknown ({label})"
        cv2.putText(frame, f"{label_name} {scores[i]:.2f}", (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # Draw the mask
        mask = mask > 0.5  # binarize
        mask_color = np.zeros_like(frame, dtype=np.uint8)
        mask_color[mask] = color

        # Blend the mask into the image
        frame = cv2.addWeighted(frame, 1.0, mask_color, 0.5, 0)

    return frame

# Main loop

frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        print("Failed to grab frame")
        break

    frame_count += 1
    start_time = time.time()

    # Prepare the input
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img_tensor = transform(frame_rgb).unsqueeze(0).to(device)

    # Inference
    with torch.no_grad():
        prediction = model(img_tensor)[0]

    # Draw
    frame = draw_predictions(frame, prediction)

    # Compute and display FPS
    end_time = time.time()
    fps = 1 / (end_time - start_time)
    cv2.putText(frame, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Show the frame
    cv2.imshow('Mask R-CNN Inference', frame)

    # Quit when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup

cap.release()
cv2.destroyAllWindows()
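
The main loop above derives FPS from the duration of a single frame, and the timed region starts after `cap.read()` and ends before `cv2.imshow`, so the number shown can jump from frame to frame and does not reflect the whole loop. A minimal sketch of a windowed FPS counter follows; the class name `FpsMeter` and the window size are illustrative assumptions, not part of the original script.

```python
import time
from collections import deque


class FpsMeter:
    """Average frames per second over the last `window` frames."""

    def __init__(self, window=30):
        # Keep only the most recent timestamps.
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame; return the windowed FPS (0.0 until two frames exist)."""
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Synthetic timestamps (one frame every 0.5 s) stand in for real frames.
meter = FpsMeter(window=3)
for t in (0.0, 0.5, 1.0, 1.5):
    fps = meter.tick(now=t)
print(f"{fps:.1f}")
```

In a capture loop, calling `meter.tick()` once per iteration (without the `now` argument) would measure the full loop, including capture and display.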

sh1n0.ytaku · January 16, 2025, 9:35am

I did some debugging to check the inference results, but the results are as shown below. All the inference results were empty.
Why are the inference results empty?
I’m a beginner at machine learning, so I don’t know.

Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
Scores:
Boxes: (0, 4), Labels: (0,), Scores: (0,), Masks: (0, 1, 740, 945)
Original frame size: (740, 945, 3)
Processed image tensor size: torch.Size([1, 3, 740, 945])
Prediction keys: dict_keys(['boxes', 'labels', 'scores', 'masks'])
Prediction boxes shape: torch.Size([0, 4])
Prediction scores: tensor([], device='cuda:0')
Prediction masks shape: torch.Size([0, 1, 740, 945])
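
One thing worth checking when every prediction is empty: the script loads the checkpoint with `strict=False`, which skips any key whose name does not match the model, so a checkpoint with a different naming scheme can load "successfully" while leaving the model at its initial weights. The sketch below compares two sets of key names to show how much of a checkpoint would actually be applied. The helper name `compare_state_dict_keys` and the sample key names are hypothetical and chosen only for illustration; the real names depend on the framework that wrote the checkpoint.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Summarize how many checkpoint keys line up with the model's keys."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    return {
        "matched": sorted(model_keys & checkpoint_keys),
        "missing_in_checkpoint": sorted(model_keys - checkpoint_keys),
        "unexpected_in_checkpoint": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical key names for illustration only.
model_keys = ["backbone.body.conv1.weight", "rpn.head.conv.weight"]
checkpoint_keys = ["backbone.conv1.weight", "rpn_head.rpn_conv.weight"]

report = compare_state_dict_keys(model_keys, checkpoint_keys)
print(len(report["matched"]), "of", len(model_keys), "model keys matched")
```

With the real objects, the inputs would be `model.state_dict().keys()` and `checkpoint['state_dict'].keys()`. The value returned by `model.load_state_dict(..., strict=False)` also exposes `missing_keys` and `unexpected_keys`, which carry the same information.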

AastaLLL · January 20, 2025, 8:24am

Hi,

Where does the sample come from?
Did it work in other environments?

Thanks.

sh1n0.ytaku · January 20, 2025, 8:43am

Hello.

I haven’t tried it in other environments.

The current code is my own, based on code generated by ChatGPT.

I am writing code that loads the weights of a Mask R-CNN (ResNet101) model that I trained in PyTorch on my own dataset and runs inference on real-time video. I cannot share all of the source code, but I will share part of it.

If I initialize the backbone with the pretrained ImageNet weights and then load the weights from epoch_13.pth on top of them, inference produces detections. If I instead initialize with random weights (no ImageNet weights) and then load the weights from epoch_13.pth, every result is empty.

I actually want to use only the weights trained on my own dataset.

Is there a way to run the Mask R-CNN model directly from epoch_13.pth, the model file that contains the weights? If there is, please share the code or a useful article.

Also, can the loading method shown below be improved?

Thanks

import torch
import cv2
import numpy as np
from torchvision import models, transforms
import random
import time
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models import ResNet101_Weights

# Load the model (ResNet101 backbone)

def get_model(num_classes, weights_path):
    # Earlier attempts, kept disabled for reference:
    # model = models.detection.maskrcnn_resnet50_fpn(
    #     weights=None,  # pretrained weights disabled because ResNet101 weights are used
    #     num_classes=num_classes
    # )
    # model = maskrcnn_resnet101_fpn(weights=None, num_classes=num_classes)

    # Build the ResNet101 backbone
    backbone = resnet_fpn_backbone('resnet101', weights=ResNet101_Weights.IMAGENET1K_V1)  # pretrained ImageNet weights
    #backbone = resnet_fpn_backbone('resnet101', weights=None)  # ResNet101 backbone, random initialization
    model = MaskRCNN(backbone, num_classes=num_classes)  # build Mask R-CNN

    # Load the weights
    checkpoint = torch.load(weights_path, map_location=torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
    #model.load_state_dict(checkpoint, strict=False)
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    #model.load_state_dict(checkpoint, strict=True)
    return model

# Generate one random color per object

def get_random_colors(num_colors):
    return [tuple(random.randint(0, 255) for _ in range(3)) for _ in range(num_colors)]

# Draw the inference results

def draw_predictions(frame, predictions, labels, score_threshold):
    boxes = predictions[0]['boxes'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()
    labels_pred = predictions[0]['labels'].cpu().numpy()
    masks = (predictions[0]['masks'] > 0.5).cpu().numpy()

    # Keep only detections at or above the score threshold
    valid_indices = scores >= score_threshold
    boxes = boxes[valid_indices]
    scores = scores[valid_indices]
    labels_pred = labels_pred[valid_indices]
    masks = masks[valid_indices]

    colors = get_random_colors(len(boxes))  # pick a random color for each object

    for i, box in enumerate(boxes):
        x1, y1, x2, y2 = map(int, box)
        color = colors[i]
        label = labels_pred[i]
        score = scores[i]

        # Draw the bounding box and label
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        label_text = f"{labels[label]}: {score:.2f}"
        cv2.putText(frame, label_text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

        # Earlier full-frame mask drawing, kept disabled for reference:
        # mask = masks[i, 0]
        # colored_mask = np.zeros_like(frame, dtype=np.uint8)
        # for c in range(3):  # apply the color to each channel
        #     colored_mask[:, :, c] = mask * color[c]
        # frame = cv2.addWeighted(frame, 1.0, colored_mask, 0.5, 0)

        # Draw the mask. The model returns full-frame masks, so crop the mask
        # to the box rather than resizing the whole mask into it.
        mask = masks[i, 0][y1:y2, x1:x2].astype(np.uint8)
        # Apply the mask to the matching region of the original image
        roi = frame[y1:y2, x1:x2]
        colored_mask = np.zeros_like(roi, dtype=np.uint8)
        for c in range(3):
            colored_mask[:, :, c] = mask * color[c]
        frame[y1:y2, x1:x2] = cv2.addWeighted(roi, 1.0, colored_mask, 0.5, 0)

    return frame

AastaLLL · January 23, 2025, 5:49am

Hi,

Do you mean this:

backbone = resnet_fpn_backbone('resnet101', weights=ResNet101_Weights.IMAGENET1K_V1)  # ResNet101 backbone with pretrained ImageNet weights
model = MaskRCNN(backbone, num_classes=num_classes)          # build Mask R-CNN

You might need this to define the model architecture.

It looks like you are using PyTorch for inference.
The usage of PyTorch is similar between dGPU and Jetson.
Maybe you can try to search for some MaskRCNN tutorial for reference.
(sorry that we cannot share third-party links here)

Thanks.

sh1n0.ytaku · January 24, 2025, 8:39am

Hello.

I finally initialized the model with the code below and succeeded in inference.

sh1n0.ytaku · January 24, 2025, 8:41am

However, the FPS is still only about 1.2, which is not fast.

I would like to convert it to a TensorRT engine for faster inference. How can I convert the Mask R-CNN model file epoch_13.pth to a TensorRT engine?

I have successfully converted the PyTorch checkpoint "epoch_13.pth" to ONNX, but an error occurs when I then try to convert the ONNX model to a TensorRT engine.
Below are the conversion code and the error message.

import tensorrt as trt

def build_engine(onnx_file_path, engine_file_path):
    # Create the logger
    logger = trt.Logger(trt.Logger.WARNING)

    # Create the builder and builder config
    builder = trt.Builder(logger)
    config = builder.create_builder_config()

    # Set the workspace size
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB

    # Enable FP16 mode (if the hardware supports it)
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # Create the network
    network = builder.create_network(1)  # explicit batch mode

    # Create the ONNX parser
    parser = trt.OnnxParser(network, logger)

    # Read the ONNX model
    with open(onnx_file_path, "rb") as model_file:
        if not parser.parse(model_file.read()):
            print("ONNXファイルのパースに失敗しました。")  # "Failed to parse the ONNX file."
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

    # Build the TensorRT engine.
    # builder.build_engine was removed in TensorRT 10; build_serialized_network
    # returns the serialized engine directly.
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        print("エンジンのビルドに失敗しました。")  # "Failed to build the engine."
        return None

    # Save the engine
    with open(engine_file_path, "wb") as engine_file:
        engine_file.write(serialized_engine)

    print("TensorRTエンジンが正常に生成されました。")  # "The TensorRT engine was generated successfully."
    return serialized_engine

# Paths to the ONNX model and the output engine file

#onnx_file_path = "model2.onnx"
onnx_file_path = "model_static_945x740.onnx"
#onnx_file_path = "static_model_fixed.onnx"
engine_file_path = "model.trt"

# Build the engine

engine = build_engine(onnx_file_path, engine_file_path)

andolab@andolab:~$ python3 onnx_to_trt_new.py
[01/24/2025-17:35:25] [TRT] [E] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: ROIAlign_TRT, version: 2, namespace:.)
ONNXファイルのパースに失敗しました。
In node 1592 with name: /roi_heads/box_roi_pool/RoiAlign and operator: RoiAlign (importRoiAlign): UNSUPPORTED_NODE: Assertion failed: plugin != nullptr: ROIAlign plugin was not found in the plugin registry!
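
The error above names the plugin that the ONNX parser could not find. When a log contains many such lines, a small script can collect the distinct plugin names so that each one can be checked against the plugins that are actually registered. The sketch below is only a log-parsing helper written for illustration (the function name `missing_plugins` is hypothetical); it does not load or register any plugin.

```python
import re

# Matches log fragments such as:
#   (Cannot find plugin: ROIAlign_TRT, version: 2, namespace:.)
MISSING_PLUGIN = re.compile(
    r"Cannot find plugin: (?P<name>\w+), version: (?P<version>\w+)"
)


def missing_plugins(log_text):
    """Return {plugin_name: version} for every plugin the log reports as missing."""
    found = {}
    for match in MISSING_PLUGIN.finditer(log_text):
        found[match.group("name")] = match.group("version")
    return found


# The error line from the run above, used as sample input.
log = (
    "[01/24/2025-17:35:25] [TRT] [E] IPluginRegistry::getCreator: Error Code 4: "
    "API Usage Error (Cannot find plugin: ROIAlign_TRT, version: 2, namespace:.)\n"
)
print(missing_plugins(log))
```

For the sample line this prints `{'ROIAlign_TRT': '2'}`. One possible cause worth checking, not confirmed in this thread, is that the script never registers TensorRT's standard plugin library; in the Python API that is done by calling `trt.init_libnvinfer_plugins(logger, "")` before creating the parser.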

sh1n0.ytaku · January 26, 2025, 3:13am

Is it possible to convert Mask R-CNN to a TensorRT engine?
I am currently using MMDetection for Mask R-CNN inference.

sh1n0.ytaku · January 26, 2025, 3:16am

Which is simpler and easier: converting directly from PyTorch to a TensorRT engine, or converting from ONNX to TensorRT?

sh1n0.ytaku · January 27, 2025, 5:23am

(mmdet_env) andolab@andolab:~/mmdet_env$ trtexec --onnx=/home/andolab/mmdet_env/model.onnx --saveEngine=/home/andolab/mmdet_env/model.trt --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v100300] # trtexec --onnx=/home/andolab/mmdet_env/model.onnx --saveEngine=/home/andolab/mmdet_env/model.trt --fp16
[01/27/2025-00:11:28] [I] === Model Options ===
[01/27/2025-00:11:28] [I] Format: ONNX
[01/27/2025-00:11:28] [I] Model: /home/andolab/mmdet_env/model.onnx
[01/27/2025-00:11:28] [I] Output:
[01/27/2025-00:11:28] [I] === Build Options ===
[01/27/2025-00:11:28] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/27/2025-00:11:28] [I] avgTiming: 8
[01/27/2025-00:11:28] [I] Precision: FP32+FP16
[01/27/2025-00:11:28] [I] LayerPrecisions:
[01/27/2025-00:11:28] [I] Layer Device Types:
[01/27/2025-00:11:28] [I] Calibration:
[01/27/2025-00:11:28] [I] Refit: Disabled
[01/27/2025-00:11:28] [I] Strip weights: Disabled
[01/27/2025-00:11:28] [I] Version Compatible: Disabled
[01/27/2025-00:11:28] [I] ONNX Plugin InstanceNorm: Disabled
[01/27/2025-00:11:28] [I] TensorRT runtime: full
[01/27/2025-00:11:28] [I] Lean DLL Path:
[01/27/2025-00:11:28] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/27/2025-00:11:28] [I] Exclude Lean Runtime: Disabled
[01/27/2025-00:11:28] [I] Sparsity: Disabled
[01/27/2025-00:11:28] [I] Safe mode: Disabled
[01/27/2025-00:11:28] [I] Build DLA standalone loadable: Disabled
[01/27/2025-00:11:28] [I] Allow GPU fallback for DLA: Disabled
[01/27/2025-00:11:28] [I] DirectIO mode: Disabled
[01/27/2025-00:11:28] [I] Restricted mode: Disabled
[01/27/2025-00:11:28] [I] Skip inference: Disabled
[01/27/2025-00:11:28] [I] Save engine: /home/andolab/mmdet_env/model.trt
[01/27/2025-00:11:28] [I] Load engine:
[01/27/2025-00:11:28] [I] Profiling verbosity: 0
[01/27/2025-00:11:28] [I] Tactic sources: Using default tactic sources
[01/27/2025-00:11:28] [I] timingCacheMode: local
[01/27/2025-00:11:28] [I] timingCacheFile:
[01/27/2025-00:11:28] [I] Enable Compilation Cache: Enabled
[01/27/2025-00:11:28] [I] errorOnTimingCacheMiss: Disabled
[01/27/2025-00:11:28] [I] Preview Features: Use default preview flags.
[01/27/2025-00:11:28] [I] MaxAuxStreams: -1
[01/27/2025-00:11:28] [I] BuilderOptimizationLevel: -1
[01/27/2025-00:11:28] [I] Calibration Profile Index: 0
[01/27/2025-00:11:28] [I] Weight Streaming: Disabled
[01/27/2025-00:11:28] [I] Runtime Platform: Same As Build
[01/27/2025-00:11:28] [I] Debug Tensors:
[01/27/2025-00:11:28] [I] Input(s)s format: fp32:CHW
[01/27/2025-00:11:28] [I] Output(s)s format: fp32:CHW
[01/27/2025-00:11:28] [I] Input build shapes: model
[01/27/2025-00:11:28] [I] Input calibration shapes: model
[01/27/2025-00:11:28] [I] === System Options ===
[01/27/2025-00:11:28] [I] Device: 0
[01/27/2025-00:11:28] [I] DLACore:
[01/27/2025-00:11:28] [I] Plugins:
[01/27/2025-00:11:28] [I] setPluginsToSerialize:
[01/27/2025-00:11:28] [I] dynamicPlugins:
[01/27/2025-00:11:28] [I] ignoreParsedPluginLibs: 0
[01/27/2025-00:11:28] [I]
[01/27/2025-00:11:28] [I] === Inference Options ===
[01/27/2025-00:11:28] [I] Batch: Explicit
[01/27/2025-00:11:28] [I] Input inference shapes: model
[01/27/2025-00:11:28] [I] Iterations: 10
[01/27/2025-00:11:28] [I] Duration: 3s (+ 200ms warm up)
[01/27/2025-00:11:28] [I] Sleep time: 0ms
[01/27/2025-00:11:28] [I] Idle time: 0ms
[01/27/2025-00:11:28] [I] Inference Streams: 1
[01/27/2025-00:11:28] [I] ExposeDMA: Disabled
[01/27/2025-00:11:28] [I] Data transfers: Enabled
[01/27/2025-00:11:28] [I] Spin-wait: Disabled
[01/27/2025-00:11:28] [I] Multithreading: Disabled
[01/27/2025-00:11:28] [I] CUDA Graph: Disabled
[01/27/2025-00:11:28] [I] Separate profiling: Disabled
[01/27/2025-00:11:28] [I] Time Deserialize: Disabled
[01/27/2025-00:11:28] [I] Time Refit: Disabled
[01/27/2025-00:11:28] [I] NVTX verbosity: 0
[01/27/2025-00:11:28] [I] Persistent Cache Ratio: 0
[01/27/2025-00:11:28] [I] Optimization Profile Index: 0
[01/27/2025-00:11:28] [I] Weight Streaming Budget: 100.000000%
[01/27/2025-00:11:28] [I] Inputs:
[01/27/2025-00:11:28] [I] Debug Tensor Save Destinations:
[01/27/2025-00:11:28] [I] === Reporting Options ===
[01/27/2025-00:11:28] [I] Verbose: Disabled
[01/27/2025-00:11:28] [I] Averages: 10 inferences
[01/27/2025-00:11:28] [I] Percentiles: 90,95,99
[01/27/2025-00:11:28] [I] Dump refittable layers:Disabled
[01/27/2025-00:11:28] [I] Dump output: Disabled
[01/27/2025-00:11:28] [I] Profile: Disabled
[01/27/2025-00:11:28] [I] Export timing to JSON file:
[01/27/2025-00:11:28] [I] Export output to JSON file:
[01/27/2025-00:11:28] [I] Export profile to JSON file:
[01/27/2025-00:11:28] [I]
[01/27/2025-00:11:28] [I] === Device Information ===
[01/27/2025-00:11:28] [I] Available Devices:
[01/27/2025-00:11:28] [I] Device 0: “Orin” UUID: GPU-efa7eec7-56ef-5dc0-9ee6-8cbd3607653f
[01/27/2025-00:11:28] [I] Selected Device: Orin
[01/27/2025-00:11:28] [I] Selected Device ID: 0
[01/27/2025-00:11:28] [I] Selected Device UUID: GPU-efa7eec7-56ef-5dc0-9ee6-8cbd3607653f
[01/27/2025-00:11:28] [I] Compute Capability: 8.7
[01/27/2025-00:11:28] [I] SMs: 4
[01/27/2025-00:11:28] [I] Device Global Memory: 15655 MiB
[01/27/2025-00:11:28] [I] Shared Memory per SM: 164 KiB
[01/27/2025-00:11:28] [I] Memory Bus Width: 256 bits (ECC disabled)
[01/27/2025-00:11:28] [I] Application Compute Clock Rate: 0.918 GHz
[01/27/2025-00:11:28] [I] Application Memory Clock Rate: 0.612 GHz
[01/27/2025-00:11:28] [I]
[01/27/2025-00:11:28] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/27/2025-00:11:28] [I]
[01/27/2025-00:11:28] [I] TensorRT version: 10.3.0
[01/27/2025-00:11:28] [I] Loading standard plugins
[01/27/2025-00:11:28] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 31, GPU 5506 (MiB)
[01/27/2025-00:11:32] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +928, GPU +751, now: CPU 1002, GPU 6302 (MiB)
[01/27/2025-00:11:32] [I] Start parsing network model.
[01/27/2025-00:11:32] [I] [TRT] ----------------------------------------------------------------
[01/27/2025-00:11:32] [I] [TRT] Input filename: /home/andolab/mmdet_env/model.onnx
[01/27/2025-00:11:32] [I] [TRT] ONNX IR version: 0.0.6
[01/27/2025-00:11:32] [I] [TRT] Opset version: 11
[01/27/2025-00:11:32] [I] [TRT] Producer name: pytorch
[01/27/2025-00:11:32] [I] [TRT] Producer version: 2.3.0
[01/27/2025-00:11:32] [I] [TRT] Domain:
[01/27/2025-00:11:32] [I] [TRT] Model version: 0
[01/27/2025-00:11:32] [I] [TRT] Doc string:
[01/27/2025-00:11:32] [I] [TRT] ----------------------------------------------------------------
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: GatherTopk. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: TRTBatchedNMS. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: TRTBatchedNMS, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: MMCVMultiLevelRoiAlign. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: MMCVMultiLevelRoiAlign, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: TRTBatchedNMS. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: TRTBatchedNMS, version: 1, namespace:.)
[01/27/2025-00:11:32] [I] [TRT] No checker registered for op: MMCVMultiLevelRoiAlign. Attempting to check as plugin.
[01/27/2025-00:11:32] [E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: MMCVMultiLevelRoiAlign, version: 1, namespace:.)
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 545 [GatherTopk → “/GatherTopk_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_21_output_0”
input: “/TopK_output_1”
output: “/GatherTopk_output_0”
name: “/GatherTopk”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 554 [GatherTopk → “/GatherTopk_1_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_19_output_0”
input: “/TopK_output_1”
output: “/GatherTopk_1_output_0”
name: “/GatherTopk_1”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 610 [GatherTopk → “/GatherTopk_2_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_30_output_0”
input: “/TopK_1_output_1”
output: “/GatherTopk_2_output_0”
name: “/GatherTopk_2”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 619 [GatherTopk → “/GatherTopk_3_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_28_output_0”
input: “/TopK_1_output_1”
output: “/GatherTopk_3_output_0”
name: “/GatherTopk_3”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 673 [GatherTopk → “/GatherTopk_4_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_39_output_0”
input: “/TopK_2_output_1”
output: “/GatherTopk_4_output_0”
name: “/GatherTopk_4”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 682 [GatherTopk → “/GatherTopk_5_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_37_output_0”
input: “/TopK_2_output_1”
output: “/GatherTopk_5_output_0”
name: “/GatherTopk_5”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 736 [GatherTopk → “/GatherTopk_6_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_48_output_0”
input: “/TopK_3_output_1”
output: “/GatherTopk_6_output_0”
name: “/GatherTopk_6”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 745 [GatherTopk → “/GatherTopk_7_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_46_output_0”
input: “/TopK_3_output_1”
output: “/GatherTopk_7_output_0”
name: “/GatherTopk_7”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 799 [GatherTopk → “/GatherTopk_8_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_57_output_0”
input: “/TopK_4_output_1”
output: “/GatherTopk_8_output_0”
name: “/GatherTopk_8”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 808 [GatherTopk → “/GatherTopk_9_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Concat_55_output_0”
input: “/TopK_4_output_1”
output: “/GatherTopk_9_output_0”
name: “/GatherTopk_9”
op_type: “GatherTopk”
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 891 [TRTBatchedNMS → “/TRTBatchedNMS_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Unsqueeze_95_output_0”
input: “/Concat_64_output_0”
output: “/TRTBatchedNMS_output_0”
output: “/TRTBatchedNMS_output_1”
name: “/TRTBatchedNMS”
op_type: “TRTBatchedNMS”
attribute {
name: “background_label_id”
i: -1
type: INT
}
attribute {
name: “clip_boxes”
i: 0
type: INT
}
attribute {
name: “iou_threshold”
f: 0.7
type: FLOAT
}
attribute {
name: “is_normalized”
i: 0
type: INT
}
attribute {
name: “keep_topk”
i: 1000
type: INT
}
attribute {
name: “num_classes”
i: 1
type: INT
}
attribute {
name: “return_index”
i: 0
type: INT
}
attribute {
name: “score_threshold”
f: 0.05
type: FLOAT
}
attribute {
name: “topk”
i: 5000
type: INT
}
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 928 [MMCVMultiLevelRoiAlign → “/bbox_roi_extractor/MMCVMultiLevelRoiAlign_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Reshape_50_output_0”
input: “/neck/fpn_convs.0/conv/Conv_output_0”
input: “/neck/fpn_convs.1/conv/Conv_output_0”
input: “/neck/fpn_convs.2/conv/Conv_output_0”
input: “/neck/fpn_convs.3/conv/Conv_output_0”
output: “/bbox_roi_extractor/MMCVMultiLevelRoiAlign_output_0”
name: “/bbox_roi_extractor/MMCVMultiLevelRoiAlign”
op_type: “MMCVMultiLevelRoiAlign”
attribute {
name: “aligned”
i: 1
type: INT
}
attribute {
name: “featmap_strides”
floats: 4
floats: 8
floats: 16
floats: 32
type: FLOATS
}
attribute {
name: “finest_scale”
i: 56
type: INT
}
attribute {
name: “output_height”
i: 7
type: INT
}
attribute {
name: “output_width”
i: 7
type: INT
}
attribute {
name: “pool_mode”
i: 1
type: INT
}
attribute {
name: “roi_scale_factor”
f: 1
type: FLOAT
}
attribute {
name: “sampling_ratio”
i: 0
type: INT
}
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 1030 [TRTBatchedNMS → “/TRTBatchedNMS_1_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Unsqueeze_116_output_0”
input: “/Slice_14_output_0”
output: “/TRTBatchedNMS_1_output_0”
output: “/TRTBatchedNMS_1_output_1”
name: “/TRTBatchedNMS_1”
op_type: “TRTBatchedNMS”
attribute {
name: “background_label_id”
i: -1
type: INT
}
attribute {
name: “clip_boxes”
i: 0
type: INT
}
attribute {
name: “iou_threshold”
f: 0.5
type: FLOAT
}
attribute {
name: “is_normalized”
i: 0
type: INT
}
attribute {
name: “keep_topk”
i: 100
type: INT
}
attribute {
name: “num_classes”
i: 6
type: INT
}
attribute {
name: “return_index”
i: 0
type: INT
}
attribute {
name: “score_threshold”
f: 0.05
type: FLOAT
}
attribute {
name: “topk”
i: 5000
type: INT
}
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:948: While parsing node number 1064 [MMCVMultiLevelRoiAlign → “/mask_roi_extractor/MMCVMultiLevelRoiAlign_output_0”]:
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:950: — Begin node —
input: “/Reshape_61_output_0”
input: “/neck/fpn_convs.0/conv/Conv_output_0”
input: “/neck/fpn_convs.1/conv/Conv_output_0”
input: “/neck/fpn_convs.2/conv/Conv_output_0”
input: “/neck/fpn_convs.3/conv/Conv_output_0”
output: “/mask_roi_extractor/MMCVMultiLevelRoiAlign_output_0”
name: “/mask_roi_extractor/MMCVMultiLevelRoiAlign”
op_type: “MMCVMultiLevelRoiAlign”
attribute {
name: “aligned”
i: 1
type: INT
}
attribute {
name: “featmap_strides”
floats: 4
floats: 8
floats: 16
floats: 32
type: FLOATS
}
attribute {
name: “finest_scale”
i: 56
type: INT
}
attribute {
name: “output_height”
i: 14
type: INT
}
attribute {
name: “output_width”
i: 14
type: INT
}
attribute {
name: “pool_mode”
i: 1
type: INT
}
attribute {
name: “roi_scale_factor”
f: 1
type: FLOAT
}
attribute {
name: “sampling_ratio”
i: 0
type: INT
}
domain: “mmdeploy”

[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:951: — End node —
[01/27/2025-00:11:32] [E] [TRT] ModelImporter.cpp:953: ERROR: onnxOpCheckers.cpp:780 In function checkFallbackPluginImporter:
[6] creator && “Plugin not found, are the plugin name, version, and namespace correct?”
[01/27/2025-00:11:32] [E] Failed to parse onnx file
[01/27/2025-00:11:32] [I] Finished parsing network model. Parse time: 0.434691
[01/27/2025-00:11:32] [E] Parsing model failed
[01/27/2025-00:11:32] [E] Failed to create engine from model or file.
[01/27/2025-00:11:32] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100300] # trtexec --onnx=/home/andolab/mmdet_env/model.onnx --saveEngine=/home/andolab/mmdet_env/model.trt --fp16

I get the error above when I try to convert the model to a TensorRT engine, even though I have built the TensorRT plugin from Git and set its path in the environment variables. Is this a problem specific to MMDetection, or is it a flaw in the TensorRT plugin?
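
The log above repeats the same failure for many nodes, so it can help to condense it before deciding where the problem lies. The following is a minimal sketch, assuming the trtexec output has been saved as text. The function name `summarize_missing_plugins` and the sample string are made up for illustration, and the regular expression only targets the `Cannot find plugin: ...` lines in the form they take in this log.

```python
import re
from collections import Counter

# Matches the trtexec error lines of the form seen in the log above:
#   ... (Cannot find plugin: GatherTopk, version: 1, namespace:.)
MISSING_PLUGIN = re.compile(
    r"Cannot find plugin: (?P<name>\w+), version: (?P<version>\w+)"
)


def summarize_missing_plugins(log_text: str) -> Counter:
    """Count how often each (plugin name, version) pair is reported missing."""
    counts = Counter()
    for match in MISSING_PLUGIN.finditer(log_text):
        counts[(match.group("name"), match.group("version"))] += 1
    return counts


# Hypothetical excerpt; in practice read the saved trtexec log from a file.
sample_log = """\
[E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
[E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: TRTBatchedNMS, version: 1, namespace:.)
[E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: MMCVMultiLevelRoiAlign, version: 1, namespace:.)
[E] [TRT] IPluginRegistry::getCreator: Error Code 4: API Usage Error (Cannot find plugin: GatherTopk, version: 1, namespace:.)
"""

summary = summarize_missing_plugins(sample_log)
for (name, version), count in sorted(summary.items()):
    print(f"{name} (version {version}): reported missing {count} time(s)")
```

In this log the missing plugins come down to three op types (GatherTopk, TRTBatchedNMS and MMCVMultiLevelRoiAlign), and every failing node is printed with domain "mmdeploy". That suggests they are custom ops from the MMDeploy export rather than plugins contained in the stock libnvinfer_plugin, although this has not been verified on this exact setup.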

(mmdet_env) andolab@andolab:~/mmdet_env$ ls /usr/lib/aarch64-linux-gnu | grep libnvinfer_plugin
libnvinfer_plugin.so
libnvinfer_plugin.so.10
libnvinfer_plugin.so.10.3.0
libnvinfer_plugin_static.a

(mmdet_env) andolab@andolab:~/mmdet_env$ echo $LD_LIBRARY_PATH
/usr/local/cuda-12.6/lib64::/usr/lib/aarch64-linux-gnu
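
The `LD_LIBRARY_PATH` above contains an empty entry (`::`), and the listing only confirms that the stock `libnvinfer_plugin` is present. One way to see which directories on the search path actually contain a given library is sketched below. The helper name `find_library_on_path` is hypothetical, and `libmmdeploy_tensorrt_ops.so` is only an assumed example of a custom-op library name; substitute whatever file your plugin build produced.

```python
import os
from pathlib import Path


def find_library_on_path(lib_name: str, search_path: str) -> list[str]:
    """Return the directories in a colon-separated search path that contain lib_name."""
    hits = []
    for entry in search_path.split(":"):
        if not entry:
            # An empty entry (as produced by "::") is skipped here; the dynamic
            # loader treats it as the current working directory.
            continue
        if (Path(entry) / lib_name).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    # Assumed library names, for illustration only.
    for lib in ("libnvinfer_plugin.so", "libmmdeploy_tensorrt_ops.so"):
        found = find_library_on_path(lib, path)
        print(f"{lib}: {found if found else 'not found on LD_LIBRARY_PATH'}")
```

Being on the search path only makes a library resolvable. It still has to be loaded by the process before its plugin creators show up in the TensorRT plugin registry.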

