Jetson Nano: Parsed Tiny Yolo v2 ONNX model gives different result in TRT

muralikrishnat29 · May 11, 2020, 2:04pm

Hi,

Device: Jetson Nano
Jetpack version: JP4.4
CUDA: 10.2

I have a standard custom-trained Tiny YOLO v2 ONNX object detection model exported from Azure Custom Vision. I could successfully parse the model and convert it to a TensorRT engine using the TensorRT API, but the inference results do not make sense, whereas the same model gives correct results with ONNXRuntime (CPU version). Maybe I am making a mistake with the YOLO masks and anchors when adapting the code to the custom model. I tried many possibilities but couldn't make it work. Could you please help me with this, and also with the next question?
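On the masks-and-anchors question: YOLOv2 has a single output grid and all of its anchors apply to that grid, so there are no masks to configure (masks are a YOLOv3 concept for splitting anchors across three output scales). YOLOv2 also applies a softmax over the class scores, where YOLOv3 uses per-class sigmoids. The sketch below decodes one Tiny YOLOv2 output in plain Python, assuming a channels-first (C, H, W) buffer in which each anchor's channels are ordered [tx, ty, tw, th, objectness, class scores...]. The default anchors are the standard Tiny YOLOv2 (VOC) anchors and are only an assumption here; check them against the anchors your Custom Vision export uses. decode_yolov2 is a hypothetical helper, and non-maximum suppression is left out.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: the standard Tiny YOLOv2 (VOC) anchors, in grid-cell units.
# Verify these against the anchors your own model was trained with.
TINY_YOLOV2_ANCHORS: Tuple[Tuple[float, float], ...] = (
    (0.573, 0.677), (1.87, 2.06), (3.34, 5.47), (7.88, 3.53), (9.77, 9.17),
)

# (x_center, y_center, width, height, score, class_id), coordinates in [0, 1]
Box = Tuple[float, float, float, float, float, int]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def decode_yolov2(output: Sequence[float], num_classes: int,
                  grid_h: int = 13, grid_w: int = 13,
                  anchors: Sequence[Tuple[float, float]] = TINY_YOLOV2_ANCHORS,
                  threshold: float = 0.3) -> List[Box]:
    """Decode one flattened (C, H, W) YOLOv2 output into normalized boxes.

    C must equal len(anchors) * (5 + num_classes).
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    stride = 5 + num_classes
    if len(output) != len(anchors) * stride * grid_h * grid_w:
        raise ValueError("output size does not match anchors/classes/grid")

    def at(c: int, row: int, col: int) -> float:
        # Channels-first indexing into the flat buffer.
        return output[(c * grid_h + row) * grid_w + col]

    boxes: List[Box] = []
    for row in range(grid_h):
        for col in range(grid_w):
            for a, (anchor_w, anchor_h) in enumerate(anchors):
                base = a * stride
                objectness = _sigmoid(at(base + 4, row, col))
                # YOLOv2: softmax across classes (YOLOv3 would use sigmoids).
                probs = _softmax([at(base + 5 + k, row, col) for k in range(num_classes)])
                class_id = max(range(num_classes), key=probs.__getitem__)
                score = objectness * probs[class_id]
                if score < threshold:
                    continue
                x = (col + _sigmoid(at(base, row, col))) / grid_w
                y = (row + _sigmoid(at(base + 1, row, col))) / grid_h
                w = math.exp(at(base + 2, row, col)) * anchor_w / grid_w
                h = math.exp(at(base + 3, row, col)) * anchor_h / grid_h
                boxes.append((x, y, w, h, score, class_id))
    return boxes
```

Because TensorRT's Python host buffers come back flattened, the same function can be fed either the TensorRT output or the ONNXRuntime output (flattened), which makes it easy to check whether both paths decode to the same boxes.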
The next question is not an issue with NVIDIA's APIs, but I am looking for every source of information I can get. I tried to build and install ONNXRuntime-gpu-tensorrt on the Nano from source, following the link below:
[Instructions to build for ARM 64bit · Issue #2684 · microsoft/onnxruntime · GitHub]
but the build failed with an error when CMake was building cudnn_rnn_base.cc.o. Thanks

Regards,
MK

AastaLLL · May 12, 2020, 6:02am

Hi,

1.
Which sample are you using?

/usr/src/tensorrt/samples/python/yolov3_onnx/

Or

/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/

If you are using the DeepStream sample, please follow this document for a custom model:

2.
Could you share the detailed error log?
The issue you linked appears to be resolved already.

Thanks.

muralikrishnat29 · May 12, 2020, 7:01am

Hi AastaLLL,

Thanks for the quick response.

Yes. I am using the sample from “/usr/src/tensorrt/samples/python/yolov3_onnx/”
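The yolov3_onnx sample's postprocessing (PostprocessYOLO in data_processing.py) is built around YOLOv3's three output tensors, with anchor masks selecting which anchors belong to each scale. A Tiny YOLOv2 export produces a single (N, C, H, W) output instead, so reusing that postprocessing as-is is a likely source of meaningless boxes. A quick sanity check, sketched here assuming 5 anchors (the Tiny YOLOv2 default), is that the channel count factors as anchors * (5 + classes); yolov2_layout is a hypothetical helper.

```python
from typing import Sequence, Tuple


def yolov2_layout(output_shape: Sequence[int], num_anchors: int = 5) -> Tuple[int, int, int]:
    """Infer (num_classes, grid_h, grid_w) from a YOLOv2 output shape (N, C, H, W).

    Assumes C == num_anchors * (5 + num_classes).
    """
    if len(output_shape) != 4:
        raise ValueError(f"expected an (N, C, H, W) shape, got {tuple(output_shape)}")
    _, channels, grid_h, grid_w = output_shape
    per_anchor, remainder = divmod(channels, num_anchors)
    # Each anchor needs 4 box offsets + 1 objectness + at least 1 class score.
    if remainder != 0 or per_anchor < 6:
        raise ValueError(f"{channels} channels is not {num_anchors} anchors x (5 + num_classes)")
    return per_anchor - 5, grid_h, grid_w
```

For example, yolov2_layout((1, 125, 13, 13)) returns (20, 13, 13), the 20-class VOC layout; if the shape reported for your engine's output does not factor cleanly, the anchor count or class count assumed in the postprocessing is wrong.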
Please find the error log below. The build always fails on this particular object file, cudnn_rnn_base.cc.o. The linked issue only gives build instructions, and following them still failed. I may raise this as a separate GitHub issue in the onnxruntime repository.

[ 51%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc.o
In file included from /usr/local/onnxruntime/include/onnxruntime/core/framework/tensor.h:12:0,
from /usr/local/onnxruntime/onnxruntime/core/framework/data_transfer.h:7,
from /usr/local/onnxruntime/onnxruntime/core/framework/data_transfer_manager.h:7,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:7,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.h:5,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:7,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc:4:
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h: In member function ‘onnxruntime::common::Status onnxruntime::cuda::CudnnRNN::Set(cudnnContext* const&, int64_t, int, cudnnDropoutDescriptor_t, cudnnDirectionMode_t, cudnnRNNMode_t, cudnnDataType_t)’:
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:27: error: ‘cudnnSetRNNDescriptor’ was not declared in this scope
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
/usr/local/onnxruntime/include/onnxruntime/core/common/common.h:157:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
auto _status = (expr);
^~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:23: note: in expansion of macro ‘CUDNN_CALL’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:5: note: in expansion of macro ‘CUDNN_RETURN_IF_ERROR’
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:27: note: suggested alternative: ‘cudnnSetLRNDescriptor’
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
/usr/local/onnxruntime/include/onnxruntime/core/common/common.h:157:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
auto _status = (expr);
^~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:23: note: in expansion of macro ‘CUDNN_CALL’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:5: note: in expansion of macro ‘CUDNN_RETURN_IF_ERROR’
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
CMakeFiles/onnxruntime_providers_cuda.dir/build.make:517: recipe for target 'CMakeFiles/onnxruntime_providers_cuda.dir/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc.o' failed
make[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc.o] Error 1
CMakeFiles/Makefile2:718: recipe for target 'CMakeFiles/onnxruntime_providers_cuda.dir/all' failed
make[1]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 1731, in <module>
    sys.exit(main())
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 1627, in main
    build_targets(args, cmake_path, build_dir, configs, args.parallel)
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 853, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 392, in run_subprocess
    env=my_env, shell=shell)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '/usr/local/onnxruntime/build/Linux/Release', '--config', 'Release']' returned non-zero exit status 2.

AastaLLL · May 25, 2020, 6:47am

Hi,

Sorry for the late update.

1. To give further suggestions, would you mind sharing a simple reproducible example with us?
Ideally it should include the model and samples for both TensorRT and the CPU, so we can compare the results directly.

2. We are going to check onnxruntime on JetPack 4.4.
We will share more information with you later.
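For the side-by-side comparison requested in point 1, it can help to diff the raw output tensors before any YOLO postprocessing: if the raw tensors already disagree, the problem lies in the engine or the input preprocessing; if they agree, it lies in the decoding. Below is a minimal plain-Python sketch comparing two flattened outputs in the same (N, C, H, W) order; compare_outputs and its default tolerances are assumptions, and an FP16 engine may need looser tolerances.

```python
from typing import Dict, Sequence


def compare_outputs(reference: Sequence[float], candidate: Sequence[float],
                    rtol: float = 1e-3, atol: float = 1e-3) -> Dict[str, float]:
    """Compare a raw TensorRT output (candidate) with the ONNXRuntime output (reference).

    An element counts as a mismatch when |candidate - reference| > atol + rtol * |reference|.
    """
    if len(reference) != len(candidate):
        raise ValueError(f"length mismatch: {len(reference)} vs {len(candidate)}")
    max_abs = 0.0
    max_rel = 0.0
    mismatches = 0
    for ref, cand in zip(reference, candidate):
        diff = abs(cand - ref)
        max_abs = max(max_abs, diff)
        if ref != 0:
            max_rel = max(max_rel, diff / abs(ref))
        if diff > atol + rtol * abs(ref):
            mismatches += 1
    return {"max_abs_diff": max_abs, "max_rel_diff": max_rel, "mismatches": mismatches}
```

A large mismatch count on the raw tensors points away from the anchor and mask settings and toward the engine or the preprocessing.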

Thanks.

AastaLLL · May 26, 2020, 2:09pm

Hi,

We were able to build onnxruntime after applying this change:

diff --git a/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h b/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h
index 5281904a2..75131db39 100644
--- a/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h
+++ b/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h
@@ -42,16 +42,16 @@ class CudnnRNN {
     if (!cudnn_rnn_desc_)
       CUDNN_RETURN_IF_ERROR(cudnnCreateRNNDescriptor(&cudnn_rnn_desc_));
 
-    CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
-                                                cudnn_rnn_desc_,
-                                                gsl::narrow_cast<int>(hidden_size),
-                                                num_layers,
-                                                cudnn_dropout_desc,
-                                                CUDNN_LINEAR_INPUT,  // We can also skip the input matrix transformation
-                                                cudnn_direction_model,
-                                                rnn_mode,
-                                                CUDNN_RNN_ALGO_STANDARD,  //CUDNN_RNN_ALGO_PERSIST_STATIC, CUDNN_RNN_ALGO_PERSIST_DYNAMIC
-                                                dataType));
+    CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor_v6(cudnnHandle,
+                                                   cudnn_rnn_desc_,
+                                                   gsl::narrow_cast<int>(hidden_size),
+                                                   num_layers,
+                                                   cudnn_dropout_desc,
+                                                   CUDNN_LINEAR_INPUT,  // We can also skip the input matrix transformation
+                                                   cudnn_direction_model,
+                                                   rnn_mode,
+                                                   CUDNN_RNN_ALGO_STANDARD,  //CUDNN_RNN_ALGO_PERSIST_STATIC, CUDNN_RNN_ALGO_PERSIST_DYNAMIC
+                                                   dataType));
 
     if (prop.major >= 7 && dataType == CUDNN_DATA_HALF) {
       cudnnSetRNNMatrixMathType(cudnn_rnn_desc_, CUDNN_TENSOR_OP_MATH);
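The compiler's complaint that cudnnSetRNNDescriptor "was not declared" is consistent with cuDNN 8, which renamed the legacy function to cudnnSetRNNDescriptor_v6. To confirm which cuDNN version a device actually loads before patching, a small ctypes check can be used. This is a sketch that assumes the library soname libcudnn.so.8 and the cuDNN 7/8 version encoding (MAJOR*1000 + MINOR*100 + PATCH) returned by cudnnGetVersion(); the helper names are hypothetical.

```python
import ctypes
from typing import Optional, Tuple


def split_cudnn_version(version: int) -> Tuple[int, int, int]:
    """Split a cuDNN 7/8-style version number: MAJOR*1000 + MINOR*100 + PATCH."""
    return version // 1000, (version % 1000) // 100, version % 100


def loaded_cudnn_version(lib_name: str = "libcudnn.so.8") -> Optional[Tuple[int, int, int]]:
    """Return the runtime cuDNN version, or None if lib_name cannot be loaded.

    lib_name is an assumption; JetPack releases with cuDNN 7 use libcudnn.so.7.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t  # size_t cudnnGetVersion(void)
    lib.cudnnGetVersion.argtypes = []
    return split_cudnn_version(lib.cudnnGetVersion())


if __name__ == "__main__":
    version = loaded_cudnn_version()
    if version is None:
        print("libcudnn.so.8 not found")
    elif version[0] >= 8:
        print("cuDNN %d.%d.%d: use cudnnSetRNNDescriptor_v6" % version)
    else:
        print("cuDNN %d.%d.%d: legacy cudnnSetRNNDescriptor is available" % version)
```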

Thanks.

muralikrishnat29 · May 28, 2020, 9:09am

Hi @AastaLLL,

Thank you very much for your response.

Regarding issue 1, I will send you the model and my sample shortly.
Thanks for the update. For now we are using the TF-TRT approach, which suits our needs. We will try building and using ONNXRuntime later and let you know the result.

Regards,
MT
