Jetson Nano: Parsed Tiny YOLO v2 ONNX model gives different results in TRT

Hi,

Device: Jetson Nano
Jetpack version: JP4.4
CUDA: 10.2

  1. I have a custom-trained Tiny YOLO v2 ONNX object detection model exported from Azure Custom Vision. I could successfully parse the model and convert it to a TRT engine using the TensorRT API, but the inference results do not make sense, whereas the same model gives correct results with ONNXRuntime (CPU version). Maybe I am making a mistake in adapting the YOLO masks and anchors to the custom model. I tried many possibilities but couldn't make it work. Could you please help me with this, and also with the next question?

  2. This is not an issue with NVIDIA's APIs, but I am asking everywhere I might get information. I tried to build and install ONNXRuntime-gpu-tensorrt on the Nano from source, following the link below:
    https://github.com/microsoft/onnxruntime/issues/2684#issuecomment-568548387
    The build failed with an error while compiling cudnn_rnn_base.cc.o. Thanks
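
For readers adapting the post-processing: the masks and anchors in the yolov3_onnx sample describe a multi-scale YOLOv3 head, while a Tiny YOLO v2 model normally has a single region output of shape (A*(5+C), H, W), with softmax class scores and anchors expressed in grid-cell units. The sketch below decodes such an output with NumPy. It is only an illustration under those assumptions: the function name decode_yolov2, the threshold, and the anchor values in the usage note are hypothetical, and the real anchors, class count, and input preprocessing must be taken from the exported model.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def decode_yolov2(output, anchors, num_classes, conf_threshold=0.3):
    """Decode a raw YOLOv2 region output of shape (A*(5+C), H, W).

    Assumes anchor-major channel order: for each anchor,
    [tx, ty, tw, th, objectness, class scores...].
    Returns (cx, cy, w, h, score, class_id) tuples with box values
    normalized to [0, 1] relative to the network input.
    """
    num_anchors = len(anchors)
    channels, grid_h, grid_w = output.shape
    if channels != num_anchors * (5 + num_classes):
        raise ValueError("channel count does not match anchors/classes")

    # (A*(5+C), H, W) -> (H, W, A, 5+C)
    pred = output.reshape(num_anchors, 5 + num_classes, grid_h, grid_w)
    pred = pred.transpose(2, 3, 0, 1)

    col = np.arange(grid_w).reshape(1, grid_w, 1)   # x offset of each cell
    row = np.arange(grid_h).reshape(grid_h, 1, 1)   # y offset of each cell
    anchors = np.asarray(anchors, dtype=np.float32)  # (A, 2), grid-cell units

    cx = (sigmoid(pred[..., 0]) + col) / grid_w
    cy = (sigmoid(pred[..., 1]) + row) / grid_h
    w = np.exp(pred[..., 2]) * anchors[:, 0] / grid_w
    h = np.exp(pred[..., 3]) * anchors[:, 1] / grid_h

    objectness = sigmoid(pred[..., 4])
    class_probs = softmax(pred[..., 5:], axis=-1)  # softmax, not per-class sigmoid
    scores = objectness[..., None] * class_probs   # (H, W, A, C)

    class_ids = np.argmax(scores, axis=-1)
    best = np.max(scores, axis=-1)
    keep = best >= conf_threshold

    return [
        (float(a), float(b), float(c), float(d), float(s), int(k))
        for a, b, c, d, s, k in zip(
            cx[keep], cy[keep], w[keep], h[keep], best[keep], class_ids[keep]
        )
    ]
```

A call would look like decode_yolov2(trt_output.reshape(-1, 13, 13), anchors=[(1.0, 1.0), (2.0, 3.0)], num_classes=2), where the anchor pairs are placeholders. Non-maximum suppression is left out; the boxes returned here would still need it.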

Regards,
MK

Hi,

1.
May I know which sample you are using?

/usr/src/tensorrt/samples/python/yolov3_onnx/

Or

/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/

If you are using the DeepStream sample, please follow this document for a customized model:

2.
Could you share the detailed error log you got?
It looks like the issue in the link you shared has already been resolved.

Thanks.

Hi AastaLLL,

Thanks for the quick response.

  1. Yes. I am using the sample from “/usr/src/tensorrt/samples/python/yolov3_onnx/”

  2. Please find the error log below. The build always fails on this particular object, cudnn_rnn_base.cc.o. The link I gave only contains build instructions, and following them is what failed. Maybe I will raise this as a separate GitHub issue in the onnxruntime repo.

[ 51%] Building CXX object CMakeFiles/onnxruntime_providers_cuda.dir/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc.o
In file included from /usr/local/onnxruntime/include/onnxruntime/core/framework/tensor.h:12:0,
from /usr/local/onnxruntime/onnxruntime/core/framework/data_transfer.h:7,
from /usr/local/onnxruntime/onnxruntime/core/framework/data_transfer_manager.h:7,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:7,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/cudnn_common.h:5,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:7,
from /usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc:4:
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h: In member function ‘onnxruntime::common::Status onnxruntime::cuda::CudnnRNN::Set(cudnnContext* const&, int64_t, int, cudnnDropoutDescriptor_t, cudnnDirectionMode_t, cudnnRNNMode_t, cudnnDataType_t)’:
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:27: error: ‘cudnnSetRNNDescriptor’ was not declared in this scope
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
/usr/local/onnxruntime/include/onnxruntime/core/common/common.h:157:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
auto _status = (expr);
^~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:23: note: in expansion of macro ‘CUDNN_CALL’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:5: note: in expansion of macro ‘CUDNN_RETURN_IF_ERROR’
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:27: note: suggested alternative: ‘cudnnSetLRNDescriptor’
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
/usr/local/onnxruntime/include/onnxruntime/core/common/common.h:157:21: note: in definition of macro ‘ORT_RETURN_IF_ERROR_SESSIONID’
auto _status = (expr);
^~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:3: note: in expansion of macro ‘ORT_RETURN_IF_ERROR’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/cuda_common.h:40:23: note: in expansion of macro ‘CUDNN_CALL’
ORT_RETURN_IF_ERROR(CUDNN_CALL(expr)
^~~~~~~~~~
/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h:45:5: note: in expansion of macro ‘CUDNN_RETURN_IF_ERROR’
CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
^
CMakeFiles/onnxruntime_providers_cuda.dir/build.make:517: recipe for target ‘CMakeFiles/onnxruntime_providers_cuda.dir/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc.o’ failed
make[2]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/usr/local/onnxruntime/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc.o] Error 1
CMakeFiles/Makefile2:718: recipe for target ‘CMakeFiles/onnxruntime_providers_cuda.dir/all’ failed
make[1]: *** [CMakeFiles/onnxruntime_providers_cuda.dir/all] Error 2
Makefile:162: recipe for target ‘all’ failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 1731, in <module>
    sys.exit(main())
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 1627, in main
    build_targets(args, cmake_path, build_dir, configs, args.parallel)
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 853, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/usr/local/onnxruntime/tools/ci_build/build.py", line 392, in run_subprocess
    env=my_env, shell=shell)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '/usr/local/onnxruntime/build/Linux/Release', '--config', 'Release']' returned non-zero exit status 2.

Hi,

Sorry for the late update.

1. To give a further suggestion, would you mind sharing a simple reproducible example with us?
It would help if it included the model and the sample code for both TensorRT and CPU, so we can compare the results directly.

2. We are going to check onnxruntime on JetPack 4.4.
We will share more information with you later.
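
When putting such a comparison together, it usually helps to diff the raw network outputs before any YOLO post-processing, so that engine differences are separated from decoding mistakes. A minimal sketch, assuming both outputs are already available as arrays (the TensorRT samples return flattened host buffers, so both sides are flattened here); the function name compare_outputs and the tolerances are illustrative choices, not part of either library:

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-4):
    """Compare two raw output tensors element-wise after flattening.

    reference: e.g. the ONNXRuntime CPU output
    candidate: e.g. the TensorRT output for the same preprocessed input
    """
    reference = np.asarray(reference, dtype=np.float64).ravel()
    candidate = np.asarray(candidate, dtype=np.float64).ravel()
    if reference.size != candidate.size:
        raise ValueError(
            "size mismatch: %d vs %d" % (reference.size, candidate.size)
        )

    abs_diff = np.abs(reference - candidate)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "mean_abs_diff": float(abs_diff.mean()),
        "close": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
    }
```

If the raw outputs already differ on an identical input tensor, the problem is in the engine or the preprocessing; if they match, the decoding step (anchors, class activation, output layout) is the more likely cause.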

Thanks.

Hi,

We can build onnxruntime successfully with the following change:

diff --git a/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h b/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h
index 5281904a2..75131db39 100644
--- a/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h
+++ b/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.h
@@ -42,16 +42,16 @@ class CudnnRNN {
     if (!cudnn_rnn_desc_)
       CUDNN_RETURN_IF_ERROR(cudnnCreateRNNDescriptor(&cudnn_rnn_desc_));
 
-    CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor(cudnnHandle,
-                                                cudnn_rnn_desc_,
-                                                gsl::narrow_cast<int>(hidden_size),
-                                                num_layers,
-                                                cudnn_dropout_desc,
-                                                CUDNN_LINEAR_INPUT,  // We can also skip the input matrix transformation
-                                                cudnn_direction_model,
-                                                rnn_mode,
-                                                CUDNN_RNN_ALGO_STANDARD,  //CUDNN_RNN_ALGO_PERSIST_STATIC, CUDNN_RNN_ALGO_PERSIST_DYNAMIC
-                                                dataType));
+    CUDNN_RETURN_IF_ERROR(cudnnSetRNNDescriptor_v6(cudnnHandle,
+                                                   cudnn_rnn_desc_,
+                                                   gsl::narrow_cast<int>(hidden_size),
+                                                   num_layers,
+                                                   cudnn_dropout_desc,
+                                                   CUDNN_LINEAR_INPUT,  // We can also skip the input matrix transformation
+                                                   cudnn_direction_model,
+                                                   rnn_mode,
+                                                   CUDNN_RNN_ALGO_STANDARD,  //CUDNN_RNN_ALGO_PERSIST_STATIC, CUDNN_RNN_ALGO_PERSIST_DYNAMIC
+                                                   dataType));
 
     if (prop.major >= 7 && dataType == CUDNN_DATA_HALF) {
       cudnnSetRNNMatrixMathType(cudnn_rnn_desc_, CUDNN_TENSOR_OP_MATH);
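
The compile error and this patch are consistent with building against cuDNN 8, where the unsuffixed cudnnSetRNNDescriptor is no longer declared. To confirm which cuDNN version a board has, the version macros in the cuDNN header can be read. The sketch below parses CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from header text; the function name parse_cudnn_version is hypothetical, and the header location mentioned in the usage note is an assumption that may differ between JetPack releases.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text.

    Returns None if any of the three version macros is missing.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 8"
        match = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)",
            header_text,
            re.MULTILINE,
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

For example, the text could be read with open("/usr/include/cudnn_version.h").read() (assumed path; older cuDNN releases keep these macros in cudnn.h) and passed to parse_cudnn_version.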

Thanks.

Hi @AastaLLL,

Thank you very much for your response.

  1. Regarding issue 1, I will send you the model and my sample shortly.
  2. Thanks for the update. For now we are using the TF-TRT method, which suits our needs. We will try building and using ONNXRuntime a little later and let you know the result.

Regards,
MT