How to generate a TensorRT model that is supported by the DeepStream SDK

I am trying to run my custom model with the DeepStream SDK on a Jetson Nano, but it fails to allocate output buffers because the model has dynamic output shapes. Could you please help me generate a TensorRT model that DeepStream supports? For your reference I am attaching the DeepStream forum issue link.

How to run custom detection and segmentation models - #10 by anil.kumarp0255

@anil.kumarp0255
Where did your ONNX file come from? Was it trained with TAO? If yes, could you try running it with GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream? You can set onnx-file=your.onnx in deepstream_tao_apps/configs/nvinfer/yolov3_tao/pgie_yolov3_tao_config.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.
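For reference, a minimal sketch of the [property] section you would edit in that config (the key names are standard nvinfer properties; the values here are placeholders, not the repository's actual settings):

[property]
gpu-id=0
# Point nvinfer at your ONNX; it builds and caches a TensorRT engine on first run.
onnx-file=your.onnx
model-engine-file=your.onnx_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
num-detected-classes=80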

Hi Morganh, the ONNX model is not generated from TAO; it is exported from PyTorch.

@anil.kumarp0255
It seems that it is a custom ONNX file. Can you check whether it can be converted to a TensorRT engine with trtexec? Also, can you share the ONNX file and tell us where you got it? Is there a GitHub repository for it?
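For example, on JetPack the trtexec binary ships with TensorRT under /usr/src/tensorrt/bin; a typical invocation (file names are placeholders) would be:

$ /usr/src/tensorrt/bin/trtexec --onnx=your.onnx --saveEngine=test.engine

Adding --fp16 tests reduced precision, and --verbose prints layer-level build details.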

Yes, it is a custom ONNX model. It can be converted to a TensorRT engine, but the engine has dynamic output shapes. It runs with a normal TensorRT script and produces inference results, but when I try to integrate it with DeepStream it fails to allocate output buffers. If the TensorRT engine has static output shapes, we can integrate it with DeepStream and allocate the output buffers successfully. The output shapes of the engine I am using are below.

INFO: [Implicit Engine Info]: layers num: 6
0 INPUT kFLOAT images 3x480x640
1 OUTPUT kINT32 valid 0
2 OUTPUT kFLOAT rois 4
3 OUTPUT kFLOAT scores 0
4 OUTPUT kINT32 class_ids 0
5 OUTPUT kFLOAT masks 120x160

Can you share the onnx file and where did you get it? Is there any github?

It is not from GitHub; it is a custom model trained by our team. For some reasons I cannot share the model, sorry Morganh.

No problem. If you are able to generate the TensorRT engine and run it standalone, but cannot generate and run the engine with DeepStream, it seems that this is a feature request for DeepStream.

In DeepStream the TensorRT engine can be generated, but it cannot be loaded and the buffers cannot be allocated if the model has dynamic output shapes or a dimension of 0.
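To illustrate the problem outside DeepStream, a minimal sketch (assuming the TensorRT 8.x Python bindings and a placeholder engine path; as far as I understand, nvinfer sizes its output buffers from the engine's binding dimensions when the inference context is initialized, so a -1 or 0 dimension gives it nothing to allocate from):

# inspect_bindings.py -- minimal sketch, TensorRT 8.x legacy binding API assumed
import tensorrt as trt

ENGINE_PATH = "model_b1_gpu0_fp32.engine"  # placeholder path

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    shape = tuple(engine.get_binding_shape(i))  # e.g. (-1, 4) for a dynamic output
    numel = trt.volume(shape)                   # non-positive when a dim is -1 or 0
    kind = "INPUT " if engine.binding_is_input(i) else "OUTPUT"
    print(f"{kind} {name:12s} shape={shape} numel={numel}")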

To double check, can you run your “normal tensorrt script” against the TensorRT engine that is generated by DeepStream?

The engine generated by DeepStream can allocate buffers normally and runs inference successfully with the normal TensorRT script.

OK, so it seems that this is a feature request for DeepStream to handle this kind of ONNX file.
Can you run below to share the info? Thanks.
$ python -m pip install colored
$ python -m pip install polygraphy --index-url https://pypi.ngc.nvidia.com
$ polygraphy inspect model your.onnx
$ polygraphy inspect model your.engine

Here is the ONNX model info

[I] Loading model: /home/nvidia/yolov8-trt-test/Default-medication_box_Mask_Default/Instance_Segmentation5_meta_Tool1/Models/Model_2023_09_25/model_final.onnx
[I] ==== ONNX Model ====
Name: torch-jit-export | ONNX Opset: 12

---- 1 Graph Input(s) ----
{images [dtype=float32, shape=(1, 3, 480, 640)]}

---- 5 Graph Output(s) ----
{valid [dtype=int64, shape=('Castvalid_dim_0',)],
 rois [dtype=float32, shape=('Gatherrois_dim_0', 4)],
 scores [dtype=float32, shape=('Gatherscores_dim_0',)],
 class_ids [dtype=int64, shape=('Gatherclass_ids_dim_0',)],
 masks [dtype=float32, shape=('Reshapemasks_dim_0', 120, 160)]}

---- 219 Initializer(s) ----

---- 376 Node(s) ----

Here is the TensorRT engine info

[I] Loading bytes from /home/nvidia/yolov8-trt-test/Default-medication_box_Mask_Default/Instance_Segmentation5_meta_Tool1/Models/Model_2023_09_25/model_final.onnx_b1_gpu0_fp32.engine
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine

---- 1 Engine Input(s) ----
{images [dtype=float32, shape=(1, 3, 480, 640)]}

---- 5 Engine Output(s) ----
{valid [dtype=int32, shape=(-1,)],
 rois [dtype=float32, shape=(-1, 4)],
 scores [dtype=float32, shape=(-1,)],
 class_ids [dtype=int32, shape=(-1,)],
 masks [dtype=float32, shape=(-1, 120, 160)]}

---- Memory ----
Device Memory: 49400320 bytes

---- 1 Profile(s) (6 Tensor(s) Each) ----
- Profile: 0
    Tensor: images             (Input), Index: 0 | Shapes: min=(1, 3, 480, 640), opt=(1, 3, 480, 640), max=(1, 3, 480, 640)
    Tensor: valid             (Output), Index: 1 | Shape: (-1,)
    Tensor: rois              (Output), Index: 2 | Shape: (-1, 4)
    Tensor: scores            (Output), Index: 3 | Shape: (-1,)
    Tensor: class_ids         (Output), Index: 4 | Shape: (-1,)
    Tensor: masks             (Output), Index: 5 | Shape: (-1, 120, 160)

---- 415 Layer(s) ----

Thanks for the info. Will let the DeepStream team know.

As a workaround, could you modify your ONNX file so that the outputs become:

 {valid [dtype=int64, shape=('Castvalid_dim_0', 1)],
 rois [dtype=float32, shape=('Gatherrois_dim_0', 4)],
 scores [dtype=float32, shape=('Gatherscores_dim_0', 1)],
 class_ids [dtype=int64, shape=('Gatherclass_ids_dim_0', 1)],
 masks [dtype=float32, shape=('Reshapemasks_dim_0', 120, 160)]}
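One way to make that edit (a minimal sketch, assuming onnx and onnx-graphsurgeon are installed, e.g. via python -m pip install onnx onnx-graphsurgeon --index-url https://pypi.ngc.nvidia.com; the output names are taken from the polygraphy dump above, and the Unsqueeze-based approach is just one option, not necessarily the intended edit):

# reshape_outputs.py -- minimal sketch, not verified against this exact model
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model_final.onnx"))

# 1-D outputs that should gain a trailing dimension of 1
flat_outputs = {"valid", "scores", "class_ids"}

new_outputs = []
for out in graph.outputs:
    if out.name in flat_outputs:
        # Keep the original graph-output name so downstream parsing code
        # does not have to change; rename the existing tensor instead.
        orig_name = out.name
        out.name = orig_name + "_flat"
        out_2d = gs.Variable(orig_name, dtype=out.dtype)
        # Opset 12: Unsqueeze takes 'axes' as an attribute (opset 13+ uses an input).
        graph.nodes.append(
            gs.Node(op="Unsqueeze", inputs=[out], outputs=[out_2d],
                    attrs={"axes": [1]}))
        new_outputs.append(out_2d)
    else:
        new_outputs.append(out)

graph.outputs = new_outputs
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_final_2d.onnx")

After exporting, re-run polygraphy inspect model on the new file to confirm the 1-D outputs now report a trailing dimension of 1 before pointing DeepStream at it.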

Hi Morganh, I modified the ONNX file as you mentioned and it works fine with DeepStream, but sometimes I get the error below.

GObject.threads_init()
Creating Pipeline

Creating streamux

Creating source_bin 0

Creating source bin
source-bin-00
Creating Pgie

Creating tiler

Creating nvvidconv

Creating nvosd

Creating transform

Creating EGLSink

Atleast one of the sources is live
the batch size is 1
Adding elements to Pipeline

Linking elements in the Pipeline

object_detector_medicine.py:524: PyGIDeprecationWarning: GObject.MainLoop is deprecated; use GLib.MainLoop instead
loop = GObject.MainLoop()
Now playing…
1 : rtsp://solomon:solomon888@10.1.2.124:88/videoMain
Starting pipeline

Using winsys: x11
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-6.2/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is OFF
[NvTrackerParams::getConfigRoot()] !!![WARNING] Empty config file path is provided. Will go ahead with default values
[NvMultiObjectTracker] Initialized
WARNING: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine open error
0:00:01.699220396 500062 0x221fa410 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine failed
0:00:01.853530352 500062 0x221fa410 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine failed, try rebuild
0:00:01.853590642 500062 0x221fa410 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.
0:03:40.546969364 500062 0x221fa410 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 6
0 INPUT kFLOAT images 3x480x640
1 OUTPUT kFLOAT rois 4
2 OUTPUT kFLOAT scores 1
3 OUTPUT kINT32 valid 1
4 OUTPUT kINT32 class_ids 1
5 OUTPUT kFLOAT masks 120x160

0:03:40.849459159 500062 0x221fa410 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 1]: Load new model:config_infer_primary_yoloV8_seg_medicine.txt sucessfully
Decodebin child added: source

Decodebin child added: decodebin0

Decodebin child added: rtph264depay0

Decodebin child added: decodebin1

Decodebin child added: rtppcmudepay0

Decodebin child added: mulawdec0

In cb_newpad

gstname= audio/x-raw
Decodebin child added: h264parse0

Decodebin child added: capsfilter0

Decodebin child added: nvv4l2decoder0

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0xffff9c8cc580 (GstCapsFeatures at 0xfffe7c07c940)>
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.574771540 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
Error: gst-stream-error-quark: Failed to queue input batch for inferencing (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstNvInfer:primary-inference
Frame Number= 0 Number of Objects= 0
Exiting app

[NvMultiObjectTracker] De-initialized
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.635136950 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.691096746 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.747603152 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.802649350 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing

Please search the DeepStream forum to check whether there is a hint. Also @yuweiw to help check further.

@anil.kumarp0255 The log is printed from enqueueBuffer in sources\libs\nvdsinfer\nvdsinfer_backend.cpp. There may be something wrong with your model that causes the input mismatch. Could you attach your whole project, including the model, the config file, and the code, so that we can check?
Also, please provide complete information as applicable to your setup. Thanks.
Hardware Platform (Jetson / GPU)
DeepStream Version
JetPack Version (valid for Jetson only)
TensorRT Version
NVIDIA GPU Driver Version (valid for GPU only)
Issue Type( questions, new requirements, bugs)
How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)