How to generate a TensorRT model that is supported by the DeepStream SDK

I am trying to run my custom model with the DeepStream SDK on a Jetson Nano, but it fails to allocate output buffers because the model has dynamic output shapes. Could you please help me generate a TensorRT model that DeepStream supports? For your reference I am attaching the DeepStream forum issue link.

How to run custom detection and segmentation models - #10 by anil.kumarp0255

@anil.kumarp0255
Where did your ONNX file come from? Was it trained with TAO? If yes, could you try running it with GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream? You can set onnx-file=your.onnx in deepstream_tao_apps/configs/nvinfer/yolov3_tao/pgie_yolov3_tao_config.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.
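For reference, a minimal sketch of the [property] section you would edit in that config (the key names are standard nvinfer properties; the values here are placeholders, not the repository's actual settings):

[property]
gpu-id=0
# Point nvinfer at your ONNX; it builds and caches a TensorRT engine on first run.
onnx-file=your.onnx
model-engine-file=your.onnx_b1_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=1
# 0=FP32, 1=INT8, 2=FP16
network-mode=2
num-detected-classes=80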

Hi Morganh, the ONNX model is not generated from TAO; it is exported from PyTorch.

@anil.kumarp0255
It seems that it is a custom ONNX file. Can you check whether it can be converted to a TensorRT engine with trtexec? Also, can you share the ONNX file and tell us where you got it? Is there a GitHub repository for it?
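For example, on JetPack the trtexec binary ships with TensorRT under /usr/src/tensorrt/bin; a typical invocation (file names are placeholders) would be:

$ /usr/src/tensorrt/bin/trtexec --onnx=your.onnx --saveEngine=test.engine

Adding --fp16 tests reduced precision, and --verbose prints layer-level build details.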

Yes, it is a custom ONNX model. It can be converted to a TensorRT engine, but the engine has dynamic output shapes. It runs with a normal TensorRT script and produces inference results, but when I try to integrate it with DeepStream it fails to allocate output buffers. If the TensorRT engine has static output shapes, we can integrate it with DeepStream and allocate the output buffers successfully. The output shapes of the engine I am using are below.

INFO: [Implicit Engine Info]: layers num: 6
0 INPUT kFLOAT images 3x480x640
1 OUTPUT kINT32 valid 0
2 OUTPUT kFLOAT rois 4
3 OUTPUT kFLOAT scores 0
4 OUTPUT kINT32 class_ids 0
5 OUTPUT kFLOAT masks 120x160

Can you share the onnx file and where did you get it? Is there any github?

It is not from GitHub; it is a custom model trained by our team. For some reasons I cannot share the model, sorry Morganh.

No problem. If you are able to generate the TensorRT engine and run it standalone, but cannot generate and run the engine with DeepStream, it seems that this is a feature request for DeepStream.

In DeepStream the TensorRT engine can be generated, but it cannot be loaded and the buffers cannot be allocated if the model has dynamic output shapes or a dimension of 0.
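To illustrate the problem outside DeepStream, a minimal sketch (assuming the TensorRT 8.x Python bindings and a placeholder engine path; as far as I understand, nvinfer sizes its output buffers from the engine's binding dimensions when the inference context is initialized, so a -1 or 0 dimension gives it nothing to allocate from):

# inspect_bindings.py -- minimal sketch, TensorRT 8.x legacy binding API assumed
import tensorrt as trt

ENGINE_PATH = "model_b1_gpu0_fp32.engine"  # placeholder path

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    shape = tuple(engine.get_binding_shape(i))  # e.g. (-1, 4) for a dynamic output
    numel = trt.volume(shape)                   # non-positive when a dim is -1 or 0
    kind = "INPUT " if engine.binding_is_input(i) else "OUTPUT"
    print(f"{kind} {name:12s} shape={shape} numel={numel}")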

To double check, can you run your “normal tensorrt script” against the TensorRT engine that is generated by DeepStream?

The engine generated by DeepStream can allocate buffers normally and runs inference successfully with the normal TensorRT script.

OK, so it seems that this is a feature request for DeepStream to handle this kind of ONNX file.
Can you run below to share the info? Thanks.
$ python -m pip install colored
$ python -m pip install polygraphy --index-url https://pypi.ngc.nvidia.com
$ polygraphy inspect model your.onnx
$ polygraphy inspect model your.engine

Here is the ONNX model info

[I] Loading model: /home/nvidia/yolov8-trt-test/Default-medication_box_Mask_Default/Instance_Segmentation5_meta_Tool1/Models/Model_2023_09_25/model_final.onnx
[I] ==== ONNX Model ====
Name: torch-jit-export | ONNX Opset: 12

---- 1 Graph Input(s) ----
{images [dtype=float32, shape=(1, 3, 480, 640)]}

---- 5 Graph Output(s) ----
{valid [dtype=int64, shape=('Castvalid_dim_0',)],
 rois [dtype=float32, shape=('Gatherrois_dim_0', 4)],
 scores [dtype=float32, shape=('Gatherscores_dim_0',)],
 class_ids [dtype=int64, shape=('Gatherclass_ids_dim_0',)],
 masks [dtype=float32, shape=('Reshapemasks_dim_0', 120, 160)]}

---- 219 Initializer(s) ----

---- 376 Node(s) ----

Here is the TensorRT engine info

[I] Loading bytes from /home/nvidia/yolov8-trt-test/Default-medication_box_Mask_Default/Instance_Segmentation5_meta_Tool1/Models/Model_2023_09_25/model_final.onnx_b1_gpu0_fp32.engine
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine

---- 1 Engine Input(s) ----
{images [dtype=float32, shape=(1, 3, 480, 640)]}

---- 5 Engine Output(s) ----
{valid [dtype=int32, shape=(-1,)],
 rois [dtype=float32, shape=(-1, 4)],
 scores [dtype=float32, shape=(-1,)],
 class_ids [dtype=int32, shape=(-1,)],
 masks [dtype=float32, shape=(-1, 120, 160)]}

---- Memory ----
Device Memory: 49400320 bytes

---- 1 Profile(s) (6 Tensor(s) Each) ----
- Profile: 0
    Tensor: images             (Input), Index: 0 | Shapes: min=(1, 3, 480, 640), opt=(1, 3, 480, 640), max=(1, 3, 480, 640)
    Tensor: valid             (Output), Index: 1 | Shape: (-1,)
    Tensor: rois              (Output), Index: 2 | Shape: (-1, 4)
    Tensor: scores            (Output), Index: 3 | Shape: (-1,)
    Tensor: class_ids         (Output), Index: 4 | Shape: (-1,)
    Tensor: masks             (Output), Index: 5 | Shape: (-1, 120, 160)

---- 415 Layer(s) ----

Thanks for the info. Will let the DeepStream team know.

As a workaround, could you modify your ONNX file so that the outputs become:

 {valid [dtype=int64, shape=('Castvalid_dim_0', 1)],
 rois [dtype=float32, shape=('Gatherrois_dim_0', 4)],
 scores [dtype=float32, shape=('Gatherscores_dim_0', 1)],
 class_ids [dtype=int64, shape=('Gatherclass_ids_dim_0', 1)],
 masks [dtype=float32, shape=('Reshapemasks_dim_0', 120, 160)]}
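One way to make that edit (a minimal sketch, assuming onnx and onnx-graphsurgeon are installed, e.g. via python -m pip install onnx onnx-graphsurgeon --index-url https://pypi.ngc.nvidia.com; the output names are taken from the polygraphy dump above, and the Unsqueeze-based approach is just one option, not necessarily the intended edit):

# reshape_outputs.py -- minimal sketch, not verified against this exact model
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model_final.onnx"))

# 1-D outputs that should gain a trailing dimension of 1
flat_outputs = {"valid", "scores", "class_ids"}

new_outputs = []
for out in graph.outputs:
    if out.name in flat_outputs:
        # Keep the original graph-output name so downstream parsing code
        # does not have to change; rename the existing tensor instead.
        orig_name = out.name
        out.name = orig_name + "_flat"
        out_2d = gs.Variable(orig_name, dtype=out.dtype)
        # Opset 12: Unsqueeze takes 'axes' as an attribute (opset 13+ uses an input).
        graph.nodes.append(
            gs.Node(op="Unsqueeze", inputs=[out], outputs=[out_2d],
                    attrs={"axes": [1]}))
        new_outputs.append(out_2d)
    else:
        new_outputs.append(out)

graph.outputs = new_outputs
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_final_2d.onnx")

After exporting, re-run polygraphy inspect model on the new file to confirm the 1-D outputs now report a trailing dimension of 1 before pointing DeepStream at it.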

Hi Morganh, I modified the ONNX file as you mentioned and it works fine with DeepStream, but sometimes I get the error below.

GObject.threads_init()
Creating Pipeline

Creating streamux

Creating source_bin 0

Creating source bin
source-bin-00
Creating Pgie

Creating tiler

Creating nvvidconv

Creating nvosd

Creating transform

Creating EGLSink

Atleast one of the sources is live
the batch size is 1
Adding elements to Pipeline

Linking elements in the Pipeline

object_detector_medicine.py:524: PyGIDeprecationWarning: GObject.MainLoop is deprecated; use GLib.MainLoop instead
loop = GObject.MainLoop()
Now playing…
1 : rtsp://solomon:solomon888@10.1.2.124:88/videoMain
Starting pipeline

Using winsys: x11
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-6.2/lib/libnvds_nvmultiobjecttracker.so
gstnvtracker: Batch processing is ON
gstnvtracker: Past frame output is OFF
[NvTrackerParams::getConfigRoot()] !!![WARNING] Empty config file path is provided. Will go ahead with default values
[NvMultiObjectTracker] Initialized
WARNING: Deserialize engine failed because file path: /opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine open error
0:00:01.699220396 500062 0x221fa410 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1897> [UID = 1]: deserialize engine from file :/opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine failed
0:00:01.853530352 500062 0x221fa410 WARN nvinfer gstnvinfer.cpp:677:gst_nvinfer_logger: NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2002> [UID = 1]: deserialize backend context from engine from file :/opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine failed, try rebuild
0:00:01.853590642 500062 0x221fa410 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: [TRT]: Tensor DataType is determined at build time for tensors not marked as input or output.
0:03:40.546969364 500062 0x221fa410 INFO nvinfer gstnvinfer.cpp:680:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1955> [UID = 1]: serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.2/sources/deepstream_python_apps/apps/DeepStream-Yolo-Seg/tmp.onnx_b1_gpu0_fp32.engine successfully
WARNING: [TRT]: The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
INFO: [Implicit Engine Info]: layers num: 6
0 INPUT kFLOAT images 3x480x640
1 OUTPUT kFLOAT rois 4
2 OUTPUT kFLOAT scores 1
3 OUTPUT kINT32 valid 1
4 OUTPUT kINT32 class_ids 1
5 OUTPUT kFLOAT masks 120x160

0:03:40.849459159 500062 0x221fa410 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 1]: Load new model:config_infer_primary_yoloV8_seg_medicine.txt sucessfully
Decodebin child added: source

Decodebin child added: decodebin0

Decodebin child added: rtph264depay0

Decodebin child added: decodebin1

Decodebin child added: rtppcmudepay0

Decodebin child added: mulawdec0

In cb_newpad

gstname= audio/x-raw
Decodebin child added: h264parse0

Decodebin child added: capsfilter0

Decodebin child added: nvv4l2decoder0

Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0xffff9c8cc580 (GstCapsFeatures at 0xfffe7c07c940)>
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.574771540 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
Error: gst-stream-error-quark: Failed to queue input batch for inferencing (1): /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(1388): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline0/GstNvInfer:primary-inference
Frame Number= 0 Number of Objects= 0
Exiting app

[NvMultiObjectTracker] De-initialized
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.635136950 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.691096746 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.747603152 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing
ERROR: [TRT]: 1: [runner.cpp::retrieveOutputTensorResult::609] Error Code 1: Cuda Runtime (invalid argument)
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:03:41.802649350 500062 0x22639000 WARN nvinfer gstnvinfer.cpp:1388:gst_nvinfer_input_queue_loop: error: Failed to queue input batch for inferencing

Please search the DeepStream forum to check whether there is a hint. Also @yuweiw to help check further.

@anil.kumarp0255 The log is printed from enqueueBuffer in sources\libs\nvdsinfer\nvdsinfer_backend.cpp. There may be something wrong with your model that causes the input mismatch. Could you attach your whole project, including the model, the config file, and the code, so that we can check?
Also, please provide complete information as applicable to your setup. Thanks.
Hardware Platform (Jetson / GPU)
DeepStream Version
JetPack Version (valid for Jetson only)
TensorRT Version
NVIDIA GPU Driver Version (valid for GPU only)
Issue Type( questions, new requirements, bugs)
How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)