Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) - Jetson Orin
• DeepStream Version - 6.4
• JetPack Version (valid for Jetson only) - 6.0+b106
• TensorRT Version - 8.6.2
• NVIDIA GPU Driver Version (valid for GPU only) - CUDA 12.2
• Issue Type (questions, new requirements, bugs) - Question: we have a face detection model (Ultra-Light Face Detector) and converted its ONNX file into an engine file with the help of a custom parser. The engine runs only with batch size 1 and does not support multiple batches. Where should I look to add multi-batch support? Can you give a solution for this?
When converting from ONNX to an engine file, was the batch-size parameter specified? Can the model be converted using trtexec? trtexec can specify the batch-size parameter.
When we set the batch size to two, the engine is not generated; trtexec fails with the following error:
jetson@ubuntu:/nvme0n1/face_attendance_search_final$ /usr/src/tensorrt/bin/trtexec --onnx=face_attendance_search_files/onnx_file/ultra_light_640.onnx --fp16 --optShapes=input:2x3x480x640 --shapes=input:2x3x480x640 --saveEngine=face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=face_attendance_search_files/onnx_file/ultra_light_640.onnx --fp16 --optShapes=input:2x3x480x640 --shapes=input:2x3x480x640 --saveEngine=face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
[09/14/2024-09:47:55] [W] optShapes is being broadcasted to minShapes for tensor input
[09/14/2024-09:47:55] [W] optShapes is being broadcasted to maxShapes for tensor input
[09/14/2024-09:47:55] [I] === Model Options ===
[09/14/2024-09:47:55] [I] Format: ONNX
[09/14/2024-09:47:55] [I] Model: face_attendance_search_files/onnx_file/ultra_light_640.onnx
[09/14/2024-09:47:55] [I] Output:
[09/14/2024-09:47:55] [I] === Build Options ===
[09/14/2024-09:47:55] [I] Max batch: explicit batch
[09/14/2024-09:47:55] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/14/2024-09:47:55] [I] minTiming: 1
[09/14/2024-09:47:55] [I] avgTiming: 8
[09/14/2024-09:47:55] [I] Precision: FP32+FP16
[09/14/2024-09:47:55] [I] LayerPrecisions:
[09/14/2024-09:47:55] [I] Layer Device Types:
[09/14/2024-09:47:55] [I] Calibration:
[09/14/2024-09:47:55] [I] Refit: Disabled
[09/14/2024-09:47:55] [I] Version Compatible: Disabled
[09/14/2024-09:47:55] [I] ONNX Native InstanceNorm: Disabled
[09/14/2024-09:47:55] [I] TensorRT runtime: full
[09/14/2024-09:47:55] [I] Lean DLL Path:
[09/14/2024-09:47:55] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/14/2024-09:47:55] [I] Exclude Lean Runtime: Disabled
[09/14/2024-09:47:55] [I] Sparsity: Disabled
[09/14/2024-09:47:55] [I] Safe mode: Disabled
[09/14/2024-09:47:55] [I] Build DLA standalone loadable: Disabled
[09/14/2024-09:47:55] [I] Allow GPU fallback for DLA: Disabled
[09/14/2024-09:47:55] [I] DirectIO mode: Disabled
[09/14/2024-09:47:55] [I] Restricted mode: Disabled
[09/14/2024-09:47:55] [I] Skip inference: Disabled
[09/14/2024-09:47:55] [I] Save engine: face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
[09/14/2024-09:47:55] [I] Load engine:
[09/14/2024-09:47:55] [I] Profiling verbosity: 0
[09/14/2024-09:47:55] [I] Tactic sources: Using default tactic sources
[09/14/2024-09:47:55] [I] timingCacheMode: local
[09/14/2024-09:47:55] [I] timingCacheFile:
[09/14/2024-09:47:55] [I] Heuristic: Disabled
[09/14/2024-09:47:55] [I] Preview Features: Use default preview flags.
[09/14/2024-09:47:55] [I] MaxAuxStreams: -1
[09/14/2024-09:47:55] [I] BuilderOptimizationLevel: -1
[09/14/2024-09:47:55] [I] Input(s)s format: fp32:CHW
[09/14/2024-09:47:55] [I] Output(s)s format: fp32:CHW
[09/14/2024-09:47:55] [I] Input build shape: input=2x3x480x640+2x3x480x640+2x3x480x640
[09/14/2024-09:47:55] [I] Input calibration shapes: model
[09/14/2024-09:47:55] [I] === System Options ===
[09/14/2024-09:47:55] [I] Device: 0
[09/14/2024-09:47:55] [I] DLACore:
[09/14/2024-09:47:55] [I] Plugins:
[09/14/2024-09:47:55] [I] setPluginsToSerialize:
[09/14/2024-09:47:55] [I] dynamicPlugins:
[09/14/2024-09:47:55] [I] ignoreParsedPluginLibs: 0
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] === Inference Options ===
[09/14/2024-09:47:55] [I] Batch: Explicit
[09/14/2024-09:47:55] [I] Input inference shape: input=2x3x480x640
[09/14/2024-09:47:55] [I] Iterations: 10
[09/14/2024-09:47:55] [I] Duration: 3s (+ 200ms warm up)
[09/14/2024-09:47:55] [I] Sleep time: 0ms
[09/14/2024-09:47:55] [I] Idle time: 0ms
[09/14/2024-09:47:55] [I] Inference Streams: 1
[09/14/2024-09:47:55] [I] ExposeDMA: Disabled
[09/14/2024-09:47:55] [I] Data transfers: Enabled
[09/14/2024-09:47:55] [I] Spin-wait: Disabled
[09/14/2024-09:47:55] [I] Multithreading: Disabled
[09/14/2024-09:47:55] [I] CUDA Graph: Disabled
[09/14/2024-09:47:55] [I] Separate profiling: Disabled
[09/14/2024-09:47:55] [I] Time Deserialize: Disabled
[09/14/2024-09:47:55] [I] Time Refit: Disabled
[09/14/2024-09:47:55] [I] NVTX verbosity: 0
[09/14/2024-09:47:55] [I] Persistent Cache Ratio: 0
[09/14/2024-09:47:55] [I] Inputs:
[09/14/2024-09:47:55] [I] === Reporting Options ===
[09/14/2024-09:47:55] [I] Verbose: Disabled
[09/14/2024-09:47:55] [I] Averages: 10 inferences
[09/14/2024-09:47:55] [I] Percentiles: 90,95,99
[09/14/2024-09:47:55] [I] Dump refittable layers:Disabled
[09/14/2024-09:47:55] [I] Dump output: Disabled
[09/14/2024-09:47:55] [I] Profile: Disabled
[09/14/2024-09:47:55] [I] Export timing to JSON file:
[09/14/2024-09:47:55] [I] Export output to JSON file:
[09/14/2024-09:47:55] [I] Export profile to JSON file:
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] === Device Information ===
[09/14/2024-09:47:55] [I] Selected Device: Orin
[09/14/2024-09:47:55] [I] Compute Capability: 8.7
[09/14/2024-09:47:55] [I] SMs: 16
[09/14/2024-09:47:55] [I] Device Global Memory: 30697 MiB
[09/14/2024-09:47:55] [I] Shared Memory per SM: 164 KiB
[09/14/2024-09:47:55] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/14/2024-09:47:55] [I] Application Compute Clock Rate: 1.3 GHz
[09/14/2024-09:47:55] [I] Application Memory Clock Rate: 1.3 GHz
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] TensorRT version: 8.6.2
[09/14/2024-09:47:55] [I] Loading standard plugins
[09/14/2024-09:47:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 11954 (MiB)
[09/14/2024-09:48:00] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1101, now: CPU 1223, GPU 13090 (MiB)
[09/14/2024-09:48:00] [I] Start parsing network model.
[09/14/2024-09:48:00] [I] [TRT] ----------------------------------------------------------------
[09/14/2024-09:48:00] [I] [TRT] Input filename: face_attendance_search_files/onnx_file/ultra_light_640.onnx
[09/14/2024-09:48:00] [I] [TRT] ONNX IR version: 0.0.4
[09/14/2024-09:48:00] [I] [TRT] Opset version: 9
[09/14/2024-09:48:00] [I] [TRT] Producer name: pytorch
[09/14/2024-09:48:00] [I] [TRT] Producer version: 1.2
[09/14/2024-09:48:00] [I] [TRT] Domain:
[09/14/2024-09:48:00] [I] [TRT] Model version: 0
[09/14/2024-09:48:00] [I] [TRT] Doc string:
[09/14/2024-09:48:00] [I] [TRT] ----------------------------------------------------------------
[09/14/2024-09:48:00] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/14/2024-09:48:00] [I] Finished parsing network model. Parse time: 0.0179808
[09/14/2024-09:48:00] [E] Static model does not take explicit shapes since the shape of inference tensors will be determined by the model itself
[09/14/2024-09:48:00] [E] Network And Config setup failed
[09/14/2024-09:48:00] [E] Building engine failed
[09/14/2024-09:48:00] [E] Failed to create engine from model or file.
[09/14/2024-09:48:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=face_attendance_search_files/onnx_file/ultra_light_640.onnx --fp16 --optShapes=input:2x3x480x640 --shapes=input:2x3x480x640 --saveEngine=face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
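The key message is the first [E] line: "Static model does not take explicit shapes since the shape of inference tensors will be determined by the model itself". The ONNX file was exported with a fixed input shape (batch dimension hard-coded to 1), so trtexec cannot apply the --optShapes/--shapes overrides. You can verify this by inspecting the graph input; a minimal sketch, assuming the onnx Python package is installed and the input tensor is named "input":

import onnx

# Load the exported model and print the shape of each graph input.
model = onnx.load("face_attendance_search_files/onnx_file/ultra_light_640.onnx")
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
# A static export prints e.g. input [1, 3, 480, 640]; a dynamic-batch
# export would print a symbolic first dimension such as
# input ['batch_size', 3, 480, 640].

The fix is to re-export the ONNX model with a dynamic batch dimension, as in the steps below.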
1. Apply this patch to Ultra-Light-Fast-Generic-Face-Detector-1MB to allow the exported ONNX model to accept a dynamic batch size:
diff --git a/convert_to_onnx.py b/convert_to_onnx.py
index cd4bdf2..328edec 100644
--- a/convert_to_onnx.py
+++ b/convert_to_onnx.py
@@ -40,4 +40,6 @@ model_path = f"models/onnx/{model_name}.onnx"
dummy_input = torch.randn(1, 3, 240, 320).to("cuda")
# dummy_input = torch.randn(1, 3, 480, 640).to("cuda") #if input size is 640*480
-torch.onnx.export(net, dummy_input, model_path, verbose=False, input_names=['input'], output_names=['scores', 'boxes'])
+dynamic_axes = {'input': {0: 'batch_size'},
+                'scores': {0: 'batch_size'}, 'boxes': {0: 'batch_size'}}
+torch.onnx.export(net, dummy_input, model_path, verbose=False, input_names=['input'], output_names=['scores', 'boxes'], dynamic_axes=dynamic_axes)
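Note that the keys in dynamic_axes must match the names passed as input_names and output_names ('input', 'scores', 'boxes'); a key that matches neither, such as 'output', is ignored by torch.onnx.export. Marking only axis 0 keeps the spatial dimensions fixed while letting the batch dimension vary.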
2. Export the ONNX model:
python3 convert_to_onnx.py
3. Export the engine file:
/usr/src/tensorrt/bin/trtexec --onnx=models/onnx/version-RFB-320.onnx --fp16 --minShapes=input:1x3x240x320 --optShapes=input:8x3x240x320 --maxShapes=input:8x3x240x320 --dumpLayerInfo --exportLayerInfo=d.layer.json --saveEngine=tf.fp16.engine > log.log 2>&1
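After the build succeeds, it is worth confirming that the engine really carries a dynamic batch dimension before wiring it into DeepStream. A minimal sketch, assuming the TensorRT 8.6 Python bindings are installed and the engine was saved as tf.fp16.engine with an input tensor named "input":

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the engine built by trtexec above.
with open("tf.fp16.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# A dynamic-batch engine reports -1 for the batch dimension ...
print(engine.get_tensor_shape("input"))             # expected: (-1, 3, 240, 320)
# ... and exposes the min/opt/max shapes of optimization profile 0.
print(engine.get_tensor_profile_shape("input", 0))  # expected: 1x3x240x320 / 8x3x240x320 / 8x3x240x320

When the engine is used from DeepStream, the batch-size setting in the nvinfer [property] group should not exceed the engine's maximum batch (8 here); otherwise nvinfer may fall back to rebuilding the engine from the model file.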
You can get more useful information here.