Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) - Jetson Orin
• DeepStream Version - 6.4
• JetPack Version (valid for Jetson only) - 6.0+b106
• TensorRT Version - 8.6.2
• NVIDIA GPU Driver Version (valid for GPU only) - CUDA 12.2
• Issue Type (questions, new requirements, bugs) - Question: we have a face detection model (Ultra-Light Face Detector) and converted its ONNX file into an engine file with the help of a custom parser. The engine runs only with batch size 1 and does not support multiple batches. Where should I look to add multi-batch support? Can you give a solution for this?
When converting from ONNX to an engine file, was the batch-size parameter specified? Can the model be converted using trtexec? trtexec can specify the batch-size parameter.
When we set the batch size to two, the engine is not generated; trtexec fails with the following error:
jetson@ubuntu:/nvme0n1/face_attendance_search_final$ /usr/src/tensorrt/bin/trtexec --onnx=face_attendance_search_files/onnx_file/ultra_light_640.onnx --fp16 --optShapes=input:2x3x480x640 --shapes=input:2x3x480x640 --saveEngine=face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=face_attendance_search_files/onnx_file/ultra_light_640.onnx --fp16 --optShapes=input:2x3x480x640 --shapes=input:2x3x480x640 --saveEngine=face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
[09/14/2024-09:47:55] [W] optShapes is being broadcasted to minShapes for tensor input
[09/14/2024-09:47:55] [W] optShapes is being broadcasted to maxShapes for tensor input
[09/14/2024-09:47:55] [I] === Model Options ===
[09/14/2024-09:47:55] [I] Format: ONNX
[09/14/2024-09:47:55] [I] Model: face_attendance_search_files/onnx_file/ultra_light_640.onnx
[09/14/2024-09:47:55] [I] Output:
[09/14/2024-09:47:55] [I] === Build Options ===
[09/14/2024-09:47:55] [I] Max batch: explicit batch
[09/14/2024-09:47:55] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[09/14/2024-09:47:55] [I] minTiming: 1
[09/14/2024-09:47:55] [I] avgTiming: 8
[09/14/2024-09:47:55] [I] Precision: FP32+FP16
[09/14/2024-09:47:55] [I] LayerPrecisions:
[09/14/2024-09:47:55] [I] Layer Device Types:
[09/14/2024-09:47:55] [I] Calibration:
[09/14/2024-09:47:55] [I] Refit: Disabled
[09/14/2024-09:47:55] [I] Version Compatible: Disabled
[09/14/2024-09:47:55] [I] ONNX Native InstanceNorm: Disabled
[09/14/2024-09:47:55] [I] TensorRT runtime: full
[09/14/2024-09:47:55] [I] Lean DLL Path:
[09/14/2024-09:47:55] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[09/14/2024-09:47:55] [I] Exclude Lean Runtime: Disabled
[09/14/2024-09:47:55] [I] Sparsity: Disabled
[09/14/2024-09:47:55] [I] Safe mode: Disabled
[09/14/2024-09:47:55] [I] Build DLA standalone loadable: Disabled
[09/14/2024-09:47:55] [I] Allow GPU fallback for DLA: Disabled
[09/14/2024-09:47:55] [I] DirectIO mode: Disabled
[09/14/2024-09:47:55] [I] Restricted mode: Disabled
[09/14/2024-09:47:55] [I] Skip inference: Disabled
[09/14/2024-09:47:55] [I] Save engine: face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
[09/14/2024-09:47:55] [I] Load engine:
[09/14/2024-09:47:55] [I] Profiling verbosity: 0
[09/14/2024-09:47:55] [I] Tactic sources: Using default tactic sources
[09/14/2024-09:47:55] [I] timingCacheMode: local
[09/14/2024-09:47:55] [I] timingCacheFile:
[09/14/2024-09:47:55] [I] Heuristic: Disabled
[09/14/2024-09:47:55] [I] Preview Features: Use default preview flags.
[09/14/2024-09:47:55] [I] MaxAuxStreams: -1
[09/14/2024-09:47:55] [I] BuilderOptimizationLevel: -1
[09/14/2024-09:47:55] [I] Input(s)s format: fp32:CHW
[09/14/2024-09:47:55] [I] Output(s)s format: fp32:CHW
[09/14/2024-09:47:55] [I] Input build shape: input=2x3x480x640+2x3x480x640+2x3x480x640
[09/14/2024-09:47:55] [I] Input calibration shapes: model
[09/14/2024-09:47:55] [I] === System Options ===
[09/14/2024-09:47:55] [I] Device: 0
[09/14/2024-09:47:55] [I] DLACore:
[09/14/2024-09:47:55] [I] Plugins:
[09/14/2024-09:47:55] [I] setPluginsToSerialize:
[09/14/2024-09:47:55] [I] dynamicPlugins:
[09/14/2024-09:47:55] [I] ignoreParsedPluginLibs: 0
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] === Inference Options ===
[09/14/2024-09:47:55] [I] Batch: Explicit
[09/14/2024-09:47:55] [I] Input inference shape: input=2x3x480x640
[09/14/2024-09:47:55] [I] Iterations: 10
[09/14/2024-09:47:55] [I] Duration: 3s (+ 200ms warm up)
[09/14/2024-09:47:55] [I] Sleep time: 0ms
[09/14/2024-09:47:55] [I] Idle time: 0ms
[09/14/2024-09:47:55] [I] Inference Streams: 1
[09/14/2024-09:47:55] [I] ExposeDMA: Disabled
[09/14/2024-09:47:55] [I] Data transfers: Enabled
[09/14/2024-09:47:55] [I] Spin-wait: Disabled
[09/14/2024-09:47:55] [I] Multithreading: Disabled
[09/14/2024-09:47:55] [I] CUDA Graph: Disabled
[09/14/2024-09:47:55] [I] Separate profiling: Disabled
[09/14/2024-09:47:55] [I] Time Deserialize: Disabled
[09/14/2024-09:47:55] [I] Time Refit: Disabled
[09/14/2024-09:47:55] [I] NVTX verbosity: 0
[09/14/2024-09:47:55] [I] Persistent Cache Ratio: 0
[09/14/2024-09:47:55] [I] Inputs:
[09/14/2024-09:47:55] [I] === Reporting Options ===
[09/14/2024-09:47:55] [I] Verbose: Disabled
[09/14/2024-09:47:55] [I] Averages: 10 inferences
[09/14/2024-09:47:55] [I] Percentiles: 90,95,99
[09/14/2024-09:47:55] [I] Dump refittable layers:Disabled
[09/14/2024-09:47:55] [I] Dump output: Disabled
[09/14/2024-09:47:55] [I] Profile: Disabled
[09/14/2024-09:47:55] [I] Export timing to JSON file:
[09/14/2024-09:47:55] [I] Export output to JSON file:
[09/14/2024-09:47:55] [I] Export profile to JSON file:
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] === Device Information ===
[09/14/2024-09:47:55] [I] Selected Device: Orin
[09/14/2024-09:47:55] [I] Compute Capability: 8.7
[09/14/2024-09:47:55] [I] SMs: 16
[09/14/2024-09:47:55] [I] Device Global Memory: 30697 MiB
[09/14/2024-09:47:55] [I] Shared Memory per SM: 164 KiB
[09/14/2024-09:47:55] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/14/2024-09:47:55] [I] Application Compute Clock Rate: 1.3 GHz
[09/14/2024-09:47:55] [I] Application Memory Clock Rate: 1.3 GHz
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[09/14/2024-09:47:55] [I]
[09/14/2024-09:47:55] [I] TensorRT version: 8.6.2
[09/14/2024-09:47:55] [I] Loading standard plugins
[09/14/2024-09:47:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 33, GPU 11954 (MiB)
[09/14/2024-09:48:00] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +1101, now: CPU 1223, GPU 13090 (MiB)
[09/14/2024-09:48:00] [I] Start parsing network model.
[09/14/2024-09:48:00] [I] [TRT] ----------------------------------------------------------------
[09/14/2024-09:48:00] [I] [TRT] Input filename: face_attendance_search_files/onnx_file/ultra_light_640.onnx
[09/14/2024-09:48:00] [I] [TRT] ONNX IR version: 0.0.4
[09/14/2024-09:48:00] [I] [TRT] Opset version: 9
[09/14/2024-09:48:00] [I] [TRT] Producer name: pytorch
[09/14/2024-09:48:00] [I] [TRT] Producer version: 1.2
[09/14/2024-09:48:00] [I] [TRT] Domain:
[09/14/2024-09:48:00] [I] [TRT] Model version: 0
[09/14/2024-09:48:00] [I] [TRT] Doc string:
[09/14/2024-09:48:00] [I] [TRT] ----------------------------------------------------------------
[09/14/2024-09:48:00] [W] [TRT] onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/14/2024-09:48:00] [I] Finished parsing network model. Parse time: 0.0179808
[09/14/2024-09:48:00] [E] Static model does not take explicit shapes since the shape of inference tensors will be determined by the model itself
[09/14/2024-09:48:00] [E] Network And Config setup failed
[09/14/2024-09:48:00] [E] Building engine failed
[09/14/2024-09:48:00] [E] Failed to create engine from model or file.
[09/14/2024-09:48:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8602] # /usr/src/tensorrt/bin/trtexec --onnx=face_attendance_search_files/onnx_file/ultra_light_640.onnx --fp16 --optShapes=input:2x3x480x640 --shapes=input:2x3x480x640 --saveEngine=face_attendance_search_files/onnx_file/ultra_light_640.onnx_b2_gpu0_fp16.engine
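The key message is the first [E] line: "Static model does not take explicit shapes since the shape of inference tensors will be determined by the model itself". The ONNX file was exported with a fixed input shape (batch dimension hard-coded to 1), so trtexec cannot apply the --optShapes/--shapes overrides. You can verify this by inspecting the graph input; a minimal sketch, assuming the onnx Python package is installed and the input tensor is named "input":

import onnx

# Load the exported model and print the shape of each graph input.
model = onnx.load("face_attendance_search_files/onnx_file/ultra_light_640.onnx")
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
# A static export prints e.g. input [1, 3, 480, 640]; a dynamic-batch
# export would print a symbolic first dimension such as
# input ['batch_size', 3, 480, 640].

The fix is to re-export the ONNX model with a dynamic batch dimension, as in the steps below.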
1. Apply this patch to Ultra-Light-Fast-Generic-Face-Detector-1MB to allow the exported ONNX model to accept a dynamic batch size:
diff --git a/convert_to_onnx.py b/convert_to_onnx.py
index cd4bdf2..328edec 100644
--- a/convert_to_onnx.py
+++ b/convert_to_onnx.py
@@ -40,4 +40,6 @@ model_path = f"models/onnx/{model_name}.onnx"
dummy_input = torch.randn(1, 3, 240, 320).to("cuda")
# dummy_input = torch.randn(1, 3, 480, 640).to("cuda") #if input size is 640*480
-torch.onnx.export(net, dummy_input, model_path, verbose=False, input_names=['input'], output_names=['scores', 'boxes'])
+dynamic_axes = {'input': {0: 'batch_size'},
+                'scores': {0: 'batch_size'}, 'boxes': {0: 'batch_size'}}
+torch.onnx.export(net, dummy_input, model_path, verbose=False, input_names=['input'], output_names=['scores', 'boxes'], dynamic_axes=dynamic_axes)
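Note that the keys in dynamic_axes must match the names passed as input_names and output_names ('input', 'scores', 'boxes'); a key that matches neither, such as 'output', is ignored by torch.onnx.export. Marking only axis 0 keeps the spatial dimensions fixed while letting the batch dimension vary.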
2. Export the ONNX model:
python3 convert_to_onnx.py
3. Export the engine file:
/usr/src/tensorrt/bin/trtexec --onnx=models/onnx/version-RFB-320.onnx --fp16 --minShapes=input:1x3x240x320 --optShapes=input:8x3x240x320 --maxShapes=input:8x3x240x320 --dumpLayerInfo --exportLayerInfo=d.layer.json --saveEngine=tf.fp16.engine > log.log 2>&1
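After the build succeeds, it is worth confirming that the engine really carries a dynamic batch dimension before wiring it into DeepStream. A minimal sketch, assuming the TensorRT 8.6 Python bindings are installed and the engine was saved as tf.fp16.engine with an input tensor named "input":

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the engine built by trtexec above.
with open("tf.fp16.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# A dynamic-batch engine reports -1 for the batch dimension ...
print(engine.get_tensor_shape("input"))             # expected: (-1, 3, 240, 320)
# ... and exposes the min/opt/max shapes of optimization profile 0.
print(engine.get_tensor_profile_shape("input", 0))  # expected: 1x3x240x320 / 8x3x240x320 / 8x3x240x320

When the engine is used from DeepStream, the batch-size setting in the nvinfer [property] group should not exceed the engine's maximum batch (8 here); otherwise nvinfer may fall back to rebuilding the engine from the model file.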
You can get more useful information here.