• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type (questions, new requirements, bugs) : question
Hello,
I am trying to run an ONNX model with an explicitly set batch size of 8 through a simple DeepStream pipeline that only performs inference. The model, together with its config, labels and the simple pipeline I run it with, is attached here:
model.zip (1.0 MB)
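For reference, the pipeline is roughly equivalent to the following (a simplified sketch; the source file, resolution and config name are placeholders, the exact pipeline is in the attached zip):

gst-launch-1.0 \
  filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 \
  nvstreammux name=mux batch-size=8 width=1280 height=720 batched-push-timeout=40000 ! \
  nvinfer config-file-path=config_infer.txt ! fakesink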
When inspecting the model in Netron, the batch size is correctly set to 8. To match this, I configured nvstreammux with batch-size=8. According to the DeepStream FAQ, nvstreammux’s batch size should match either the number of input sources or the model batch size, so I believe I have set it correctly.
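For completeness, the relevant part of the nvinfer config looks roughly like this (a sketch with placeholder values; the real file is in the attached zip):

[property]
# ONNX model with the explicit batch dimension of 8
onnx-file=model.onnx
# Should match the batch dimension baked into the model and the nvstreammux batch-size
batch-size=8
# 0=FP32, 1=INT8, 2=FP16 (placeholder; the actual precision is set in the attached config)
network-mode=0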
However, when running inference, I encounter the following error:
ERROR: [TRT]: IExecutionContext::enqueueV3: Error Code 7: Internal Error (IShuffleLayer model/output/BiasAdd__82: reshaping failed for tensor: model/output/Sigmoid:0 reshape would change volume 50176 to 401408 Instruction: RESHAPEinput dims{1 1 224 224} reshape dims{8 224 224 1}.)
ERROR: Failed to enqueue trt inference batch
nvinfer gstnvinfer.cpp:1504:gst_nvinfer_input_queue_loop:<cp-nvinfer> error: Failed to queue input batch for inferencing
Interestingly, when I use the same model with batch size explicitly set to 1, it works without issues.
Question:
How can I perform inference on 8 frames simultaneously?
• Do I need to introduce a specific buffer element before nvinfer, or does nvinfer handle batching internally?
• How can I verify that inference is actually happening on 8 frames at a time (see the probe sketch below), when the converted engine reports:
INPUT kFLOAT input 3x224x224
min: 1x3x224x224
opt: 8x3x224x224
max: 8x3x224x224
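To make it concrete what I mean by “verify”: I was thinking of attaching something like the following probe to the nvinfer src pad and printing how many frames each batch contains (a sketch using the DeepStream Python bindings; the element and variable names are mine, not from the attached pipeline, and I am not sure this actually reflects the inference batch size):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def batch_size_probe(pad, info, user_data):
    # Print how many frames nvstreammux actually packed into this batch.
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if batch_meta:
        print("frames in batch:", batch_meta.num_frames_in_batch)
    return Gst.PadProbeReturn.OK

# Attached after creating the pipeline, e.g.:
# nvinfer_elem.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, batch_size_probe, None)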
I would like to always perform inference with batch-size=8 instead of 1.
Any insights or suggestions would be greatly appreciated!