DeepStream Inference Fails for ONNX Model with Batch Size different than 1

• Hardware Platform (Jetson / GPU) : NVIDIA Jetson AGX Orin
• DeepStream Version : 7.1
• JetPack Version (valid for Jetson only) : 6.1
• TensorRT Version : 8.6.2.3
• Issue Type( questions, new requirements, bugs) : question
Hello,

I am trying to run an ONNX model with an explicitly set batch size of 8 through a simple DeepStream pipeline that only performs inference. The model is available here, along with the config, labels, and the simple pipeline I run it with:
model.zip (1.0 MB)

When inspecting the model in Netron, the batch size is correctly shown as 8. To match this, I configured nvstreammux with batch-size=8.
According to the DeepStream FAQ, nvstreammux’s batch size should match either the number of input sources or the model’s batch size, so I believe it is set correctly. The relevant muxer settings are sketched below.
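For reference, a minimal sketch of how the muxer is configured (element and variable names here are illustrative; only batch-size=8 is taken from my actual pipeline):

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # nvstreammux collects frames from the source(s) into batches for nvinfer
    streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
    streammux.set_property("batch-size", 8)   # match the model batch size
    streammux.set_property("width", 1920)     # illustrative output resolution
    streammux.set_property("height", 1080)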

However, when running inference, I encounter the following error:

ERROR: [TRT]: IExecutionContext::enqueueV3: Error Code 7: Internal Error (IShuffleLayer model/output/BiasAdd__82: reshaping failed for tensor: model/output/Sigmoid:0 reshape would change volume 50176 to 401408 Instruction: RESHAPEinput dims{1 1 224 224} reshape dims{8 224 224 1}.)
ERROR: Failed to enqueue trt inference batch
nvinfer gstnvinfer.cpp:1504:gst_nvinfer_input_queue_loop:<cp-nvinfer> error: Failed to queue input batch for inferencing

Interestingly, when I use the same model with batch size explicitly set to 1, it works without issues.

Question:

How can I perform inference on 8 frames simultaneously?

• Do I need to introduce a specific buffer element before nvinfer, or does nvinfer handle batching internally?

• How can I verify that inference is actually happening on 8 frames when the converted model reports:

INPUT  kFLOAT input  3x224x224  
min: 1x3x224x224  
opt: 8x3x224x224  
max: 8x3x224x224  

I would like to always perform inference with batch-size=8 instead of 1.
Any insights or suggestions would be greatly appreciated!

Your model can be handled with either an implicit batch dimension or full dimensions. Please set “force-implicit-batch-dim=1” in your model_config.txt file.
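For reference, a minimal sketch of where the key goes in the gst-nvinfer configuration file (only these two keys are shown; the rest of model_config.txt stays unchanged):

    [property]
    batch-size=8
    force-implicit-batch-dim=1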

@Fiona.Chen Thank you for your help; the solution worked!

However, after analyzing inference performance in Nsight Systems, I noticed that frames are not being stacked into batches of 8 as expected. Instead, they are processed sequentially, one by one, despite setting batch-size=8 and force-implicit-batch-dim=1 in the config file. Below is a screenshot for reference:

Question:

Is this the expected behavior? I would prefer frames to be processed in batches of 8 since my model is optimized for batched inference and performs more efficiently when processing multiple frames together rather than individually.

  1. The nvstreammux batch-size and nvinfer batch-size have different meanings. See: Frequently Asked Questions — DeepStream documentation
  2. From your code, there is only one camera input at 60 fps, it is a live source, and you set “batched-push-timeout=80000” with nvstreammux. That means nvstreammux waits at most 80 ms to collect frames for a batch from the live source. At 60 FPS, 8 frames take about 133 ms to arrive, so how can the single camera provide 8 frames within 80 ms?

How do you know this from the nsys log?

@Fiona.Chen Thank you for your reply.

  1. If I set nvstreammux’s batch-size to 1, can nvinfer still process 8 frames in a batch? If so, are the frames buffered internally, so that I can set nvstreammux’s batch-size to 1 and nvinfer’s batch-size to 8?
  2. You’re right, I initially set batched-push-timeout too low. I’ve now adjusted it to 280000, since each frame takes about 16.6 ms and there will be 8 of them (see the property sketch after this list).
  3. In Nsight Systems, I assumed that if the GstNvinfer row contains only one buffer_process_batch_num (blue marker), it indicates that frames are processed one at a time. If that’s incorrect, how can I verify how many frames my model processes per batch? Would iterating over frame data in a probe function be the best approach, or is there another way? Also, the inference takes the same amount of time as with the model whose batch size was explicitly set to 1. I am attaching the nsys log.
    nsys_log.nsys-rep.zip (754.6 KB)
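For reference, the adjusted muxer properties look roughly like this (streammux here is assumed to be the nvstreammux element created elsewhere in the pipeline):

    # 8 frames at 60 fps need about 8 * 16.7 ms = 133 ms to arrive, so a
    # batched-push-timeout of 280000 us (280 ms) leaves headroom for camera jitter
    streammux.set_property("batch-size", 8)
    streammux.set_property("batched-push-timeout", 280000)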

The engine is built with batch size 8 and always runs with batch size 8, but there may not be 8 frames of data in the batch.

The num_frames_in_batch field in NvDsBatchMeta (NVIDIA DeepStream SDK API Reference: _NvDsBatchMeta Struct Reference | NVIDIA Docs) shows you how many frames are in the batch.

No other way.

If there is only one live source, batch size 8 may not improve inferencing efficiency, because it always takes 8/60 second (about 133 ms) to form the batch, and if the camera has any instability the time may be even longer.

@Fiona.Chen Thank you for your reply.
So I added this to my probe function:

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer for arcing inference pad buffer probe")
        return Gst.PadProbeReturn.OK
    
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
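    # num_frames_in_batch: frames actually present in this batch;
    # max_frames_in_batch: the batch capacity configured on nvstreammux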
    print(batch_meta.num_frames_in_batch)
    print(batch_meta.max_frames_in_batch)

What I get as a result is

1
8

which means that max_frames_in_batch is indeed 8, but num_frames_in_batch is 1. A batch of 8 is not being created even though I have specified that the batch size should be 8.
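For completeness, the probe is attached roughly like this (the element variable, pad choice, and callback name are my own, just for illustration; any pad downstream of nvstreammux carries the batch meta):

    # attach the buffer probe to the nvinfer src pad so the batch meta reflects
    # what the model actually received
    nvinfer_src_pad = nvinfer.get_static_pad("src")
    nvinfer_src_pad.add_probe(Gst.PadProbeType.BUFFER, probe_callback, 0)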

Hello @Fiona.Chen
I wanted to check if there are any updates regarding my question. Would appreciate any insights you can share. Thanks again!

Please refer to the attached customized usb camera pipeline.

The features of my camera:

v4l2-ctl --device=/dev/video0 --list-formats-ext
ioctl: VIDIOC_ENUM_FMT
        Type: Video Capture

        [0]: 'MJPG' (Motion-JPEG, compressed)
                Size: Discrete 640x480
                        Interval: Discrete 0.040s (25.000 fps)
                Size: Discrete 1280x720
                        Interval: Discrete 0.040s (25.000 fps)
                Size: Discrete 1920x1080
                        Interval: Discrete 0.040s (25.000 fps)
        [1]: 'YUYV' (YUYV 4:2:2)
                Size: Discrete 640x480
                        Interval: Discrete 0.040s (25.000 fps)
                Size: Discrete 1280x720
                        Interval: Discrete 0.100s (10.000 fps)
                Size: Discrete 1920x1080
                        Interval: Discrete 0.200s (5.000 fps)

I configured the “YUYV, 25 fps, 640x480” caps after the v4l2src and linked v4l2src to videoconvert so that the camera outputs 25 fps 640x480 YUYV raw data.
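A minimal sketch of that caps configuration (variable names are illustrative; in GStreamer raw caps the V4L2 YUYV format is written as YUY2):

    caps_v4l2src = Gst.ElementFactory.make("capsfilter", "v4l2src_caps")
    caps_v4l2src.set_property(
        "caps",
        Gst.Caps.from_string("video/x-raw, format=YUY2, width=640, height=480, framerate=25/1"))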

Please pay attention to the property settings for “nvvideoconvert” and “nvstreammux”, and to the dstest1_pgie_config.txt configuration.

deepstream_test_1_usb.py (11.3 KB)
dstest1_pgie_config.txt (2.9 KB)


Parts of my log:

8
8
Frame Number=616 Number of Objects=1 Vehicle_count=1 Person_count=0
Frame Number=617 Number of Objects=1 Vehicle_count=2 Person_count=0
Frame Number=618 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=619 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=620 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=621 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=622 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=623 Number of Objects=0 Vehicle_count=2 Person_count=0
8
8
Frame Number=624 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=625 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=626 Number of Objects=1 Vehicle_count=1 Person_count=0
Frame Number=627 Number of Objects=1 Vehicle_count=2 Person_count=0
Frame Number=628 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=629 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=630 Number of Objects=0 Vehicle_count=2 Person_count=0
Frame Number=631 Number of Objects=1 Vehicle_count=3 Person_count=0
8
8
Frame Number=632 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=633 Number of Objects=1 Vehicle_count=1 Person_count=0
Frame Number=634 Number of Objects=1 Vehicle_count=2 Person_count=0
Frame Number=635 Number of Objects=1 Vehicle_count=3 Person_count=0
Frame Number=636 Number of Objects=0 Vehicle_count=3 Person_count=0
Frame Number=637 Number of Objects=0 Vehicle_count=3 Person_count=0
Frame Number=638 Number of Objects=0 Vehicle_count=3 Person_count=0
Frame Number=639 Number of Objects=0 Vehicle_count=3 Person_count=0
8
8
Frame Number=640 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=641 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=642 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=643 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=644 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=645 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=646 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=647 Number of Objects=0 Vehicle_count=0 Person_count=0

@Fiona.Chen Thank you for your response and for providing an example! I ran your example with minor adjustments for my camera setup, and it indeed runs with num_frames_in_batch = 8.

However, I noticed that you did not set streammux.set_property('live-source', True). Could you clarify the reason for this? When I enabled this property, I immediately observed num_frames_in_batch = 1 instead of 8. Does this property influence the number of frames buffered and passed through nvstreammux?

Additionally, if I want to run my pipeline with a camera and perform inference with a batch size of 8, do I need to disable live-source, or can I still process 8 frames per batch with live-source=True?

Lastly, I noticed that nvvidconvsrc has output-buffers set to 9, while the batch size for both nvstreammux and nvinfer is 8. Could you explain the reasoning behind this? When I commented this property out, the pipeline behaved the same way, with 8 frames in a batch.

Also, when I commented out nvmultistreamtiler and added only a fakesink after the nvinfer element, the pipeline seemed to get stuck. Is nvmultistreamtiler necessary after nvinfer to keep running inference with a batch size of 8?

Thanks in advance!

When the nvstreammux “live-source” property is set to TRUE, nvstreammux will not wait for the batch to be filled, because latency is very important to most customers who work with live sources.

You need to disable “live-source”.

The “output-buffers” property of nvvideoconvert is the output buffer pool size; if you want downstream elements to always be able to get 8 frames from nvvideoconvert, you need to set the pool size larger than 8.
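For example (assuming nvvidconvsrc is the nvvideoconvert element in the pipeline):

    # the output buffer pool must hold at least one full batch, so use a value
    # larger than the batch size of 8
    nvvidconvsrc.set_property("output-buffers", 9)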

It does not get stuck on my board. But you must use either “nvmultistreamtiler” or “nvstreamdemux” to convert the batched data back to non-batched data before you send the video data to any element that can’t handle batches; fakesink can’t handle batched data.
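A rough sketch of putting the tiler in front of a batch-unaware sink (element names and property values are illustrative):

    # nvmultistreamtiler composites the frames of a batch into one non-batched
    # frame, so the downstream fakesink never sees batched data
    tiler = Gst.ElementFactory.make("nvmultistreamtiler", "tiler")
    tiler.set_property("rows", 1)
    tiler.set_property("columns", 1)
    tiler.set_property("width", 1280)
    tiler.set_property("height", 720)

    pipeline.add(tiler)
    nvinfer.link(tiler)
    tiler.link(fakesink)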

@Fiona.Chen Thank you for your response! I have a few additional questions:

  1. In the case of nvvidconvsrc, when I commented out nvvidconvsrc.set_property('output-buffers', 9), the pipeline still processed 8 buffers at a time correctly. Why is that?

  2. My model performs a single inference, including postprocessing, in 3-4 ms. Would it be better/advisable to set live-source=True and perform inference frame by frame, since inference is faster than the generation of a single frame (16.6 ms), or should I set live-source=False to enable batch inference?

  3. Is there a sink element that can handle batched input directly, or must batch inference results always be converted back to non-batch format before passing them to a sink?

It seems you also changed something else.

It depends on you.

No.

Yes.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.