• Hardware Platform (Jetson / GPU): Jetson AGX Orin
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only): 6.2.1
• TensorRT Version: 10.3
• NVIDIA GPU Driver Version (valid for GPU only):
• Issue Type (questions, new requirements, bugs): questions
Hi,
In my DeepStream pipeline, the output tensor of an nvinfer model has the shape (batch_size, n, n, channels, height, width). I need to split this tensor into batch_size * n * n sub-tensors of shape (channels, height, width) to feed into a subsequent model for inference. Is there any DeepStream component or plugin that can handle this tensor reshaping and splitting process directly? If not, what is the recommended approach to achieve this within the DeepStream framework?
Any guidance or examples would be greatly appreciated!
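To make the split concrete, here is a minimal NumPy sketch of the reshape being asked about (the dimensions are hypothetical, matching the 1 x 9 x 9 example later in the thread; in a real pipeline this logic would run on the tensor buffer attached to NVDSINFER_TENSOR_OUTPUT_META):

```python
import numpy as np

# Hypothetical dimensions: batch_size=1, a 9x9 grid, 3x32x32 sub-tensors
batch_size, n, channels, height, width = 1, 9, 3, 32, 32

# Simulated nvinfer output tensor of shape (batch, n, n, C, H, W)
out = np.random.rand(batch_size, n, n, channels, height, width).astype(np.float32)

# Flatten the first three axes into one sub-batch axis:
# (batch, n, n, C, H, W) -> (batch * n * n, C, H, W)
sub_tensors = out.reshape(-1, channels, height, width)

print(sub_tensors.shape)  # (81, 3, 32, 32)
```

Because the grid axes are the leading dimensions, this is a pure view over contiguous memory: each sub-tensor is already a contiguous (C, H, W) slab, so no per-element copying is needed.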
You can refer to deepstream_pose_classification_app in deepstream_tao_apps.
For this sample:
In the parse_25dpose_from_tensor_meta function, the NVDSINFER_TENSOR_OUTPUT_META is first parsed into NVDS_OBJ_META. Then nvdspreprocess processes the NVDS_OBJ_META into NVDS_PREPROCESS_BATCH_META as the input of the subsequent model.
For your application, you can split the NVDSINFER_TENSOR_OUTPUT_META into sub-tensors in the custom processing library of nvdspreprocess.
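When copying sub-tensors out in the custom library, you also need to map each flat sub-tensor index back to its grid cell, e.g. to attach per-cell metadata. A hypothetical Python sketch of that index arithmetic (the real library is C++, but the math is the same):

```python
# Hypothetical index math: recover (batch, row, col) from a flat index
# in [0, batch * n * n), for a (batch, n, n, C, H, W) output tensor.
def grid_cell(flat_idx, n):
    b, rem = divmod(flat_idx, n * n)
    row, col = divmod(rem, n)
    return b, row, col

print(grid_cell(0, 9))   # (0, 0, 0)
print(grid_cell(80, 9))  # (0, 8, 8) -- last cell of a 1 x 9 x 9 grid
```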
Thank you for your timely reply! I looked at the reference code you posted. In the sgie src probe, it parses NVDSINFER_TENSOR_OUTPUT_META as NVDS_OBJ_META, then in nvdspreprocess, it processes NVDS_OBJ_META into NVDS_PREPROCESS_BATCH_META as the input for sgie1.
But in my case, the sgie output is (batch_size, n, n, channels, height, width), and batch_size x n x n might be relatively large, for example, 1 x 9 x 9 = 81, which exceeds the maximum batch size of sgie1 (possibly 8). In this case, how should I handle it?
You can directly process NVDSINFER_TENSOR_OUTPUT_META into NVDS_PREPROCESS_BATCH_META in the custom library of nvdspreprocess.
Generally speaking, this will not be a problem. You can refer to the gst_nvinfer_process_tensor_input function in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp. Alternatively, you can increase the batch-size in the sgie1 configuration file.
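If the engine's maximum batch size stays below batch_size * n * n, the sub-tensors can simply be fed in several smaller batches. A NumPy sketch of that chunking, assuming the 81-sub-tensor example and a max batch of 8 from this thread (in the pipeline the equivalent looping happens over the preprocessed tensor buffers):

```python
import numpy as np

max_batch = 8       # sgie1 engine's maximum batch size (example from this thread)
total = 1 * 9 * 9   # 81 sub-tensors from a (1, 9, 9, C, H, W) output

# Simulated stack of sub-tensors of shape (total, C, H, W)
sub_tensors = np.random.rand(total, 3, 32, 32).astype(np.float32)

# Run inference in ceil(81 / 8) = 11 chunks of at most 8
chunks = [sub_tensors[i:i + max_batch] for i in range(0, total, max_batch)]

print(len(chunks))          # 11
print(chunks[0].shape[0])   # 8
print(chunks[-1].shape[0])  # 1  (the 81 % 8 remainder)
```

The cost is extra inference calls (11 instead of 1 here), so if memory allows, raising the engine's batch size is usually the cheaper option.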
Thank you very much! However, the maximum batch size for sgie1 can only be set to 8 due to memory constraints when exporting the engine file using trtexec.