Why does my batchsize affect the output of my inference function?

1057374326 · December 11, 2024, 8:38am

Why can my inference function output more than 200 results when my batch_size is 1, but when I increase my batch_size, my inference results are extremely reduced, and even when I set the batch_size to 128, I can only output one result. Inference result, but it is worth noting that this inference result is the same as the first inference result when batch_size is 1

this is the inference code：
def slide_batches_inference(image_array_batches:np.ndarray ,img_size)->np.ndarray:
“”"推理一个小batch的窗口图像

Args:
    image_array_batches (np.ndarray): [Batch, 1, img_size, img_size] (128,1,1024,1024)的图像
    model: 寒武纪的magic model
    img_size: 推理的图像尺寸

Returns:
    preds (np.ndarray): 模型输出文件，如果是yolov8 obb的话，为 [Batch, 20, 21504], 然后再去做 NMS 后处理

"""

preprocessed_image_array_batches = img_batches_preprocess(image_array_batches, normalization=True)

inputs = preprocessed_image_array_batches # 存了当前 batch 要推理的数据 (batch_size, 1, 1024, 1024)
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("/JHC/yolov8m_planeship_hbb.plan", "rb") as f:
    model_data = f.read()
engine = runtime.deserialize_cuda_engine(model_data)

创建执行上下文

context = engine.create_execution_context()
# context = engine.create_execution_context()

定义输入和输出大小

input_size = [batch_size, 1, 1024, 1024]


output_size = [image_array_batches.shape[0], 19, 21504]

计算缓冲区大小

input_buffer_size = int(np.prod(input_size) * np.float32().nbytes)
output_buffer_size = int(np.prod(output_size) * np.float32().nbytes)

分配缓冲区

input_buffer = cuda.mem_alloc(input_buffer_size)
output_buffer = cuda.mem_alloc(output_buffer_size)

获取输入和输出张量的名称

input_name = engine.get_tensor_name(0)  # 假设输入张量是第一个绑定
output_name = engine.get_tensor_name(1)  # 假设输出张量是第二个绑定

设置张量地址

context.set_tensor_address(input_name, int(input_buffer))
context.set_tensor_address(output_name, int(output_buffer))

创建 CUDA 流

stream =cuda.Stream()

异步将输入数据从主机拷贝到设备（cudaMemcpyAsync）

# input_data = np.random.rand(*input_size).astype(np.float32)
# input_data = inputs[0].ravel()
input_data = inputs
cuda.memcpy_htod_async(input_buffer, input_data, stream)

执行推理

context.execute_async_v3(stream_handle=stream.handle)

获取输出数据

outputs = np.empty(output_size, dtype=np.float32)

cuda.memcpy_dtoh_async(outputs, output_buffer, stream)
stream.synchronize()


preds = torch.from_numpy(outputs) 


return preds

Thank you very much！！！！

AastaLLL · December 12, 2024, 4:54am

Hi,

with open("/JHC/yolov8m_planeship_hbb.plan", "rb") as f:

Which batchsize (or dynamic?) do you use for building the engine?
Thanks.

1057374326 · December 12, 2024, 8:10am

I just specified batch_size in my input, and used the batch_size parameter when building and opening up space, but batch does not seem to be used when building the engine. I should add some code when building the engine to achieve dynamic batch>1 Inference model input? Thank you very much!

1057374326 · December 12, 2024, 8:40am

import tensorrt as trt

设置日志记录器

logger = trt.Logger(trt.Logger.WARNING)

定义 ONNX 文件路径和输出的 .plan 文件路径

onnx_file_path = “your_model_dynamic.onnx” # 替换为你的 ONNX 模型路径
plan_file_path = “/JHC/dynamic_model.plan” # 替换为保存的 TensorRT 引擎路径

创建 TensorRT Builder、Network 和 Parser

builder = trt.Builder(logger)
network = builder.create_network(flags=trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
parser = trt.OnnxParser(network, logger)

解析 ONNX 模型

print(“Parsing ONNX model…”)
with open(onnx_file_path, “rb”) as f:
if not parser.parse(f.read()):
print(“Failed to parse the ONNX model.”)
for i in range(parser.num_errors):
print(parser.get_error(i))
exit(1)

创建构建配置

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30) # 设置最大工作空间为 1GB

添加优化配置文件以支持动态 batch

profile = builder.create_optimization_profile()
input_name = network.get_input(0).name
input_shape = network.get_input(0).shape

假设输入形状为 (batch_size, C, H, W)

设置动态 batch 范围，例如 [1, 8]

min_shape = (1, *input_shape[1:])
opt_shape = (4, *input_shape[1:])
max_shape = (8, *input_shape[1:])
profile.set_shape(input_name, min_shape, opt_shape, max_shape)
config.add_optimization_profile(profile)

构建引擎

print(“Building TensorRT engine…”)
engine = builder.build_serialized_network(network, config)

保存 .plan 文件

if engine:
with open(plan_file_path, “wb”) as f:
f.write(engine)
print(f"TensorRT engine saved as {plan_file_path}")
else:
print(“Failed to build the TensorRT engine.”)

this is my onnx to tensorrt code
thank you！

AastaLLL · December 16, 2024, 5:21am

Hi,

To check this issue further, could you attach the source that allows us to reproduce the issue on our end?
Including the ONNX file, inputs, converting, and comparing script.

Thanks.

Topic		Replies	Views
TensorRT Engine batch inference only has one result TensorRT tensorrt	8	1331	June 10, 2020
TensorRT running inference with batch size > 1 TensorRT tensorrt	8	3876	January 26, 2021
TensorRT 5 builder when set max_batch_size to 8 the output shape? Jetson TX2	2	1523	June 5, 2019
Tensorrt inference with batch > 1 TensorRT	4	1509	October 13, 2022
Output of batch inference TensorRT	6	2819	January 7, 2021
Setting the batch in TensorRT using CPP API TensorRT tensorrt	9	1456	January 24, 2025
Batch Inference Wrong in Python API TensorRT	14	3768	February 20, 2020
Tensorrt Batch Inference TensorRT tensorrt	8	1741	December 1, 2020
TensorRT Batch Inferences : empty outputs TensorRT tensorrt , jetson-inference	8	2147	July 18, 2024
Dynamic batch Tensor-RT inference output is incorrect TensorRT tensorrt , python	2	1441	May 25, 2023