Question about Python tutorial

Description

I was trying to extend the example at TensorRT/tutorial-runtime.ipynb at master · NVIDIA/TensorRT · GitHub with batch support. I believe I did everything right, but only the first item of each batch was assigned meaningful values; the remaining items in the output stayed 0.

Environment

TensorRT Version: 8.0.1.6
GPU Type: Tesla T4
Nvidia Driver Version: 450.119.03
CUDA Version: 11.4 (from the container)
CUDNN Version:
Operating System + Version: EC2 instance
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Container nvcr.io/nvidia/tensorrt:21.08-py3

Steps To Reproduce

When invoking trtexec to convert the ONNX model, I set shapes to allow a range of batch sizes.

trtexec --onnx=fcn-resnet101.onnx --explicitBatch --fp16 --workspace=5200 --minShapes=input:1x3x1026x1282 --optShapes=input:2x3x1026x1282 --maxShapes=input:4x3x1026x1282 --buildOnly --saveEngine=fcn-resnet101.trt
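
For reference, the same dynamic-shape profile can also be set up through the TensorRT Python builder API. This is only a sketch of an equivalent build (the engine above was actually produced with trtexec):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("fcn-resnet101.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 5200 << 20  # 5200 MiB, matching --workspace=5200
config.set_flag(trt.BuilderFlag.FP16)

# One optimization profile covering batch sizes 1 (min) to 4 (max)
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 1026, 1282), (2, 3, 1026, 1282), (4, 3, 1026, 1282))
config.add_optimization_profile(profile)

with open("fcn-resnet101.trt", "wb") as f:
    f.write(builder.build_serialized_network(network, config))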

I then stacked a batch of identical images together:

import numpy as np

BATCH_SIZE = 4
batch = np.stack([input_image] * BATCH_SIZE)

# In [21]: batch.shape
# Out[21]: (4, 3, 1026, 1282)

After creating the execution context, I set the input binding shape:

import tensorrt as trt

model_path = "fcn-resnet101.trt"
print("Reading engine from file {}".format(model_path))
with open(model_path, "rb") as f, trt.Runtime(trt.Logger(trt.Logger.WARNING)) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
context.set_binding_shape(engine.get_binding_index("input"), (BATCH_SIZE, 3, image_height, image_width))

The call succeeds and returns True.
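
As a sanity check (my addition, not part of the original tutorial), you can print the shape the context now reports for every binding; once the input shape is set, the output binding's shape should also be fully specified, batch dimension included:

# Print each binding's name and the shape the context reports for it.
# After set_binding_shape, the output shape should be concrete and its
# first dimension should equal BATCH_SIZE.
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), tuple(context.get_binding_shape(i)))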

Then I run inference and evaluate the resulting array:

import pycuda.autoinit  # initializes a CUDA context
import pycuda.driver as cuda

bindings = []
for binding in engine:
    binding_idx = engine.get_binding_index(binding)
    # The context's binding shape is fully specified at this point,
    # so this volume already includes the batch dimension.
    size = trt.volume(context.get_binding_shape(binding_idx))
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    if engine.binding_is_input(binding):
        input_buffer = np.ascontiguousarray(batch)
        input_memory = cuda.mem_alloc(batch.nbytes)
        bindings.append(int(input_memory))
    else:
        output_buffer = np.empty([BATCH_SIZE, size], dtype)
        output_memory = cuda.mem_alloc(output_buffer.nbytes)
        bindings.append(int(output_memory))

stream = cuda.Stream()
# Transfer input data to the GPU.
cuda.memcpy_htod_async(input_memory, input_buffer, stream)
# Run inference
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
# Transfer prediction output from the GPU.
cuda.memcpy_dtoh_async(output_buffer, output_memory, stream)
# Synchronize the stream
stream.synchronize()

However, it seems only the first item was assigned correctly; the last three stayed 0.

In [24]: output_buffer[0][output_buffer[0] > 1]
Out[24]: array([15, 15, 15, ..., 15, 15, 15], dtype=int32)

In [25]: output_buffer[1][output_buffer[1] > 1]
Out[25]: array([], dtype=int32)

In [26]: output_buffer[2][output_buffer[2] > 1]
Out[26]: array([], dtype=int32)

In [27]: output_buffer[3][output_buffer[3] > 1]
Out[27]: array([], dtype=int32)

Hi @tzl,

Please refer to the following doc on working with dynamic shape inputs and optimization profiles.
Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

Thank you.

Thank you. I solved the problem.

output_buffer = np.empty([BATCH_SIZE, size], dtype)

should have been

output_buffer = np.empty([BATCH_SIZE, size // BATCH_SIZE], dtype)
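
For what it's worth, a way to avoid the manual division (my sketch, same idea) is to allocate the host buffer directly from the shape the context reports, since trt.volume of a fully specified binding shape already includes the batch dimension:

# Size the host output buffer straight from the context's reported shape;
# no division by BATCH_SIZE is needed because the shape already carries
# the batch dimension.
output_shape = context.get_binding_shape(binding_idx)
output_buffer = np.empty(tuple(output_shape), dtype)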