TensorRT Segmentation output

gotamdahiya · March 11, 2024, 3:05pm

Description

I am testing a segmentation model built in TensorFlow with an input shape of [B, 256, 256, 7] and an output shape of [B, 256, 256, 4]. It is a basic U-Net model.

After conversion to TensorRT, the model output has the shape [B, 1024, 16, 16]. This has the same number of elements as [B, 256, 256, 4] and can be reshaped to it, but the result is just noise. The code I used for TensorRT inference is attached below.
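The reshape succeeds only because the element counts happen to match; a reshape does not reorder data between layouts. A minimal sketch with NumPy, using a small stand-in tensor (the 3x3 size is illustrative and not taken from the model), shows that reshaping a channels-first array into a channels-last shape gives different values than transposing it:

```python
import math
import numpy as np

# Element counts per sample match, which is why the reshape succeeds at all.
assert math.prod((1024, 16, 16)) == math.prod((256, 256, 4))

# Small stand-in for a channels-first (NCHW) tensor: 1 sample, 4 channels, 3x3 pixels.
nchw = np.arange(1 * 4 * 3 * 3, dtype=np.float32).reshape(1, 4, 3, 3)

# Reshape keeps the memory order and only regroups the elements.
reshaped = nchw.reshape(1, 3, 3, 4)

# Transpose moves the channel axis last, which is the actual NCHW -> NHWC conversion.
transposed = np.transpose(nchw, (0, 2, 3, 1))

same_shape = reshaped.shape == transposed.shape
same_values = bool(np.array_equal(reshaped, transposed))
print(same_shape, same_values)  # True False
```

Note that no transpose of a [B, 1024, 16, 16] tensor can produce [B, 256, 256, 4], so a layout mismatch alone would not explain the reported shape.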

The TensorFlow model was converted to ONNX using tf2onnx. One error was reported during conversion: the ONNX model did not have an output layer, so I set one manually when converting to TensorRT.
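Since the output was marked by hand, it may be worth confirming which tensor was actually selected. As a rough sketch, assuming a classic U-Net encoder with 64 base filters that doubles the channel count at each of four 2x downsampling stages (an assumption; the post does not give the architecture), the shape at each encoder level can be computed as follows. The function name `unet_encoder_shapes` is hypothetical.

```python
def unet_encoder_shapes(size=256, base_filters=64, depth=4):
    """Return (channels, height, width) per encoder level, bottleneck last.

    Assumes each level halves the spatial size and doubles the channel count,
    as in the classic U-Net; the real model may differ.
    """
    shapes = []
    channels, side = base_filters, size
    for _ in range(depth + 1):
        shapes.append((channels, side, side))
        channels *= 2
        side //= 2
    return shapes

shapes = unet_encoder_shapes()
bottleneck = shapes[-1]
print(bottleneck)  # (1024, 16, 16)
```

Under these assumptions the bottleneck is (1024, 16, 16), the same shape the engine reports, so the marked output may be an intermediate tensor rather than the final 4-channel layer.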

Environment

TensorRT Version:
GPU Type: 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores
Nvidia Driver Version: 525
CUDA Version: 12.3
CUDNN Version:
Operating System + Version: Ubuntu 20.04, aarch64/arm64
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

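# Imports inferred from the calls below; the original post did not show them.
import numpy as np
import tensorrt as trt
from cuda import cuda as cuda_v1  # cuda-python driver API (assumed source of the cuda_v1 alias)

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # Assumed logger setup; not shown in the post

def tensorrt_model_inference_v1(model_path, image_list, batch_size, image_size=(-1, 256,256,7)):

    image_tensor = np.stack(image_list) # Expected shape per the description: [B, 256, 256, 7]

    # Deserializing the serialized engine file
    with open(model_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())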

    print("[INFO] Starting the execution of the model")

    context = engine.create_execution_context()
    context.set_binding_shape(0, image_tensor.shape)

    nInput = sum(engine.binding_is_input(i) for i in range(engine.num_bindings)) # Number of input bindings
    nOutput = engine.num_bindings - nInput # Number of output bindings; every non-input binding is treated as an output

    print("[INFO] Creating the buffer for CUDA")
    bufferH = []
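    # Casting to the engine's input dtype, then flattening into one contiguous array.
    # Without the cast, a float64 or uint8 input would be copied with the wrong byte layout.
    input_dtype = trt.nptype(engine.get_binding_dtype(0))
    bufferH.append(np.ascontiguousarray(image_tensor.astype(input_dtype).reshape(-1)))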

    # Allocating empty host arrays for the output bindings. Inputs and results are both stored in bufferH
    for i in range(nInput, nInput + nOutput):
        bufferH.append(np.empty(context.get_binding_shape(i), dtype=trt.nptype(engine.get_binding_dtype(i))))

    # Allocating one device buffer per binding. TensorRT reads the input from and writes the outputs to these buffers.
    bufferD = []
    for i in range(nInput + nOutput):
        bufferD.append(cuda_v1.cuMemAlloc(bufferH[i].nbytes)[1])

    # Copying the input data from the host list to the device buffers
    for i in range(nInput):
        cuda_v1.cuMemcpyHtoD(bufferD[i], bufferH[i].ctypes.data, bufferH[i].nbytes)

    context.execute_v2(bufferD) # Executing on the inference/device list

    # Copying the results from the device buffers back to the output entries of the host list
    for i in range(nInput, nInput + nOutput):
        cuda_v1.cuMemcpyDtoH(bufferH[i].ctypes.data, bufferD[i], bufferH[i].nbytes)

    # Releasing the device buffers
    for b in bufferD:
        cuda_v1.cuMemFree(b)
    print("[INFO] Finished executing the model.") # Include a timestamp
    
    output_1 = np.array(bufferH[1]) # First output binding; bufferH[0] holds the flattened input
    # output_1 = np.exp(output_1)/(1+np.exp(output_1))

    # Shape as reported by the engine before any reshaping
    print("[INFO] Model output shape :", output_1.shape)

    # output_1 = postproces_results(model_output=output_1, batch_size=output_1.shape[0])
    # Reshape only regroups the elements in memory order; it does not convert between layouts
    output_1 = output_1.reshape(-1,256,256,4)

    return output_1
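To quantify "noise" rather than judge it by eye, the TensorRT result can be compared against the TensorFlow output for the same input. A minimal sketch, assuming both arrays are already in [B, H, W, C] layout; `compare_outputs` is a hypothetical helper, and the random arrays below only stand in for real model outputs:

```python
import numpy as np

def compare_outputs(reference, candidate):
    """Return (max_abs_diff, class_agreement) for two [B, H, W, C] arrays."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    # Fraction of pixels whose highest-scoring class is the same in both arrays
    agreement = float(np.mean(reference.argmax(-1) == candidate.argmax(-1)))
    return max_abs_diff, agreement

# Stand-in data: an identical copy should match exactly, unrelated values should not.
rng = np.random.default_rng(0)
ref = rng.random((1, 8, 8, 4), dtype=np.float32)
diff_same, agree_same = compare_outputs(ref, ref.copy())
diff_noise, agree_noise = compare_outputs(ref, rng.random((1, 8, 8, 4), dtype=np.float32))
print(diff_same, agree_same)
```

A low class agreement together with a large maximum difference would point to a wrong output tensor or input dtype rather than ordinary precision loss.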

AakankshaS · March 14, 2024, 6:02am

Hi @gotamdahiya ,
Could you please share your ONNX model with us?

Thanks
