TensorRT Segmentation output


I am testing a segmentation model built in TensorFlow. It is a basic U-Net with an input shape of [B, 256, 256, 7] and an output shape of [B, 256, 256, 4].

After converting to TensorRT, the model output has the shape [B, 1024, 16, 16]. This has the same number of elements as [B, 256, 256, 4] and can be reshaped to it, but the reshaped output is just noise. The code I used for inference with TensorRT is included below.
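
One thing that may be worth ruling out is a memory-layout mismatch: a plain reshape changes only the shape, not the order of the values in memory. The sketch below uses a small made-up channel-first tensor (the sizes and variable names are illustrative, not taken from the model) to show that reshaping and transposing give the same shape but different contents. Whether this applies here is not established by the post. A 16x16 spatial size could also mean that the tensor marked as the output is an intermediate feature map rather than the final layer.

```python
import numpy as np

# Hypothetical small example: a channel-first tensor (N, C, H, W) that is
# expected in channel-last form (N, H, W, C), as TensorFlow would produce.
n, c, h, w = 1, 4, 3, 3
nchw = np.arange(n * c * h * w, dtype=np.float32).reshape(n, c, h, w)

# A plain reshape keeps the flat memory order, so values end up in the wrong
# (pixel, channel) slots even though the resulting shape looks right.
reshaped = nchw.reshape(n, h, w, c)

# A transpose moves the channel axis to the end, so each pixel keeps its own
# channel values.
transposed = np.transpose(nchw, (0, 2, 3, 1))

print(reshaped.shape == transposed.shape)    # shapes match
print(np.array_equal(reshaped, transposed))  # contents do not
```

If the two arrays differ like this for the real output, the reshape alone would be enough to turn a valid prediction into something that looks like noise.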

The TensorFlow model was converted to ONNX using tf2onnx. One error was reported: the ONNX model did not have an output layer, so I set one manually when converting to TensorRT.


TensorRT Version:
GPU Type: 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores
Nvidia Driver Version: 525
CUDA Version: 12.3
CUDNN Version:
Operating System + Version: Ubuntu 20.04, aarch64/arm64
Python Version (if applicable): 3.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

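import numpy as np
import tensorrt as trt
from cuda import cuda as cuda_v1  # assumed import (cuda-python driver API); not shown in the original post

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # assumed definition; not shown in the original post

def tensorrt_model_inference_v1(model_path, image_list, batch_size, image_size=(-1, 256,256,7)):
    image_tensor = np.stack(image_list) # expected shape (B, 256, 256, 7); dtype must match the engine's input binding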

    with open(model_path, "rb") as f, trt.Runtime(TRT_LOGGER) as model_file:
        engine = model_file.deserialize_cuda_engine(f.read())

    print("[INFO] Starting the execution of the model")

    context = engine.create_execution_context()
    context.set_binding_shape(0, image_tensor.shape)

    nInput = int(np.sum([engine.binding_is_input(i) for i in range(engine.num_bindings)])) # Number of input bindings
    nOutput = engine.num_bindings - nInput # Number of output bindings; every non-input binding is treated as an output

    print("[INFO] Creating the buffer for CUDA")
    bufferH = []
    bufferH.append(np.ascontiguousarray(image_tensor.reshape(-1))) # Flatten the input tensor into one contiguous array

    # Allocate empty host arrays for the output bindings. bufferH holds the input first, then the outputs
    for i in range(nInput, nInput + nOutput):
        bufferH.append(np.empty(context.get_binding_shape(i), dtype=trt.nptype(engine.get_binding_dtype(i))))

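    # Create the list of device buffers, one per binding. TensorRT runs inference on these device pointers.
    bufferD = []
    for i in range(nInput + nOutput):
        # The loop body is missing from the original post; a per-binding device allocation is assumed here
        bufferD.append(cuda_v1.cuMemAlloc(bufferH[i].nbytes)[1])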

    # Copy the input data from the host arrays to the device buffers
    for i in range(nInput):
        cuda_v1.cuMemcpyHtoD(bufferD[i], bufferH[i].ctypes.data, bufferH[i].nbytes)

    context.execute_v2(bufferD) # Executing on the inference/device list

    # Copy the results from the device buffers back into the output slots of the host list
    for i in range(nInput, nInput + nOutput):
        cuda_v1.cuMemcpyDtoH(bufferH[i].ctypes.data, bufferD[i], bufferH[i].nbytes)
    for b in bufferD:
        # The loop body is missing from the original post; freeing each device buffer is assumed here
        cuda_v1.cuMemFree(b)
    print("[INFO] Finished executing the model.") # Include a timestamp
    # print(len(bufferH))

    output_0 = np.array(bufferH[0]) # bufferH[0] is the flattened input tensor, not a model output
    output_1 = np.array(bufferH[1]) # First output binding, reported shape [B, 1024, 16, 16]
    # output_1 = np.exp(output_1)/(1+np.exp(output_1))

    print("[INFO] Model output shape : ", end="")
    print(output_1.shape)

    # output_1 = postproces_results(model_output=output_1, batch_size=output_1.shape[0])
    output_1 = output_1.reshape(-1,256,256,4)

    return output_1
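
To narrow down whether the returned array is scrambled or just numerically off, one option is to run the same input through the TensorFlow model and compare the two results. The helper below is a minimal sketch of such a check. The name `compare_outputs` and the tolerance are hypothetical choices, and the arrays are stand-in data rather than real model outputs.

```python
import numpy as np

def compare_outputs(reference, candidate, atol=1e-3):
    """Return (max_abs_diff, is_close) for two arrays of identical shape."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    return max_abs_diff, bool(np.allclose(reference, candidate, atol=atol))

# Stand-in data; in practice these would be the TensorFlow and TensorRT outputs
# for the same input batch.
ref = np.zeros((1, 256, 256, 4), dtype=np.float32)
out = ref + 5e-4
diff, ok = compare_outputs(ref, out)
print(diff, ok)
```

A small difference would point to precision effects, while a large one would suggest that the wrong tensor or the wrong layout is being read back.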

Hi @gotamdahiya,
Could you please share your ONNX model with us?