TensorRT Engine gives incorrect inference output for segmentation model

Hello.
I am trying to convert a TF-Keras segmentation model to a TensorRT engine and perform inference on it. The TensorRT engine gives incorrect/noisy output. I have tried porting a classification model and it works. The problem seems to be with TensorFlow models only.

These are the steps I followed:

  1. Convert the model to an ONNX model using tf2onnx with the following command:

    python -m tf2onnx.convert --saved-model "Path_To_TF_Model" --output "Path_To_Output_Model\Model.onnx" --verbose

I performed inference on this ONNX model using onnxruntime in Python. It gives the correct output.

  2. I converted this ONNX model to a TensorRT engine using the following command:

    trtexec.exe --onnx="Path_To_ONNX_Model\model.onnx" --saveEngine="Path_To_TRT_Engine\model.engine" --verbose --explicitBatch

  3. I performed inference on this TensorRT engine in C++ (see the sketch below). The inference outputs are all wrong and noisy.
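
For reference, this is roughly what step 3 looks like; a minimal sketch assuming the TensorRT 7.x C++ API, a 1x3x512x512 float input, a two-class 512x512 float output, and binding indices 0 (input) and 1 (output). It is not the attached code.

    // Minimal sketch of step 3 (not the attached code). Assumptions: TensorRT 7.x
    // C++ API, a float input of 1x3x512x512, a two-class 512x512 float output,
    // and binding index 0 = input, 1 = output.
    #include <NvInfer.h>
    #include <cuda_runtime_api.h>
    #include <fstream>
    #include <iostream>
    #include <iterator>
    #include <vector>

    class Logger : public nvinfer1::ILogger
    {
        void log(Severity severity, const char* msg) noexcept override
        {
            // Print only warnings and errors
            if (severity <= Severity::kWARNING)
                std::cout << msg << std::endl;
        }
    } gLogger;

    int main()
    {
        // Read the serialized engine produced by trtexec
        std::ifstream file("model.engine", std::ios::binary);
        std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                                std::istreambuf_iterator<char>());

        nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
        nvinfer1::ICudaEngine* engine =
            runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
        nvinfer1::IExecutionContext* context = engine->createExecutionContext();

        // Host buffers: fill `input` with the preprocessed image before running
        std::vector<float> input(3 * 512 * 512);
        std::vector<float> output(2 * 512 * 512);

        // Device buffers, one per binding
        void* bindings[2];
        cudaMalloc(&bindings[0], input.size() * sizeof(float));
        cudaMalloc(&bindings[1], output.size() * sizeof(float));

        cudaStream_t stream;
        cudaStreamCreate(&stream);
        cudaMemcpyAsync(bindings[0], input.data(), input.size() * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        context->enqueueV2(bindings, stream, nullptr);   // explicit-batch execution
        cudaMemcpyAsync(output.data(), bindings[1], output.size() * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);

        // ... postprocess `output` into a segmentation mask ...
        // (cleanup of CUDA buffers and TensorRT objects omitted for brevity)
        return 0;
    }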

There are no errors or warnings in the verbose output while generating the engine. In fact, the ONNX model inference gives the correct output, so I am confused now.

I can share my code, verbose outputs, and the TF and ONNX models as well.

Platform: Windows 10
CUDA : 11.1
TensorRT version : 7.2.2
CuDNN version : 8.0.4

Please let me know if you need any other input from my side.


Hi, we request you to share your model and script so that we can help you better.

Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

@NVES
Hello,
Sharing the paths to all the files I am using:
TF_Keras Model - https://drive.google.com/file/d/1SEMiNhG6XibA5dRAv3UvBfWyY-UoU0JA/view?usp=sharing

ONNX Model - https://drive.google.com/file/d/1DfNDFsyVBBbNrEsvkEGaiCYxGYzo-bCM/view?usp=sharing

Input Image - Input.png - Google Drive

Expected output - ExpectedOutput.png - Google Drive

Python code to run onnxruntime:

    import cv2
    import numpy as np
    import onnx
    import onnxruntime

    # Load the tile and scale it to [0, 1]
    tile = cv2.imread(tileName)
    tile = tile.astype("float32") / 255
    # Add the batch dimension expected by the model input: (1, H, W, C)
    data = np.expand_dims(tile, axis=0)

    onnx_model = onnx.load("PathToONNX_Model/tf2onnx.onnx")
    content = onnx_model.SerializeToString()

    session = onnxruntime.InferenceSession(content)
    io_binding = session.io_binding()
    io_binding.bind_cpu_input('zero_padding2d_20_input:0', data)
    io_binding.bind_output('Identity:0')
    session.run_with_iobinding(io_binding)

    # Output shape is (1, imheight*imwidth, classes); take the per-pixel argmax
    prob = io_binding.copy_outputs_to_cpu()[0]
    prediction = np.uint8(np.argmax(prob[0], axis=1))
    # Scale class indices to grayscale values and reshape to the image size
    prediction = np.uint8((255 * prediction.astype('float32')) / (classes - 1))
    prediction = np.reshape(prediction, (imheight, imwidth))

    cv2.imwrite(dataPaths + tileNum + '_onnxOp11Out' + labelExt, prediction)
    print('Done : ' + dataPaths + resultDir + tileNum + labelExt)

My CPP Code for Inference:
TRT_Inference.cpp (6.1 KB)

Hi @sandeep.ganage,

Could you please try changing the following code block in TRT_Inference.cpp:

auto input_width = dims.d[2];
auto input_height = dims.d[1];
auto channels = dims.d[0];
auto input_size = cv::Size(input_width, input_height);

to

auto input_width = dims.d[3];
auto input_height = dims.d[2];
auto channels = dims.d[1];
auto input_size = cv::Size(input_width, input_height);
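
For reference, here is a small sketch (not part of the attached code, assuming a deserialized TensorRT 7.x engine) that prints every binding's name and dimensions, which makes it easy to confirm how the engine lays out the input and output:

    // Hypothetical helper: dump all binding names and dimensions of an engine
    #include <NvInfer.h>
    #include <iostream>

    void printBindingDims(const nvinfer1::ICudaEngine& engine)
    {
        for (int i = 0; i < engine.getNbBindings(); ++i)
        {
            const nvinfer1::Dims dims = engine.getBindingDimensions(i);
            std::cout << engine.getBindingName(i)
                      << (engine.bindingIsInput(i) ? " (input):" : " (output):");
            for (int d = 0; d < dims.nbDims; ++d)
                std::cout << ' ' << dims.d[d];
            std::cout << std::endl;
        }
    }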

Thank you.

@spolisetty
Actually, for the Caffe model I used this same index sequence and it worked. But yes, that was a mistake in this case.

I updated the index sequence but the inference output is still incorrect.

Hello @spolisetty
I made those changes. I also found that there was an additional softmax operation on top of the model's own softmax. I removed that code and reran it. It is still giving me incorrect inference results.

Please find my updated code:
NVIDIA_Dev_Forum.cpp (6.9 KB)

Hi,

The issue is in the user's code.

Applying this diff, we can get the correct result:

121c121
<               if (cpu_output[ii*2] >= cpu_output[ii*2+1]) {
---
>               if (cpu_output[ii] >= cpu_output[ii+(512*512)]) {

The output is 1x262144x2 rather than 1x2x262144.
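
To make the layout question concrete, here is a small sketch (my own illustration, not the attached code; buildMask and channelLast are hypothetical names) of the per-pixel class decision for a 512x512, two-class output copied to a host buffer:

    // Sketch only (not the attached code): build a binary mask from the raw
    // class scores; `channelLast` would be determined from the output binding
    // dimensions (true for 1x262144x2, false for 1x2x262144).
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> buildMask(const float* cpu_output, bool channelLast)
    {
        const int numPixels = 512 * 512;
        std::vector<uint8_t> mask(numPixels);
        for (int ii = 0; ii < numPixels; ++ii)
        {
            float score0, score1;
            if (channelLast)                     // 1 x 262144 x 2: scores are adjacent
            {
                score0 = cpu_output[ii * 2];
                score1 = cpu_output[ii * 2 + 1];
            }
            else                                 // 1 x 2 x 262144: scores are one plane apart
            {
                score0 = cpu_output[ii];
                score1 = cpu_output[ii + numPixels];
            }
            mask[ii] = (score1 > score0) ? 255 : 0;   // class 1 -> white, class 0 -> black
        }
        return mask;
    }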

Thank you.