YOLOv4 TensorRT inference results way off, but onnxruntime results are not

Description

Hello TRTians! :)

I have a custom YOLOv4 object detector (single class only) that was trained in PyTorch and exported to ONNX. I am now trying to get it running on my laptop GPU using the TRT engine exported by the trtexec binary. My ultimate goal is to get this running on a Jetson Xavier, but I need to demonstrate that it works correctly on a laptop GPU first.

I am loading this serialized TRT engine into my C++ inference code (along with the ONNX file for the network definition), but the outputs (confidence scores and bounding boxes) produced by the C++/TensorRT inference are way off, to the point that none of the confidence scores even crosses the 0.1 mark, so effectively no objects are detected at all.

The corresponding ONNX model loaded with onnxruntime produces correct inference results on the exact same image input (the preprocessed input tensor dumped from the C++ inference code), without any errors or warnings. So the ONNX file itself seems to be correct.

The only additional preprocessing I am doing in the Python onnxruntime code is expanding the dimensions: (3, 416, 416) → (1, 3, 416, 416). I don't know if a similar concept exists in the C++ counterpart for copying a batch-of-one input into a float pointer.
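From what I understand (and I may be wrong here), there is no explicit expand-dims step needed on the C++ side: for a batch size of 1, an NCHW tensor has exactly the same flat memory layout as a CHW tensor, so copying 1 x 3 x 416 x 416 floats into the binding's host buffer should be equivalent. A minimal sketch of that idea, with hypothetical names (copyBatchOneInput, chwData, hostDataBuffer):

#include <cstddef>  // std::size_t
#include <cstring>  // std::memcpy

// Hypothetical helper: copy a batch-of-one CHW tensor into the engine's host
// input buffer. For N == 1, an NCHW buffer and a CHW buffer have the exact same
// flat layout, so no explicit "expand dims" is required.
void copyBatchOneInput(const float* chwData, float* hostDataBuffer)
{
    const std::size_t kC = 3, kH = 416, kW = 416;    // model input shape (C, H, W)
    const std::size_t elemCount = 1 * kC * kH * kW;  // N * C * H * W
    std::memcpy(hostDataBuffer, chwData, elemCount * sizeof(float));
}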

The C++ inference code is a modified version of the sampleOnnxMNIST example, minus the network optimizations and serialization part.
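For context, the engine-loading part that replaces the build/serialization step follows the usual TensorRT 8 pattern. A minimal sketch (not my exact code; TrtEngine and loadEngine are just illustrative names, and error checks are omitted) of deserializing the .trt file saved by trtexec:

#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <memory>
#include <string>
#include <vector>

// Keep the runtime alive alongside the engine, as in the TensorRT samples.
struct TrtEngine
{
    std::unique_ptr<nvinfer1::IRuntime> runtime;
    std::unique_ptr<nvinfer1::ICudaEngine> engine;
};

// gLogger is assumed to be an existing nvinfer1::ILogger implementation.
TrtEngine loadEngine(const std::string& path, nvinfer1::ILogger& gLogger)
{
    // Read the serialized engine produced by trtexec --saveEngine=...
    std::ifstream file(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    TrtEngine e;
    e.runtime.reset(nvinfer1::createInferRuntime(gLogger));
    e.engine.reset(e.runtime->deserializeCudaEngine(blob.data(), blob.size()));
    return e;
}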

Here are the additional details and attachments:

Environment

TensorRT Version: 8.2.5.1
GPU Type: GTX 1060
Nvidia Driver Version: 470.129.06
CUDA Version: 10.2
CUDNN Version: 8
Operating System + Version: Ubuntu 18.04
Python Version: 3.8
PyTorch Version: 1.8
Baremetal or Container: Baremetal (no Docker)

Relevant Files

I am attaching a Google Drive link here to the following files:

  1. The model ONNX file.
  2. The trtexec log with the --verbose flag.
  3. A part of the C++ inference code utilizing the serialized TRT engine binary.
  4. The onnxruntime script to validate the ONNX file and the inference outputs.
  5. A sample PNG image file and the corresponding NumPy input image tensor dump after applying preprocessing.

Can the good people of Nvidia/Internet help me figure out what I am doing wrong? :)

Best regards

Hi,
We request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hi NVES,

The ONNX model is already a part of the link that I had shared. check_model.py is a part of the onnxruntime_test_nvidia_forum.py uploaded in the same link.

UPDATE: I realized that the input was in HWC format (the OpenCV default). I am now transposing the input to CHW format. The inputs going into onnxruntime and TensorRT are now verified to be identical.

Still not getting any results from the TensorRT version. 😭

Hi,

Could you please share the updated scripts so we can try from our end?

Thank you.

Hi @spolisetty

Command to generate the trt engine binary:

./trtexec --onnx=yolov4_obj_det.onnx --saveEngine=yolov4_lp_engine.trt --best --workspace=4000 --verbose

Link to the updated C++ inference code: here

Link to the onnx model: onnx_model

Link to the onnxruntime script: onnxruntime_test_nvidia_forum.py

For the above script, you’d need the tensor dump in npy that can be found here: imgcv.npy

All of these and everything else (trtexec logs, test image etc) can be found here: Google Drive folder

The snippet where I am preprocessing the data:

bool YOLOv4TensorRT::processInput(const samplesCommon::BufferManager &buffers, const cv::Mat &img)
{
    const int inputH = mInputDims.d[2];
    const int inputW = mInputDims.d[3];
    cv::Mat rgb, bgr_sized;
    cv::Mat rgbf;
    cv::Mat inpTensor;

    printf("input dimension: %d, %d\n", inputW, inputH);
    // Resize, BGR -> RGB, convert to float, and scale to [0, 1]
    cv::resize(img, bgr_sized, cv::Size(inputW, inputH));
    cv::cvtColor(bgr_sized, rgb, cv::COLOR_BGR2RGB);
    rgb.convertTo(rgbf, CV_32FC3);
    inpTensor = rgbf / 255.0;

    float *hostDataBuffer = static_cast<float *>(buffers.getHostBuffer("input"));
    if (hostDataBuffer == nullptr)
    {
        printf("ERROR: could not locate input tensor by name\n");
        return false;
    }

    printf("input buffer size in bytes: %zu\n",
           static_cast<size_t>(inputW) * inputH * inpTensor.elemSize());

    transpose((float*)inpTensor.data, hostDataBuffer, inputW, inputH, 3); // HWC -> CHW

#ifdef DUMP_OUTPUT
    std::vector<unsigned long> shape = {static_cast<unsigned long>(inputH),
                                        static_cast<unsigned long>(inputW), 3};
    dumpNPY<float>("imgcv.npy", shape, (float*)inpTensor.data);
#endif
    memcpy((char *)hostDataBuffer, (char *)inpTensor.data, inputW * inputH * inpTensor.elemSize());

    return true;
}

void transpose(const float* src, float* dst, int W, int H, int C)
{
    // Reorder from HWC (416, 416, 3) to CHW (3, 416, 416)
    for (int c = 0; c < C; c++)
    {
        for (int h = 0; h < H; h++)
        {
            for (int w = 0; w < W; w++)
            {
                dst[w + (W * h) + (H * W * c)] = src[c + (w * C) + (W * C * h)];
            }
        }
    }
}

Alright, there was a bug in my preprocessing code: a duplicate memcpy that was overwriting the correct (transposed) input in the host buffer with the untransposed HWC data. With that removed, the C++ TensorRT results now match the onnxruntime results! Closing the issue.
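For anyone who hits the same thing: the fix is simply dropping the trailing memcpy so that the CHW data written by transpose() into the host buffer is not clobbered by the HWC cv::Mat. A sketch of how the end of processInput() looks after the fix (the DUMP_OUTPUT block still dumps the HWC layout):

    // Corrected tail of processInput(): transpose() already writes the
    // CHW-ordered floats directly into the engine's host input buffer,
    // so no further memcpy from the HWC cv::Mat is needed.
    transpose((float*)inpTensor.data, hostDataBuffer, inputW, inputH, 3); // HWC -> CHW

#ifdef DUMP_OUTPUT
    std::vector<unsigned long> shape = {static_cast<unsigned long>(inputH),
                                        static_cast<unsigned long>(inputW), 3};
    dumpNPY<float>("imgcv.npy", shape, (float*)inpTensor.data); // still HWC
#endif

    return true;
}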
