YOLOv4 TensorRT inference results way off, but onnxruntime results are not


Hello TRTians! :)

I have a custom YOLOv4 object detector (single class only) that was trained in PyTorch and then exported to ONNX. I am now trying to get it running on my laptop GPU using the TRT engine exported by the trtexec binary. My ultimate goal is to get this running on a Jetson Xavier, but I need to demonstrate that it works correctly on a laptop GPU first.

I am loading this serialized TRT engine into my C++ inference code (along with the ONNX file for the network definition), but the outputs (confidence scores and bounding boxes) produced by the C++/TensorRT inference are way off, to the point that none of the confidence scores even cross the 0.1 mark, effectively detecting no objects at all.

The corresponding ONNX model, loaded using onnxruntime, produces correct inference results with the exact same image input (the preprocessed input tensor dumped from the C++ inference code), without any errors or warnings. So the ONNX file itself seems to be correct.

The only additional preprocessing I am doing in the Python onnxruntime code is expanding the dims: (3, 416, 416) → (1, 3, 416, 416). I don't know if a similar step exists in the C++ counterpart for copying a single-batch input into a float pointer.
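For reference, this is the one extra step on the Python side; a minimal sketch (the array here is a stand-in for the dumped tensor):

```python
import numpy as np

# Stand-in for the preprocessed tensor dumped from the C++ code: shape (3, 416, 416)
chw = np.zeros((3, 416, 416), dtype=np.float32)

# onnxruntime expects an explicit batch axis: (1, 3, 416, 416)
batched = np.expand_dims(chw, axis=0)
print(batched.shape)  # (1, 3, 416, 416)
```

With a batch size of 1 the leading axis adds no elements, so on the C++ side the engine's input buffer is simply the same W*H*C floats; no extra copy is needed.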

The C++ inference code is a modified version of the sampleOnnxMNIST example, minus the network optimizations and serialization part.

Here are the additional details and attachments:


TensorRT Version:
GPU Type: GTX 1060
Nvidia Driver Version: 470.129.06
CUDA Version: 10.2
CUDNN Version: 8
Operating System + Version: Ubuntu 18.04
Python Version: 3.8
PyTorch Version: 1.8
Baremetal or Container: baremetal (no Docker)

Relevant Files

I am attaching a link here (Google Drive) to the following files:

  1. The model ONNX file.
  2. The trtexec log with the --verbose flag.
  3. The part of the C++ inference code that uses the serialized TRT engine binary.
  4. The onnxruntime script to validate the ONNX file and the inference outputs.
  5. A sample PNG image and the corresponding NumPy input tensor dump after preprocessing.

Can the good people of Nvidia/Internet help me figure out what I am doing wrong? :)

Best regards

Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:

  1. Validating your model with the snippet below:


import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.


The ONNX model is already part of the link I had shared. The check_model.py logic is part of the onnxruntime_test_nvidia_forum.py uploaded to the same link.

UPDATE: I realized that the input was in HWC format (the OpenCV default). I am now transposing the input to CHW format. The inputs going into onnxruntime and TensorRT are now verified to be identical.
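For anyone comparing the two pipelines, the HWC → CHW conversion is easy to sanity-check on the Python side with NumPy (a sketch with a tiny dummy image):

```python
import numpy as np

# Tiny 2x2 "image" with 3 channels in OpenCV's default HWC layout
hwc = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)

# HWC -> CHW, the layout the network expects
chw = hwc.transpose(2, 0, 1)
print(chw.shape)  # (3, 2, 2)

# Channel c of the CHW tensor is the c-th component of every HWC pixel
assert chw[1, 0, 0] == hwc[0, 0, 1]
```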

Still not getting any results from the TensorRT version. 😭


Could you please share the updated scripts so that we can try from our end?

Thank you.

Hi @spolisetty

Command to generate the trt engine binary:

./trtexec --onnx=yolov4_obj_det.onnx --saveEngine=yolov4_lp_engine.trt --best --workspace=4000 --verbose

Link to the updated C++ inference code: here

Link to the onnx model: onnx_model

Link to the onnxruntime script: onnxruntime_test_nvidia_forum.py

For the above script, you’d need the tensor dump in npy that can be found here: imgcv.npy

All of these and everything else (trtexec logs, test image etc) can be found here: Google Drive folder

The snippet where I preprocess the data:

bool YOLOv4TensorRT::processInput(const samplesCommon::BufferManager &buffers, const cv::Mat &img)
{
    const int inputH = mInputDims.d[2];
    const int inputW = mInputDims.d[3];
    cv::Mat rgb, bgr_sized;
    cv::Mat rgbf;
    cv::Mat inpTensor;

    printf("input dimension: %d, %d\n", inputW, inputH);
    cv::resize(img, bgr_sized, cv::Size(inputW, inputH));
    cv::cvtColor(bgr_sized, rgb, cv::COLOR_BGR2RGB);
    rgb.convertTo(rgbf, CV_32FC3);
    inpTensor = rgbf / 255.0;

    float *hostDataBuffer = static_cast<float *>(buffers.getHostBuffer("input"));
    if (hostDataBuffer == nullptr)
    {
        printf("ERROR: could not locate input tensor by name\n");
        return false;
    }

    printf("# of bytes: %d\n", inputW * inputH * (int)inpTensor.elemSize());
    transpose((float*)inpTensor.data, hostDataBuffer, inputW, inputH, 3); // HWC -> CHW
    std::vector<unsigned long> shape = {(unsigned long)inputH, (unsigned long)inputW, 3};
    dumpNPY<float>("imgcv.npy", shape, (float*)inpTensor.data);
    memcpy((char *)hostDataBuffer, (char *)inpTensor.data, inputW * inputH * inpTensor.elemSize());

    return true;
}

void transpose(const float* src, float* dst, int W, int H, int C)
{
    // input shape (416, 416, 3) -> (3, 416, 416)
    for (int c = 0; c < C; c++)
        for (int h = 0; h < H; h++)
            for (int w = 0; w < W; w++)
                dst[w + (W*h) + (H*W*c)] = src[c + (w*C) + (W*C*h)];
}
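The index arithmetic in the transpose above can be cross-checked against NumPy's `transpose(2, 0, 1)`; a small sketch reproducing the same formula on a flat buffer:

```python
import numpy as np

H, W, C = 4, 5, 3
src = np.arange(H * W * C, dtype=np.float32)   # flat HWC buffer
dst = np.empty(H * W * C, dtype=np.float32)

# Same indexing as the C++ loop: dst[w + W*h + H*W*c] = src[c + w*C + W*C*h]
for c in range(C):
    for h in range(H):
        for w in range(W):
            dst[w + W * h + H * W * c] = src[c + w * C + W * C * h]

# Matches NumPy's HWC -> CHW transpose of the same data
assert np.array_equal(dst, src.reshape(H, W, C).transpose(2, 0, 1).ravel())
```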

Alright, there was a bug in my preprocessing code (a duplicate memcpy that was overwriting the correct input). The C++ TensorRT results now match the onnxruntime results! Closing the issue.
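For readers hitting the same symptom: writing the transposed CHW data into the input buffer and then memcpy-ing the raw HWC bytes over the same buffer silently restores the wrong layout. A NumPy sketch of the effect (dummy data):

```python
import numpy as np

hwc = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)

# Step 1: fill the input buffer with the correct CHW layout
buf = hwc.transpose(2, 0, 1).ravel().copy()

# Step 2: the stray memcpy — raw HWC bytes overwrite the transposed data
buf[:] = hwc.ravel()

# The buffer is back in HWC order, so the network sees scrambled channels
assert not np.array_equal(buf, hwc.transpose(2, 0, 1).ravel())
```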

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.