C++ Yolov3_ONNX output parsing

Hi everyone,

I'm new to the TensorRT C++ API and so far have only a basic understanding of ML and video object detection.
Could you please help me understand Yolov3_ONNX output parsing, and correct me if I'm doing something wrong.

Setup settings:
TensorRT-7.2.3.4
CUDA Version: 11.1
CUDNN Version: 8.1
GPU : GeForce GTX 1070
Driver Version: 455.45.01
OS: Ubuntu 20.04

Input data:

  • A trained ONNX model based on Yolov3 that can detect 80 object categories (classes); it has INT64 weights.
  • Car.png with 3 (RGB) channels, size 1920x1080, showing 3-4 parked cars.
    The ONNX model works correctly and detects the expected objects via the Python API.

Minimal goal: detect a “car” in the image and draw the corresponding rectangle based on the BBox coordinates.

To keep things simple, I've used an NVIDIA sample and reworked some stages:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp

Workflow:

  1. For the input ONNX model:
    params.inputTensorNames.push_back("images");
    params.outputTensorNames.push_back("output");
    params.int8 = args.runInInt8;
  2. These key methods I've left almost unchanged:
    build()
    constructNetwork() (with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)
    infer()
    No obvious errors were detected during parsing, allocating buffers, or copying data to/from the host.
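Roughly, the part of the sample's build()/constructNetwork() I rely on looks like the condensed sketch below (names such as SampleUniquePtr, sample::gLogger and mParams come from the TensorRT sample framework; error handling is omitted):

    // Condensed sketch of the (mostly unchanged) build path from sampleOnnxMNIST.
    auto builder = SampleUniquePtr<nvinfer1::IBuilder>(
        nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()));

    // The ONNX parser requires an explicit-batch network definition.
    const auto explicitBatch =
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(
        builder->createNetworkV2(explicitBatch));

    auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    auto parser = SampleUniquePtr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, sample::gLogger.getTRTLogger()));

    // mParams.onnxFileName is my (temporarily hardcoded) model path.
    parser->parseFromFile(mParams.onnxFileName.c_str(),
        static_cast<int>(sample::gLogger.getReportableSeverity()));

    mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(
        builder->buildEngineWithConfig(*network, *config), samplesCommon::InferDeleter());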

3. Before inference, I preprocess the input *.png files as follows:

  • readImage("Car.png", img); // put the raw data into cv::Mat img
  • cv::resize(img, img, cv::Size(640, 640)); // resize to the model input size
  • Copy the data to the host buffer:
    float* data = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
    Before copying I used code from:
    bool SampleUffSSD::processInput(const samplesCommon::BufferManager& buffers)
    https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleUffSSD/sampleUffSSD.cpp#L300
    // The color image to input should be in BGR order
    (A sketch of my preprocessing is shown right after this list.)
  4. After infer(), I checked the output buffer:
    float* detectionOut = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));
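For clarity, the preprocessing I do before the copy is roughly the sketch below. The preprocess() helper is my own name, and the RGB channel order plus the 1/255 scaling are assumptions on my side; they have to match whatever the working Python pipeline does.

    #include <opencv2/opencv.hpp>

    // Sketch of my preprocessing: resize, BGR -> RGB, scale to [0, 1], HWC -> CHW.
    // Whether the model really expects RGB and 1/255 scaling is an assumption and
    // must match the Python pipeline that already works.
    void preprocess(const cv::Mat& bgr, float* hostBuffer, int netW = 640, int netH = 640)
    {
        cv::Mat resized;
        cv::resize(bgr, resized, cv::Size(netW, netH));

        cv::Mat rgb;
        cv::cvtColor(resized, rgb, cv::COLOR_BGR2RGB);

        // Interleaved HWC uint8 -> planar CHW float
        for (int c = 0; c < 3; ++c)
            for (int y = 0; y < netH; ++y)
                for (int x = 0; x < netW; ++x)
                    hostBuffer[c * netH * netW + y * netW + x]
                        = rgb.at<cv::Vec3b>(y, x)[c] / 255.0f;
    }

In my code this is called as preprocess(img, data); instead of the sampleUffSSD copy loop.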

According to the standard Yolov3 output format, the output buffer contains predictions of 85 values each:
first 4 elements: bx, by, bw, bh
5th element: conf_score
next 80 elements: c1…c80
I assume the resized 640x640 image is processed at 3 scales:
Scale 3: 80x80, Scale 2: 40x40, Scale 1: 20x20.
With 3 anchors per grid cell, the total should be (80x80 + 40x40 + 20x20) x 3 = 25200 predictions, so the output buffer should hold 25200 x 85 floats.
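Based on that layout, this is roughly how I try to read the predictions back out of the buffer. The decode() function and the Detection struct are my own sketch, and applying sigmoid() is only an assumption on my side, since the dump below contains large negative values that look like raw logits rather than probabilities:

    #include <cmath>
    #include <vector>

    struct Detection
    {
        float x, y, w, h; // box center and size
        float score;      // objectness * best class probability
        int classId;
    };

    inline float sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

    // Sketch: walk over numBoxes predictions of (5 + numClasses) floats each and
    // keep the ones above a threshold. NMS still has to be applied afterwards.
    std::vector<Detection> decode(const float* out, int numBoxes, int numClasses = 80,
                                  float confThreshold = 0.5f)
    {
        std::vector<Detection> dets;
        const int stride = 5 + numClasses; // 85 values per box
        for (int i = 0; i < numBoxes; ++i)
        {
            const float* p = out + i * stride;
            const float objectness = sigmoid(p[4]); // drop sigmoid() if the export already applies it
            if (objectness < confThreshold)
                continue;

            int bestClass = 0;
            float bestProb = 0.0f;
            for (int c = 0; c < numClasses; ++c)
            {
                const float prob = sigmoid(p[5 + c]);
                if (prob > bestProb) { bestProb = prob; bestClass = c; }
            }

            Detection d{p[0], p[1], p[2], p[3], objectness * bestProb, bestClass};
            if (d.score >= confThreshold)
                dets.push_back(d);
        }
        return dets;
    }

If the exported graph does not already convert the grid/anchor offsets into input-pixel coordinates, that decoding would also have to happen here, which could be part of my problem.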
For example, here is my output for the first 3 Yolov3 predictions:

[04/22/2021-12:32:17] [I] 0.542306 0.226851 0.401792 -0.0513536 -9.70514 -4.29494 -7.36099 -3.78555 -6.73828 -7.58386 -7.02508 -7.22057 -7.14941 -6.59306 -4.49437 -7.69277 -7.09905 -6.83414 -6.77486 -4.23565 -6.7856 -7.68969 -6.54489 -6.25913 -6.16793 -7.42629 -7.36532 -7.71079 -7.59555 -6.5745 -5.74356 -5.7933 -3.92326 -7.17957 -6.48325 -7.32181 -7.39555 -5.01554 -7.02647 -6.51238 -6.39299 -7.13209 -7.4009 -7.34513 -2.04712 -4.89446 -4.0352 -4.79804 -3.47983 -4.34923 -4.44538 -2.68177 -3.18095 -4.60184 -4.12383 -4.65686 -4.27735 -3.73344 -5.49286 -3.5088 -3.85077 -5.30183 -8.09564 -5.53019 -7.78349 -6.10046 -7.30674 -6.53994 -7.16183 -6.03646 -4.68715 -7.20229 -2.66324 -7.42531 -7.31424 -7.70524 -6.13099 -7.55631 -4.52403 -4.10598 -6.12587 -4.86591 -5.60973 -6.58145 -4.73593
[04/22/2021-12:32:17] [I] 0.2243 0.0609286 1.52773 -0.210024 -9.40563 -4.28177 -7.34746 -3.55497 -6.96295 -7.3322 -7.02723 -7.20567 -7.06806 -6.59262 -4.18148 -7.65241 -6.88469 -6.73907 -6.80788 -4.10895 -6.77128 -7.58193 -6.40886 -6.49587 -6.36937 -7.77715 -7.45838 -7.65253 -7.82763 -6.81406 -5.41136 -6.12642 -3.74187 -7.29304 -6.58143 -7.09211 -7.49611 -5.23522 -6.82205 -6.64073 -6.86235 -7.17414 -7.21994 -7.16324 -2.51332 -4.71412 -3.79096 -4.70023 -3.61342 -4.28137 -4.28752 -2.7099 -3.14824 -4.44628 -3.945 -4.80816 -4.15773 -3.63639 -5.10903 -3.33197 -3.79971 -5.52187 -8.14032 -5.6509 -7.71476 -5.89993 -7.39266 -6.58958 -7.11467 -6.12327 -5.03268 -7.14123 -3.19617 -7.50392 -7.55485 -7.69442 -5.9289 -7.75157 -4.76684 -3.73088 -6.13474 -4.8205 -5.70967 -6.60677 -4.78483
[04/22/2021-12:32:17] [I] 0.208864 -0.302209 1.63829 -0.470203 -8.98034 -3.83819 -7.13928 -2.95598 -7.12368 -7.35233 -7.37897 -7.64733 -6.97566 -6.82093 -4.38897 -7.48784 -6.93448 -6.23299 -6.83732 -4.06812 -7.0693 -7.85028 -6.90694 -6.87629 -6.67206 -8.22955 -7.74801 -7.87238 -7.8432 -6.32431 -5.45075 -6.17067 -3.72249 -7.59581 -5.87988 -6.91 -7.42819 -4.28345 -6.63567 -5.95419 -7.12676 -7.66316 -7.29064 -7.0552 -2.16251 -5.23319 -3.30637 -5.15527 -4.11985 -4.81657 -4.71451 -2.64847 -3.25204 -5.39585 -4.3747 -5.31162 -4.71635 -4.47144 -6.02104 -4.43249 -5.00056 -5.4139 -8.37647 -5.44382 -7.96077 -6.66438 -7.29994 -7.00947 -7.82105 -6.38725 -4.59354 -7.64563 -3.55436 -7.72774 -8.21339 -7.92636 -6.23117 -7.71688 -4.42084 -4.28869 -6.01326 -5.10915 -5.90131 -6.14067 -3.84633

I've tried to handle the obtained data by using code from:
bool SampleUffSSD::verifyOutput(const samplesCommon::BufferManager& buffers)
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleUffSSD/sampleUffSSD.cpp#L323
I think this was the wrong approach, since it applies SSD output handling instead of Yolov3.
As a result I got rectangles at incorrect positions that do not match the cars.
In any case it was the closest result I got, because none of the Yolov3 C++ examples I found could work with the float data I obtain in the output buffer.
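For completeness, the drawing step I use is roughly the sketch below (d is a detection from the decode sketch above, origImg is the original cv::Mat). It assumes the boxes are already in 640x640 input-pixel coordinates; my cv::resize() also does not preserve the aspect ratio, unlike the letterbox padding many Yolov3 pipelines use, so either assumption may be why the rectangles end up in the wrong place.

    // Sketch: map one detection from the 640x640 network input back to the
    // original image and draw it. Assumes (d.x, d.y) is the box center and
    // (d.w, d.h) its size, already expressed in input pixels.
    const float scaleX = origImg.cols / 640.0f; // 1920 / 640 = 3.0
    const float scaleY = origImg.rows / 640.0f; // 1080 / 640 = 1.6875
    cv::Rect rect(static_cast<int>((d.x - d.w / 2.0f) * scaleX),
                  static_cast<int>((d.y - d.h / 2.0f) * scaleY),
                  static_cast<int>(d.w * scaleX),
                  static_cast<int>(d.h * scaleY));
    cv::rectangle(origImg, rect, cv::Scalar(0, 255, 0), 2);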

Questions:

  1. Has anyone come across a similar task? Does the data in the output buffer look plausible?
    If not, I suspect my preprocessing is wrong; please correct me.
  2. The TensorRT documentation references only one example with a Python + Yolov3 (ONNX) model:
    Sample Support Guide :: NVIDIA Deep Learning TensorRT Documentation
    Unfortunately, I didn't find any appropriate example using the C++ API + ONNX.
    Could you please provide a link to example source code that parses Yolov3_ONNX output using the TensorRT C++ API?

I will be grateful for any help or explanations.

Hi @anton.nesterenko,
Request you to share your ONNX model and the script with us so that we can try this at our end.
Also, in the meantime, we believe you can check the documentation for the same.

Thanks!

Hi AakankshaS,

Thank you for the reply.

The input ONNX file is too large to attach to the current topic.
Please share instructions on how I can provide you the file via a link or Google Drive.

Regarding “share …script with us”, could you please clarify this point:
do you expect a binary executable file from me?

Currently, the main code is based on /sampleOnnxMNIST.cpp; only the code for pre/post-processing data was customized.
The input data (the image and ONNX file names) are temporarily hardcoded.

Dear AakankshaS,

  1. Please check your email inbox; I've provided the necessary inputs.
  2. Moreover, I've observed some warnings in the console:

[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/29/2021-11:45:18] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
Would this have an impact on my data in the host buffer after inference?

Currently, I use TensorRT-7.2.3.4, and the ONNX parameters for the network would be:
samplesCommon::OnnxSampleParams Struct Reference
params.int8 = args.runInInt8;
or
params.fp16 = args.runInFp16;
My network is running in FP16 mode, but the ONNX model is only cast down to INT32.
I've used "args.runInInt8;" as well, but saw no significant changes.
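For reference, the only precision-related code I touch is the builder config, roughly as in the sample (a sketch; as far as I understand these flags only control kernel precision and are unrelated to the parser's INT64 to INT32 weight cast):

    // Sketch of the precision flags set in the sample's build():
    if (mParams.fp16)
    {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    }
    if (mParams.int8)
    {
        // INT8 additionally needs calibration data or explicit dynamic ranges;
        // the samples fall back to setAllTensorScales() as a stub.
        config->setFlag(nvinfer1::BuilderFlag::kINT8);
        samplesCommon::setAllTensorScales(network.get(), 127.0f, 127.0f);
    }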

Thank you for support,

Hi @anton.nesterenko,

Sorry for the delayed response. The above warnings are related to the cast down to INT32.
I don't think you'll be losing accuracy unless a value is actually out of the INT32 range.
Regarding samples, we do not have a C++ based Yolo sample. But in DeepStream, a TensorRT C++ Yolo sample is available. Hope the following will help you.

There is a C++ based example in the DeepStream SDK:

/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/

Also, just for your reference: How To Run Inference Using TensorRT C++ API | LearnOpenCV

Thank you.