Hi everyone,
I’m new to the TensorRT C++ API and have only a basic understanding of ML and video object detection so far.
Could you please help me understand Yolov3_ONNX output parsing, and correct me if I’m doing something wrong?
Setup settings:
TensorRT-7.2.3.4
CUDA Version: 11.1
CUDNN Version: 8.1
GPU : GeForce GTX 1070
Driver Version: 455.45.01
OS: Ubuntu 20.04
Input data:
- A trained ONNX model based on Yolov3 that can detect 80 object categories (classes). The model has INT64 weights.
- Car.png: a 3-channel (RGB) image, 1920x1080, showing 3-4 parked cars.
The ONNX model works correctly and detects the necessary objects via the Python API.
Minimal target: detect “car” in the image and draw the corresponding rectangle based on the BBox coordinates.
To keep things simple, I used the Nvidia example and reworked some stages:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp
Workflow:
1. For the input ONNX model:
params.inputTensorNames.push_back("images");
params.outputTensorNames.push_back("output");
params.int8 = args.runInInt8;
2. I left these key methods almost unchanged:
build()
constructNetwork(with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)
infer()
No obvious errors were reported during parsing, buffer allocation, or copying data to/from the host.
3. Before inference, I make the following changes to the input *.png file:
- readImage("Car.png", img); // load the raw data into cv::Mat img
- cv::resize(img, img, cv::Size(640,640)); // resize to the model input size
- Copy the data to the host buffer:
float* data = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
Before copying, I applied the preprocessing code from:
bool SampleUffSSD::processInput(const samplesCommon::BufferManager& buffers)
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleUffSSD/sampleUffSSD.cpp#L300
// The color image to input should be in BGR order
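For comparison, here is a minimal sketch of one common preprocessing layout for Yolo-style ONNX exports. It is not taken from either sample. The function name hwcBgrToChwRgb is hypothetical, and both the RGB channel order and the [0, 1] scaling are assumptions that should be checked against the Python pipeline that already works. The normalization in the SSD sample was written for that SSD model and may not match what this model expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit BGR image (HWC, the layout OpenCV uses)
// into a planar float tensor (CHW).
// ASSUMPTION: the model expects RGB channel order and values in [0, 1].
std::vector<float> hwcBgrToChwRgb(const std::vector<std::uint8_t>& bgr,
                                  int width, int height)
{
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    assert(bgr.size() == plane * 3);

    std::vector<float> chw(plane * 3);
    for (std::size_t i = 0; i < plane; ++i)
    {
        for (int c = 0; c < 3; ++c)
        {
            // Source channel c is B, G, R; destination plane (2 - c) is R, G, B.
            chw[(2 - c) * plane + i] = bgr[i * 3 + c] / 255.0f;
        }
    }
    return chw;
}
```

The returned vector can then be copied into the host buffer with std::copy or std::memcpy, provided its size matches the input binding.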
4. After infer(), I checked the output buffer:
float* detectionOut = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));
According to the Yolov3 output format, the output buffer should contain boxes of 85 elements each:
first 4 elements: bx, by, bw, bh
5th element: conf_score
next 80 elements: c1…c80
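The record layout above can be read with a sketch like the following. The names Detection, sigmoid, and decodeRecord are hypothetical. The sketch assumes that the confidence and class values are raw logits that still need a sigmoid (the large negative values in the log suggest this, but it is not confirmed), and it copies the four box values as they are; depending on how the model was exported, they may still need grid and anchor decoding, which is not shown here.

```cpp
#include <cassert>
#include <cmath>

// One decoded candidate box (hypothetical helper type).
struct Detection
{
    float bx, by, bw, bh; // box values exactly as stored in the record
    float score;          // objectness * best class probability
    int classId;          // index of the best class, 0-based
};

inline float sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Decode one record of (5 + numClasses) floats.
// ASSUMPTION: rec[4] and the class values are logits, not probabilities.
Detection decodeRecord(const float* rec, int numClasses)
{
    assert(rec != nullptr && numClasses > 0);

    // Find the class with the largest value.
    int best = 0;
    for (int c = 1; c < numClasses; ++c)
    {
        if (rec[5 + c] > rec[5 + best])
        {
            best = c;
        }
    }

    const float objectness = sigmoid(rec[4]);
    const float classProb = sigmoid(rec[5 + best]);
    return {rec[0], rec[1], rec[2], rec[3], objectness * classProb, best};
}
```

Record i of the buffer would start at detectionOut + i * 85. Records whose score falls below a chosen threshold are normally discarded before drawing.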
I assume that the resized 640x640 image is processed at 3 scales:
Scale 3: 80x80, Scale 2: 40x40, Scale 1: 20x20.
If so, the total output buffer size should be (80x80 + 40x40 + 20x20) x 3 = 25200, and 25200 / 85 = ~296 BBoxes.
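As a sanity check on this arithmetic, the sketch below counts grid cells for a square input. The function name countPredictions is hypothetical, and the strides 8, 16, 32 and 3 anchors per cell are assumptions about the model. Under those assumptions, 25200 may be the number of candidate boxes rather than the number of floats, in which case the buffer would hold 25200 x 85 floats. The actual shape should be confirmed from the dimensions of the engine's output binding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count candidate boxes for a square network input.
// ASSUMPTION: one grid per stride, anchorsPerCell boxes per grid cell.
std::size_t countPredictions(int inputSize,
                             const std::vector<int>& strides,
                             int anchorsPerCell)
{
    std::size_t total = 0;
    for (int s : strides)
    {
        assert(s > 0 && inputSize % s == 0);
        const std::size_t cells = static_cast<std::size_t>(inputSize / s);
        total += cells * cells * static_cast<std::size_t>(anchorsPerCell);
    }
    return total;
}
```

For a 640x640 input, countPredictions(640, {8, 16, 32}, 3) gives 25200 boxes, and multiplying by 85 gives 2142000 floats.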
For example, here is my output for the first 3 Yolov3 elements:
[04/22/2021-12:32:17] [I] 0.542306 0.226851 0.401792 -0.0513536 -9.70514 -4.29494 -7.36099 -3.78555 -6.73828 -7.58386 -7.02508 -7.22057 -7.14941 -6.59306 -4.49437 -7.69277 -7.09905 -6.83414 -6.77486 -4.23565 -6.7856 -7.68969 -6.54489 -6.25913 -6.16793 -7.42629 -7.36532 -7.71079 -7.59555 -6.5745 -5.74356 -5.7933 -3.92326 -7.17957 -6.48325 -7.32181 -7.39555 -5.01554 -7.02647 -6.51238 -6.39299 -7.13209 -7.4009 -7.34513 -2.04712 -4.89446 -4.0352 -4.79804 -3.47983 -4.34923 -4.44538 -2.68177 -3.18095 -4.60184 -4.12383 -4.65686 -4.27735 -3.73344 -5.49286 -3.5088 -3.85077 -5.30183 -8.09564 -5.53019 -7.78349 -6.10046 -7.30674 -6.53994 -7.16183 -6.03646 -4.68715 -7.20229 -2.66324 -7.42531 -7.31424 -7.70524 -6.13099 -7.55631 -4.52403 -4.10598 -6.12587 -4.86591 -5.60973 -6.58145 -4.73593
[04/22/2021-12:32:17] [I] 0.2243 0.0609286 1.52773 -0.210024 -9.40563 -4.28177 -7.34746 -3.55497 -6.96295 -7.3322 -7.02723 -7.20567 -7.06806 -6.59262 -4.18148 -7.65241 -6.88469 -6.73907 -6.80788 -4.10895 -6.77128 -7.58193 -6.40886 -6.49587 -6.36937 -7.77715 -7.45838 -7.65253 -7.82763 -6.81406 -5.41136 -6.12642 -3.74187 -7.29304 -6.58143 -7.09211 -7.49611 -5.23522 -6.82205 -6.64073 -6.86235 -7.17414 -7.21994 -7.16324 -2.51332 -4.71412 -3.79096 -4.70023 -3.61342 -4.28137 -4.28752 -2.7099 -3.14824 -4.44628 -3.945 -4.80816 -4.15773 -3.63639 -5.10903 -3.33197 -3.79971 -5.52187 -8.14032 -5.6509 -7.71476 -5.89993 -7.39266 -6.58958 -7.11467 -6.12327 -5.03268 -7.14123 -3.19617 -7.50392 -7.55485 -7.69442 -5.9289 -7.75157 -4.76684 -3.73088 -6.13474 -4.8205 -5.70967 -6.60677 -4.78483
[04/22/2021-12:32:17] [I] 0.208864 -0.302209 1.63829 -0.470203 -8.98034 -3.83819 -7.13928 -2.95598 -7.12368 -7.35233 -7.37897 -7.64733 -6.97566 -6.82093 -4.38897 -7.48784 -6.93448 -6.23299 -6.83732 -4.06812 -7.0693 -7.85028 -6.90694 -6.87629 -6.67206 -8.22955 -7.74801 -7.87238 -7.8432 -6.32431 -5.45075 -6.17067 -3.72249 -7.59581 -5.87988 -6.91 -7.42819 -4.28345 -6.63567 -5.95419 -7.12676 -7.66316 -7.29064 -7.0552 -2.16251 -5.23319 -3.30637 -5.15527 -4.11985 -4.81657 -4.71451 -2.64847 -3.25204 -5.39585 -4.3747 -5.31162 -4.71635 -4.47144 -6.02104 -4.43249 -5.00056 -5.4139 -8.37647 -5.44382 -7.96077 -6.66438 -7.29994 -7.00947 -7.82105 -6.38725 -4.59354 -7.64563 -3.55436 -7.72774 -8.21339 -7.92636 -6.23117 -7.71688 -4.42084 -4.28869 -6.01326 -5.10915 -5.90131 -6.14067 -3.84633
I tried to process the obtained data using the code from:
bool SampleUffSSD::verifyOutput(const samplesCommon::BufferManager& buffers)
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleUffSSD/sampleUffSSD.cpp#L323
I think using the SSD output handling for a Yolov3 model was the wrong approach.
As a result, I got rectangles at incorrect positions that do not match the cars.
Still, it was the closest result I got, because none of the examples I tried from various Yolov3 C++ projects worked with the float data in my output buffer.
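One possible source of misplaced rectangles is the mapping from network coordinates back to the original image. The sketch below assumes the box is given as center x, center y, width, height in pixels of the 640x640 network input, and that the image was resized directly without letterbox padding, as in the cv::resize step above. The names Rect and toOriginalImage are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Axis-aligned rectangle in original image pixels (hypothetical helper type).
struct Rect
{
    int x, y, w, h;
};

// Map a center-format box from network input space to the original image.
// ASSUMPTION: plain resize (no letterbox), box values in network pixels.
Rect toOriginalImage(float cx, float cy, float w, float h,
                     int netW, int netH, int imgW, int imgH)
{
    assert(netW > 0 && netH > 0 && imgW > 0 && imgH > 0);

    const float sx = static_cast<float>(imgW) / netW;
    const float sy = static_cast<float>(imgH) / netH;

    // Convert center/size to corners, scale, and clamp to the image bounds.
    const float x0 = std::max(0.0f, (cx - w / 2.0f) * sx);
    const float y0 = std::max(0.0f, (cy - h / 2.0f) * sy);
    const float x1 = std::min(static_cast<float>(imgW), (cx + w / 2.0f) * sx);
    const float y1 = std::min(static_cast<float>(imgH), (cy + h / 2.0f) * sy);

    const int ix0 = static_cast<int>(std::lround(x0));
    const int iy0 = static_cast<int>(std::lround(y0));
    const int ix1 = static_cast<int>(std::lround(x1));
    const int iy1 = static_cast<int>(std::lround(y1));
    return {ix0, iy0, ix1 - ix0, iy1 - iy0};
}
```

The resulting x, y, w, h can be passed to cv::Rect and drawn with cv::rectangle on the original 1920x1080 image.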
Questions:
1. Has anyone come across a similar task? Does the data in the output buffer look plausible?
If not, I suspect my preprocessing is wrong; please correct me.
2. The TensorRT documentation refers only to an example with a Python + Yolov3 (ONNX) model:
Sample Support Guide :: NVIDIA Deep Learning TensorRT Documentation
Unfortunately, I didn’t find an appropriate example that uses the C++ API with ONNX.
Could you please provide a link to a source code example that parses the output data for Yolov3_ONNX using the TensorRT C++ API?
I will be grateful for any help or explanations.