In the 04_video_dec_trt example, how do I print the class of a detected object in addition to its bounding box?
I know that resnet_three_class can detect cars, motorbikes, and people.
How do I get the bounding box for a person using the ONNX file provided as a sample?
I am having difficulty understanding the function that parses the output of the ONNX file provided as a sample.
file name: resnet10_dynamic_batch.onnx
In the resnet10 bounding-box parsing function:
Why is the value of bbox_norm 35.0?
What is the meaning of gc_centers_0 and gc_centers_1?
When computing the location of output_x1, I don't understand how the following calculation works.
Can you explain?
These parameters are used to map the output tensor into a bounding box.
ResNet10 is an internal customized model, so its architecture is not publicly available.
But it is very similar to DetectNet or YOLO, e.g. https://i.stack.imgur.com/aUcNf.jpg
First, the image is divided into a grid of size (grid_x, grid_y).
Then the bbox location can be calculated as an offset (e.g. output_x1) plus the grid center.
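As a minimal sketch of what gc_centers_0 and gc_centers_1 could hold (the stride, input size, and grid dimensions below are assumptions for illustration, not values read from the sample):

```python
# Sketch: precompute grid-cell centers for the bbox decode.
# STRIDE, GRID_X, GRID_Y and the implied 960x544 input are ASSUMED
# values for illustration, not taken from the sample code.
BBOX_NORM = 35.0
STRIDE = 16                  # assumed pixels per grid cell
GRID_X, GRID_Y = 60, 34      # assumed: 960/16 by 544/16

# Center of each grid cell in pixels, divided by bbox_norm so the
# centers live in the same scale as the raw outputs (output_x1, ...).
gc_centers_0 = [(i * STRIDE + 0.5) / BBOX_NORM for i in range(GRID_X)]
gc_centers_1 = [(j * STRIDE + 0.5) / BBOX_NORM for j in range(GRID_Y)]
```

With the centers stored in this scale, a raw offset and a grid center can be added directly and the sum rescaled to pixels in one multiplication.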
bbox_norm is a training parameter.
Since output_x1 and the grid center may not be in the same scale, bbox_norm is responsible for the transform.
That means 1.0 in output_x1 equals 35 in grid-center units, and 1 in grid-center units corresponds to 1 pixel, so 1.0 in output_x1 is 35 pixels.
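Putting the two statements together, decoding one grid cell could look like the sketch below. The function name, the sign convention for the left/top offsets, and the 16-pixel stride are assumptions for illustration, not the actual sample code:

```python
BBOX_NORM = 35.0  # 1.0 in the raw output corresponds to 35 pixels
STRIDE = 16       # assumed pixels per grid cell

def decode_cell(out_x1, out_y1, out_x2, out_y2, gx, gy):
    """Hypothetical decode of the raw offsets of grid cell (gx, gy)
    into pixel coordinates; a sketch of the idea, not the sample code."""
    cx = gx * STRIDE + 0.5   # grid center in pixels (1 unit = 1 pixel)
    cy = gy * STRIDE + 0.5
    # Scale each offset by bbox_norm (raw unit -> pixels) and apply it
    # around the center: left/top offsets grow leftward/upward,
    # right/bottom offsets the other way.
    x1 = cx - out_x1 * BBOX_NORM
    y1 = cy - out_y1 * BBOX_NORM
    x2 = cx + out_x2 * BBOX_NORM
    y2 = cy + out_y2 * BBOX_NORM
    return x1, y1, x2, y2
```

For example, zero offsets collapse the box onto the cell center, and a raw offset of 1.0 moves an edge 35 pixels away from it.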