In the 04_video_dec_trt example, how do I print the class of a detected object in addition to its bounding box?

Hello,

  1. In the 04_video_dec_trt example, how do I print the class of a detected object in addition to its bounding box?

  2. I know that resnet_three_class can detect cars, motorbikes, and people.
    How do I detect the bounding box for a person using the ONNX file provided as a sample?

Thank you.

Hi,
Do you mean classification? The reference model does detection only. For classification, you may need to apply another model.

The reference model is for demonstration. If you have a different use case, you would need to check whether another model fits it.


Hello,

I am having difficulty understanding the function that interprets the output of the ONNX file provided as a sample.
File name: resnet10_dynamic_batch.onnx

In the resnet10 bounding box parsing function:

  1. Why is the value of bbox_norm 35.0?
  2. What is the meaning of gc_centers_0 and gc_centers_1?
  3. When specifying the location of output_x1, I don’t understand how the following calculation works.
    Can you explain?

The output part of the ONNX file is as follows.

Thank you.

Hi,

These parameters are used to map the output tensor into a bounding box.
ResNet10 is an internal customized model, so its architecture is not publicly available.
But it is very similar to DetectNet or YOLO. Ex. https://i.stack.imgur.com/aUcNf.jpg

First, you can divide the image into a grid of size (grid_x, grid_y).
The bbox location can then be calculated as an offset (e.g. output_x1) plus the grid center.
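As an illustration of the grid-center idea (not the sample's actual code; the stride of 16 pixels and the grid dimensions are assumptions for a 960×544 input), the centers might be precomputed like this:

```python
def grid_centers(num_cells, stride=16.0, bbox_norm=35.0):
    # Center of cell i in pixels, divided by bbox_norm so it is
    # on the same scale as the network's offset outputs.
    # stride and bbox_norm values here are illustrative assumptions.
    return [(i * stride + stride / 2.0) / bbox_norm for i in range(num_cells)]

gc_centers_0 = grid_centers(60)  # e.g. 960 / 16 = 60 columns
gc_centers_1 = grid_centers(34)  # e.g. 544 / 16 = 34 rows
```

Each output cell then only has to predict a small offset relative to its own center, rather than an absolute image coordinate.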

  1. bbox_norm is a training parameter.
    Since output_x1 and the grid center may not be on the same scale, bbox_norm is responsible for the transform.
    That means 1.0 in output_x1 equals 35 in grid-center units, and 1 in grid-center units corresponds to 1 pixel.

  2. They are the center positions of each grid cell along the two axes.

  3. Please check the explanation above.
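Putting the answers together, here is a hedged sketch of how such grid-relative offsets are typically mapped back to pixel coordinates. The stride value and the sign conventions for the four offsets are assumptions for illustration; only the bbox_norm of 35.0 comes from the answer above:

```python
BBOX_NORM = 35.0  # training-time normalization constant (from the answer above)
STRIDE = 16.0     # assumed grid cell size in pixels

def decode_bbox(col, row, out_x1, out_y1, out_x2, out_y2):
    """Map one grid cell's normalized offsets back to pixel coordinates."""
    # Cell center, divided by BBOX_NORM to match the output scale.
    cx = (col * STRIDE + STRIDE / 2.0) / BBOX_NORM
    cy = (row * STRIDE + STRIDE / 2.0) / BBOX_NORM
    # offset + grid center, then multiply by BBOX_NORM so that
    # 1.0 of network output corresponds to 35 pixels.
    x1 = (cx - out_x1) * BBOX_NORM
    y1 = (cy - out_y1) * BBOX_NORM
    x2 = (cx + out_x2) * BBOX_NORM
    y2 = (cy + out_y2) * BBOX_NORM
    return x1, y1, x2, y2
```

With all offsets at zero, the decoded box collapses to the cell center, which is the intended behavior of this parameterization.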

Thanks.