Kitti BBox coordinates output

I’ve enabled saving the bbox coordinates (by adding the KITTI output folder to the DeepStream config .txt). This outputs a file with the detections and their bounding boxes, like this one:

person 0.0 0 0.0 1564.00 129.00 1702.00 355.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Here, I understand that the numbers 1564.00 129.00 1702.00 355.00 are the bbox coordinates, as explained in the write_kitti_output function in deepstream_app.c.
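In the KITTI label format, fields 5 to 8 of each line are the 2D box as left, top, right, bottom in pixels. The first field is the class name. The remaining fields hold truncation, occlusion, angle and 3D information, which are zero here. Below is a minimal Python sketch for reading one such line. The names `KittiBox` and `parse_kitti_line` are hypothetical helpers and are not part of DeepStream:

```python
from typing import NamedTuple


class KittiBox(NamedTuple):
    """2D part of one KITTI label line (pixel coordinates)."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def parse_kitti_line(line: str) -> KittiBox:
    """Parse one KITTI label line.

    KITTI field order: type, truncated, occluded, alpha,
    bbox left, top, right, bottom, then 3D dims/location/rotation.
    Only the class name and the 2D bbox (fields 0 and 4-7) are kept;
    any trailing fields (e.g. an optional score) are ignored.
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"not a KITTI label line: {line!r}")
    left, top, right, bottom = (float(v) for v in fields[4:8])
    return KittiBox(fields[0], left, top, right, bottom)


box = parse_kitti_line(
    "person 0.0 0 0.0 1564.00 129.00 1702.00 355.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0"
)
print(box)
# KittiBox(label='person', left=1564.0, top=129.0, right=1702.0, bottom=355.0)
```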

However, these bbox coordinates don't match the video dimensions. In particular, the video I’m working with is 640x480, and the .mp4 output saved by sink0 is 1280x720. The left (1564.00) and right (1702.00) coordinates exceed the width of both. I’m a bit lost right now. Can someone explain if I’m missing something?
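To find the coordinate space the box could have come from, check which candidate frame size contains it. This is a minimal sketch. The function name is hypothetical. The 1920x1080 candidate is an assumption (a common [streammux] size), not something stated in the thread:

```python
def fits_in_frame(left, top, right, bottom, width, height):
    """True if the box lies inside a width x height frame (pixel coords)."""
    return 0 <= left <= right <= width and 0 <= top <= bottom <= height


# Box from the KITTI line above, checked against candidate frame sizes.
# 640x480 = input video, 1280x720 = sink0 output, 1920x1080 = assumed guess.
box = (1564.0, 129.0, 1702.0, 355.0)
candidates = [(640, 480), (1280, 720), (1920, 1080)]
matches = [(w, h) for (w, h) in candidates if fits_in_frame(*box, w, h)]
print(matches)  # [(1920, 1080)]
```

A box that fits neither the input nor the output resolution indicates that the coordinates are expressed in a third, larger frame size somewhere in the pipeline.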

thanks a lot in advance,


The output is related to the model input size.
Could you first share the dimensions of your prototxt data layer with us?


layer {
  name: "deploy_data"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 384
      dim: 1248
    }
  }
  include: { phase: TEST not_stage: "val" }
}


Thanks for the quick reply, @AastaLLL.
Where can I check this?
I’m using the objectDetector_Yolo example, with YOLOv3 at height = 416 and width = 416.



Would you mind explaining more about your use case?

Please note that different model architectures have different output bounding box formats.
If you are using a YOLO model, the bounding boxes should be parsed with the YOLO parser rather than the DetectNet one.

So are you trying to understand the output format of YOLO?
If so, you can find the parser directly in our DeepStream sample.
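Bounding box formats differ between architectures. For example, YOLO-style models commonly describe a box by its center and size, normalized to the image. The KITTI output uses pixel corners instead. The following generic Python sketch shows that conversion. It is not the DeepStream YOLO parser, which decodes raw network outputs, and the function name is hypothetical:

```python
def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width, height in [0, 1])
    to pixel corners (left, top, right, bottom), clamped to the image."""
    left = (cx - w / 2) * img_w
    top = (cy - h / 2) * img_h
    right = (cx + w / 2) * img_w
    bottom = (cy + h / 2) * img_h
    # Clamp so boxes touching the border stay inside the frame.
    left, right = max(0.0, left), min(float(img_w), right)
    top, bottom = max(0.0, top), min(float(img_h), bottom)
    return left, top, right, bottom


print(yolo_to_corners(0.5, 0.5, 0.25, 0.5, 640, 480))
# (240.0, 120.0, 400.0, 360.0)
```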



I’d be glad to, @AastaLLL!
I’m testing on my Jetson Nano to count people who enter and exit a store.

To accomplish this, I need to

  1. process the videos with Yolov3
  2. export the bounding box info
  3. post-process it with our own object tracking and counting algorithms (I do this in a separate Python script).

I used the objectDetector_Yolo example, which accomplishes steps 1 and 2. The bounding box info is saved by setting this in the DeepStream config file:

[application]
gie-kitti-output-dir=/home/ubuntu/kitti_data/

However, I’m a bit confused about the bounding boxes exported to the files. For example:

model: Yolov3 (height = width = 416)
input video: .mp4, 640x480
output video: .mp4, 1280x720 (I don’t think this is relevant, but I’m adding it just in case)

This setup generates many .txt files with the detections. For example, one detection in one of the files is:

person 0.0 0 0.0 1564.00 129.00 1702.00 355.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Here I understand that 1564.00 129.00 1702.00 355.00 are the bbox coordinates, and this is what confuses me. I was expecting the bounding box coordinates to be within the range of the input video, so that I could “draw” or analyze the movement of people relative to elements of the store.
So my question is: how do I map the KITTI bbox coordinates back to the original input coordinates? I need this because I know where the store entrance is in the input image (for example, a vertical line 100 pixels from the left).
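Counting people who cross the entrance line only works when the box coordinates are in the same pixel space as the line position, which is the input frame in this case. The following minimal sketch counts crossings of a vertical line for one tracked person. It assumes the tracker has already grouped boxes by person. The function names and the convention that left-to-right means "enter" are hypothetical:

```python
def count_crossings(centroid_xs, line_x):
    """Count crossings of a vertical line x = line_x for one tracked person.

    centroid_xs: the person's box-center x coordinate in successive frames.
    Left-to-right counts as 'enter', right-to-left as 'exit'
    (a hypothetical convention; flip it to match the camera layout).
    """
    enters = exits = 0
    for prev, cur in zip(centroid_xs, centroid_xs[1:]):
        if prev < line_x <= cur:
            enters += 1
        elif prev >= line_x > cur:
            exits += 1
    return enters, exits


def center_x(left, right):
    """Horizontal center of a box given its left and right edges."""
    return (left + right) / 2


# One track walking right across x = 100, then back left.
track = [center_x(l, r) for l, r in [(40, 80), (70, 110), (100, 140), (60, 100)]]
print(count_crossings(track, line_x=100))  # (1, 1)
```

Comparing consecutive positions, instead of testing a single frame, avoids counting the same person repeatedly while they stand on the line.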

Thanks in advance,

I found the problem: it was the width and height parameters in [streammux] in the DeepStream configuration file. I guess this has to do with the output video display or something like that. Setting those parameters to the video’s width and height achieves what I needed.
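If you keep a [streammux] size that differs from the source video, you can instead rescale the boxes afterwards. The following minimal sketch assumes nvstreammux scaled each frame without padding (enable-padding=0), so x and y scale independently. The 1920x1080 streammux size in the usage line is an assumption chosen to match the sample box. The function name is hypothetical:

```python
def mux_to_source(box, mux_size, src_size):
    """Map a (left, top, right, bottom) box from streammux resolution
    back to the source video resolution.

    Assumes nvstreammux scaled each frame without padding
    (enable-padding=0), so x and y are scaled independently.
    """
    mux_w, mux_h = mux_size
    src_w, src_h = src_size
    left, top, right, bottom = box
    return (left * src_w / mux_w, top * src_h / mux_h,
            right * src_w / mux_w, bottom * src_h / mux_h)


# Assumed [streammux] width=1920 height=1080; source video is 640x480.
print(mux_to_source((1564.0, 129.0, 1702.0, 355.0), (1920, 1080), (640, 480)))
```

With padding enabled, the aspect ratio is preserved and the padding offset would also have to be subtracted, so this per-axis scaling would no longer be correct.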

thanks for the help!


Yes. The bounding box coordinates are based on the display size, that is, the width and height set in [streammux].

Also, the KITTI output function is open source.
You can find the details in this file:

/**
 * Function to dump bounding box data in kitti format. For this to work,
 * property "gie-kitti-output-dir" must be set in configuration file.
 * Data of different sources and frames is dumped in separate file.
 */
static void
write_kitti_output (AppCtx * appCtx, NvDsBatchMeta * batch_meta)
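The sample line in this thread is consistent with a box stored as left/top/width/height and written out as left, top, right, bottom, with all 3D fields zeroed. The following Python sketch reproduces that line format under this assumption. It is not a port of write_kitti_output, whose exact formatting should be checked in the C source, and the function name is hypothetical:

```python
def to_kitti_line(label, left, top, width, height):
    """Format one detection like the KITTI lines shown in this thread.

    The box is given as left/top/width/height and written as
    left, top, right, bottom (assumed conversion); all 3D fields
    are zero-filled.
    """
    right, bottom = left + width, top + height
    return (f"{label} 0.0 0 0.0 {left:.2f} {top:.2f} {right:.2f} {bottom:.2f} "
            "0.0 0.0 0.0 0.0 0.0 0.0 0.0")


print(to_kitti_line("person", 1564, 129, 138, 226))
# person 0.0 0 0.0 1564.00 129.00 1702.00 355.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0
```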