CVAT Annotation: Multiple Detections In The Same Bounding Box Not Working


I have already annotated 900 pictures via CVAT, and most of them are annotated as shown below:

I have 2 extra bounding boxes inside the face bounding box, intended to detect the nose and mouth. However, after the dataset has been trained (30 epochs) and run, detectnet is unable to detect (draw boxes around) the nose and mouth, as shown in the picture below:

Double bounding boxes

The next issue is that multiple bounding boxes appear when something is detected, as shown in the picture above. There are 2, and sometimes 3 or more, overlapping bounding boxes for “No_Mask”.

For now, I have no idea what caused this issue, but there are a few things that I think might contribute to it:

  • Low-quality images used for the dataset
  • Needing more pictures in my dataset
  • Inaccurate annotations
  • CVAT parameters not assigned properly, as shown in the picture below. I don’t know whether I need to enter anything for ‘Overlap size’ and ‘Segment size’ in order for the re-trained model to be able to detect the mouth and nose. Previously, I did not assign any values to these parameters.

CVAT parameters

Hope some of you can help and suggest something to solve this problem :c

Thank you.


Which training framework and sample do you use?
In general, the overlapping bounding box issue can be fixed with non-maximum suppression (NMS) when parsing the output.
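To illustrate the general idea (this is a rough standalone sketch, not the actual jetson-inference implementation), a minimal IoU-based NMS pass keeps the highest-scoring box and drops any box that overlaps it by more than a threshold:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, suppress heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # drop remaining boxes that overlap the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

So two near-duplicate “No_Mask” boxes would collapse into one, while a distant box is untouched.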


I am sorry, I am not sure which training framework and sample I am using, but most of the steps/procedures I am following are based on jetson-inference/ at master · dusty-nv/jetson-inference · GitHub.

The only difference from the guide is that, instead of collecting the dataset via the jetson-inference tools (video stream), I obtained my dataset (pictures) from the internet and annotated it using CVAT. The rest is the same as shown in the video “Jetson AI Fundamentals - S3E5 - Training Object Detection Models”.

Hi @faiz26, first you can test the script on some of your example images to see if it is able to detect nested bounding boxes. That script uses the PyTorch .pth model checkpoints, so it will tell you whether the model/dataset is capable of detecting these objects or not.

If that works, then it would mean the bounding-box clustering needs adjustments in the jetson-inference detectNet code. You can see the clustering code here:

Currently the clustering rejects two different objects on top of each other (it considers that a false positive). You can try changing lines 914-929 of the code to the following:

// only merge overlapping detections when they share the same class,
// so nested boxes of different classes are no longer rejected
if( detections[n].ClassID == detections[m].ClassID )
	detections[m].Confidence = fmaxf(detections[n].Confidence, detections[m].Confidence);
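To see the effect of that change in isolation, here is a rough Python sketch of class-aware clustering (an illustration under my own simplified detection representation, not the actual detectNet code): same-class overlaps are merged, while a nose/mouth box nested inside a face box survives because the classes differ.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def cluster(detections, overlap_threshold=0.75):
    """Merge overlapping detections only when they share a class ID.
    Each detection is a dict: {"box": (x1,y1,x2,y2), "class_id": int, "conf": float}."""
    merged = []
    for det in detections:
        for kept in merged:
            if (kept["class_id"] == det["class_id"]
                    and iou(kept["box"], det["box"]) >= overlap_threshold):
                # same class: collapse into one detection, keep the higher confidence
                kept["conf"] = max(kept["conf"], det["conf"])
                break
        else:
            merged.append(dict(det))
    return merged
```

With this logic, two near-duplicate face boxes collapse into one, but a nose box inside a face box is kept as a separate object.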

Also, you can change the default clustering threshold here (this is the percentage overlap).
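For intuition about what that overlap percentage measures (a small standalone sketch, with made-up example boxes): a small box nested inside a large one actually has a *low* IoU, while two near-duplicate boxes of the same object have a high IoU, which is why the threshold separates the two cases.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

face = (0, 0, 100, 100)
nose = (40, 40, 60, 60)    # 20x20 box fully inside the face box
dup  = (2, 2, 100, 100)    # near-duplicate of the face box

print(iou(face, nose))     # 0.04   -- nested small box: tiny overlap ratio
print(iou(face, dup))      # 0.9604 -- near-duplicate: high overlap ratio
```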

If you make changes to the C++ code, remember to re-run make && sudo make install


The ‘’ returned this:

Inference time: 12.817248344421387
Found 2 objects. The output image is run_ssd_example_output.jpg

May I know what the command is to preview the image? The jpg file does not appear in the file explorer view.

Thank You.


The run_ssd_example_output.jpg should be written directly to the workspace folder.
If it doesn’t exist, do you get any error message related to this?


I had no errors previously; I just don’t know how to open/view the jpg file via the terminal.

Could running the ‘’ command inside the docker container have caused this issue?

Aha, ok - yes, in that case it would be inside the docker container. What you can do is copy the run_ssd_example_output.jpg image that’s inside your container to the ssd/data folder, which is mapped to your host device. You would then see it from your host.

Run this inside the container:

cd jetson-inference/python/training/detection/ssd
# re-run so that run_ssd_example_output.jpg is made
cp run_ssd_example_output.jpg data

Then, outside the container, if you navigate your file explorer to jetson-inference/python/training/detection/ssd/data, you should see the image there.



It works, I managed to open and view the test image. Thank you!

But, as shown in the pictures below, the test results show that the nose and mouth are not detected (no bounding box).


Is it due to mistakes I made when configuring the CVAT annotation settings, or something else?

Or, instead of annotating the nose and mouth inside the face bounding box, maybe I could create a separate dataset that annotates only the nose and mouth, without any faces.

Thank you.

You shouldn’t have to set those overlap parameters in the CVAT tool; I don’t think those get used in the Pascal VOC format anyway. I wonder if the overlapping bounding boxes are getting filtered out during inference by the Non-Maximum Suppression clustering:

Regardless, I haven’t personally trained models that have small detections inside larger detections like this, so I’m not sure what changes are required. You can try changing the NMS clustering or splitting it into a separate dataset.


Oh okay, thank you for the information regarding the CVAT tool parameters.

The main aim of my project is to detect the use of face masks, which corresponds to the ‘Masked’ and ‘No_Mask’ labels. The mouth, hand, and nose detection are just extra features that I think would be good to have for better verification.

So far, the ‘Masked’ and ‘No_Mask’ labels are working perfectly except for the overlapping bounding box issue mentioned earlier in this discussion (later on, I will try the method suggested by AastaLLL to solve this issue). I think this is already sufficient for my project. Anyway, thank you for your assistance, much appreciated.