Bounding box offset error with provided camera-capture tool

I'm trying to create my own datasets by augmenting images (simple flips, translations, etc.), and I have written my own dataset viewer to check my work. While doing so, I found that the data created by the camera-capture tool seems to store all its bounding box data with a slight offset. I have verified this with other tools. If you're creating boundaries around smaller objects, this must be having an impact on the end quality of the training.

Hi,

Could you share some information about how you get the bounding box?
You will only get the raw image data from the camera-capture tool.

Thanks.

The camera-capture tool allows you to draw the bounding boxes and specify what's in the box for training detection. It then stores the full image and an XML annotation. My software, and the other tool I have tried, steps through each captured image, retrieves the XML data, and draws the box back onto the image for checking. I did this to make sure that my own capture tool was correctly writing the XML data. To verify it I used datasets from camera-capture, and that is how I found the issue.
Please also note that the tool sometimes writes an XML file that has all the data except the object data and box, which is really odd. I had to write my software to account for these omissions.

Hi @nic_wren, thanks for pointing this out - can you tell if the bounding boxes are offset by a constant amount, or is it variable?

When I made the tool, I double-checked the output coordinates by opening the saved image file in GIMP and tracing the same bounding box, and they were correct. Can you also share what camera resolution you are using?

Also, if there are no bounding boxes in the image, it can save the XML with no box data. I check against this in the PyTorch training script so it ignores XML files with no bounding boxes.
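For anyone writing their own viewer, a check along these lines can flag annotations that contain no boxes. This is only a minimal sketch, assuming the annotations follow the usual Pascal VOC layout (`object` elements containing a `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`). The function names `read_boxes` and `is_empty_annotation` are made up for illustration and are not part of camera-capture or the training scripts.

```python
import xml.etree.ElementTree as ET


def read_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) from a VOC-style XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # object entry without a box: treat it as missing
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name", default=""), *coords))
    return boxes


def is_empty_annotation(xml_text):
    """True when the annotation has no usable bounding boxes."""
    return len(read_boxes(xml_text)) == 0
```

A viewer or augmentation script could call `is_empty_annotation` on each file and skip or log the ones that come back empty.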

The box is consistently shifted down and to the right. This has happened on two camera resolutions, 1920x1080 and 1280x960. I have also verified it with another developer's tool called “Image Set viewer 0.5”, and the results are consistent with mine. If you want the Python code for the viewer I wrote, you are more than welcome to have it (just be aware it's a quick bodge I put together to verify the other work I was doing, so it isn’t pretty or friendly).
I found this because I’m developing a small-item detector for screws, and it's very clear when the bounding box isn’t aligned.
I have attached an image (not a good one, I know), but it clearly shows the offset. (Attachment: Screenshot from 2021-01-19 16-19-10)

I noticed the XML with no bounding box data after making small datasets to verify against. I must admit I’m not 100% sure, but I'm more than 80% sure, that I didn’t save one without a box. I will keep an eye on it in future datasets and will create a separate topic if I become 100% sure.

Has this information been of any help?

Thanks @nic_wren - sorry for the delay - is the offset (5,5) pixels by chance? If so, my guess is that the source is this offset applied to the camera image (so that the camera image is fully viewable and isn’t cut off by the window border):

It doesn’t appear that I have accounted for this offset in the detection bounding boxes, sorry about that :(

If the offset in your dataset is indeed (5,5) pixels, then you could try to fix it either by changing cameraOffsetX and cameraOffsetY to 0 (in captureWindow.h), or by subtracting 5 from each of the bounding box coordinates here:

If that fixes it, let me know, and I will merge the change in the master branch.
After you make changes to the code, re-run make and sudo make install in your jetson-inference/build directory.

To fix your existing dataset, you could either add code to the pytorch_ssd scripts to apply the correction there, or write a script that changes the XML. Here is where the VOC-format bounding boxes are loaded in PyTorch:

Looking at that, I just found that there appears to be an additional offset of -1 pixel applied, because apparently VOC pixel indexing starts with 1 (not 0) - which appears to be an additional issue. So camera-capture may need to apply a -4 pixel offset: (1 - cameraOffset)
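The "script that changes the XML" option could look something like the sketch below. It assumes Pascal VOC style annotations with `bndbox` elements holding `xmin`, `ymin`, `xmax`, `ymax`. The name `shift_boxes` is hypothetical, and the shift amount is left as a parameter because whether -5 or -4 is appropriate depends on whether the consumer of the XML also subtracts 1 for VOC indexing. It does not clamp coordinates to the image bounds.

```python
import xml.etree.ElementTree as ET


def shift_boxes(xml_text, dx, dy):
    """Return a copy of a VOC-style XML string with every box moved by (dx, dy)."""
    root = ET.fromstring(xml_text)
    for bndbox in root.iter("bndbox"):
        # x coordinates move by dx, y coordinates by dy
        for tag, delta in (("xmin", dx), ("ymin", dy), ("xmax", dx), ("ymax", dy)):
            node = bndbox.find(tag)
            node.text = str(int(float(node.text)) + delta)
    return ET.tostring(root, encoding="unicode")
```

For example, `shift_boxes(text, -5, -5)` moves each box 5 pixels up and to the left. It is worth running this on a copy of the dataset and checking a few images in a viewer before overwriting anything.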

Correct. I estimated it to be 5 pixels out and had already adjusted for it in my data viewer application so that I could correct the XML directly. -4 doesn’t seem to be correct, but I will take another look. I was just wondering whether I was correct. Good luck with the edit. Thanks

I can confirm again that the offset is 5 pixels, not 4 - I edited my viewer and created a tight box on a hole.

The 1-pixel offset is applied at the PyTorch stage (making it 4), because apparently in VOC pixel indices start with 1 and not 0. So when it is training, it is offset by an additional 1 pixel:

https://github.com/dusty-nv/pytorch-ssd/blob/e7b5af50a157c50d3bab8f55089ce57c2c812f37/vision/datasets/voc_dataset.py#L139

To correct for this in your data viewer, the viewer would also want to treat the upper-left corner of the image as x=1, y=1 (instead of x=0, y=0) and apply the 1-pixel offset.
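As a rough illustration of that indexing convention, a viewer could convert between the two coordinate systems like this. The helper names are hypothetical, and the sketch assumes boxes are plain (xmin, ymin, xmax, ymax) tuples where VOC coordinates are 1-based and drawing coordinates are 0-based.

```python
def voc_to_zero_based(box):
    """Convert a 1-based VOC box to 0-based pixel coordinates for drawing."""
    xmin, ymin, xmax, ymax = box
    return (xmin - 1, ymin - 1, xmax - 1, ymax - 1)


def zero_based_to_voc(box):
    """Convert a 0-based box (e.g. from a capture tool) to 1-based VOC coordinates."""
    xmin, ymin, xmax, ymax = box
    return (xmin + 1, ymin + 1, xmax + 1, ymax + 1)
```

With this convention, a box touching the top-left pixel is stored as (1, 1, ...) in the XML and drawn from (0, 0, ...).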

Ah, the penny has dropped, thanks. So I need to account for this in the data capture software I have been writing to augment my image sets. However, it means that other software that performs bounding box capture may not apply this pixel offset. I will investigate when I have some time.