NaN during training on custom dataset

nic_wren · January 19, 2021, 9:22am

I have created an image dataset with augmentation, along with XML annotations for the translated images, using my own tool, but during training I only ever get NaN for the training accuracies.
I have checked my XML generation by taking a dataset created with the Nvidia camera-capture tool and swapping its XML out for mine; that all works well.

I have checked the train/val/trainval/test txt documents in the same way, using my software to generate them. If these are incorrect you don’t really get as far as the training stage; it complains far earlier :-)

The JPEG images all seem to be formatted correctly, their sizes match those given in the XML, and the bounding boxes are correct.

What else can I look at that could be causing this issue?
A similar set of images captured with the camera-capture tool all work fine.

Thank you

AastaLLL · January 20, 2021, 2:30am

Hi,

I assume your training framework uses an image library; the common ones are Pillow and OpenCV.

To check for problems in the dataset, you can dump the input of the network
and draw the corresponding bbox on it using the same library that the training framework (e.g. TensorFlow, PyTorch, …) uses.
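Before dumping network inputs, a cheaper first pass is to validate the annotation files themselves. The sketch below assumes Pascal VOC style XML (a `size` element with `width` and `height`, and `object/bndbox` elements with `xmin`, `ymin`, `xmax`, `ymax`); `check_voc_annotation` is a hypothetical helper name, not part of any training framework. It only flags boxes that are empty, inverted, or outside the stated image size, which are plausible but unconfirmed sources of NaN values.

```python
import xml.etree.ElementTree as ET


def check_voc_annotation(xml_text):
    """Return a list of problems found in one VOC-style annotation string."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    problems = []
    for obj in root.iter("object"):
        name = obj.findtext("name", default="?")
        box = obj.find("bndbox")
        if box is None:
            problems.append(f"{name}: missing bndbox")
            continue
        # float() so that values written as "12.0" are still accepted
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if xmax <= xmin or ymax <= ymin:
            problems.append(f"{name}: empty or inverted box")
        if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append(f"{name}: box outside {width}x{height} image")
    return problems


# Made-up annotation: one valid box and one that was shifted off the image
sample = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object><name>ok</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object><name>shifted_out</name>
    <bndbox><xmin>-30</xmin><ymin>20</ymin><xmax>-5</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>"""

print(check_voc_annotation(sample))
```

Running this over every XML file in the dataset would show quickly whether any translated box has ended up outside its image.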

Thanks.

nic_wren · January 20, 2021, 8:43am

I use OpenCV to generate the images.

Attached is one of my images and the associated XML annotation. I have also attached an image of the output.

ps000001x1y4fx.xml (592 Bytes)

Here is a snippet of the code that performs the translation and stores the image:

import cv2
import numpy as np

# Capture image area: shift the source image, then crop the capture zone.
# warpAffine fills the area shifted in from outside the source with black by default.
translation_matrix = np.float32([[1, 0, -xsampleloc], [0, 1, -ysampleloc]])
shiftedimage = cv2.warpAffine(imagecapture, translation_matrix, (imagecapture.shape[1], imagecapture.shape[0]))
zoneimage = shiftedimage[0:capture_image_height, 0:capture_image_width]

# Extract image size
img_width = zoneimage.shape[1]
img_height = zoneimage.shape[0]
# Write the image size into the annotation XML
node_width.text = str(img_width)
node_height.text = str(img_height)

# Store the JPG
cv2.imwrite(datadirectory + dataname + '/JPEGImages/' + basefilenametext + 'x' + str(ximgcount) + 'y' + str(yimgcount) + 'f0.jpg', zoneimage)

Thanks for your help.

nic_wren · January 26, 2021, 1:29pm

Have I provided the information you require? Thanks

nic_wren · January 27, 2021, 9:42pm

Is it possible to get a little help on this issue please - I’m now completely out of ideas - thanks

dusty_nv · January 28, 2021, 1:36am

Hi @nic_wren - have you tried decreasing the learning rate? Does it work without the additional augmentation/warping that you add to the dataset?

nic_wren · January 29, 2021, 10:53pm

Sorry dusty, but how do I alter the learning rate? It doesn’t seem to be covered in the tutorials.

dusty_nv · January 30, 2021, 1:54am

There is a --learning-rate argument to train_ssd.py:

https://github.com/dusty-nv/pytorch-ssd/blob/e7b5af50a157c50d3bab8f55089ce57c2c812f37/train_ssd.py#L56

You can try reducing it from its default of 0.01 to 0.001 instead and see if that helps.
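As a rough illustration of how such a flag behaves, here is a stand-in parser built with Python's `argparse`. It is not the actual code from train_ssd.py; only the `--learning-rate` flag name and the 0.01 default are taken from this thread, and everything else is assumed for the sketch.

```python
import argparse

# Hypothetical stand-in for the script's parser; only the flag name and the
# 0.01 default come from the thread.
parser = argparse.ArgumentParser(description="learning-rate flag sketch")
parser.add_argument("--learning-rate", type=float, default=0.01,
                    help="initial learning rate")

# No flag given: the default applies
default_args = parser.parse_args([])
# Flag given on the command line: the default is overridden
lowered_args = parser.parse_args(["--learning-rate", "0.001"])

print(default_args.learning_rate, lowered_args.learning_rate)
```

In practice the value would be passed on the command line when launching the training script, alongside whatever other arguments the run already uses.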

nic_wren · January 30, 2021, 10:55am

OK, I had a bit of a session on this today…

How to adjust the learning rate was right there in front of my face, in the output of your software, so sorry for being so blind. This must frustrate you all, sorry.
Changing the learning rate didn’t have an effect.
I reduced my image sets down to just flips, without any positional stepping, and everything worked.

My conclusion at the moment is that the training doesn’t like the black areas in my images when I do the positional stepping. Is there something I should be putting there instead? I have seen papers on doing just this and assumed the black areas wouldn’t be a problem.

Thanks for all your help, you’re doing a wonderful job.

dusty_nv · February 1, 2021, 5:12pm

Hmm, I have not tried pytorch-ssd with pure black areas in the images before; they always had some data in them, so I’m not sure why it is leading to NaN like that. Perhaps you could instead variably crop the image so that the surrounding area is still filled with the original data?
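The cropping idea can be sketched as follows: instead of translating the image and filling with black, pick a crop window inside the source image and remap the boxes into that window. `crop_boxes` and its `min_size` parameter are hypothetical names for illustration, boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples in pixels, and this is not how pytorch-ssd implements its own cropping.

```python
def crop_boxes(boxes, x0, y0, crop_w, crop_h, min_size=1):
    """Map (xmin, ymin, xmax, ymax) boxes into a crop window's coordinates.

    The window starts at (x0, y0) in the source image and is crop_w x crop_h.
    Boxes are clipped to the window; boxes left smaller than min_size in
    either dimension are dropped.
    """
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        # Shift into crop coordinates, then clip to the crop bounds
        nx0 = max(0, xmin - x0)
        ny0 = max(0, ymin - y0)
        nx1 = min(crop_w, xmax - x0)
        ny1 = min(crop_h, ymax - y0)
        if nx1 - nx0 >= min_size and ny1 - ny0 >= min_size:
            kept.append((nx0, ny0, nx1, ny1))
    return kept


# The matching image crop on a NumPy/OpenCV array would be:
#   crop = image[y0:y0 + crop_h, x0:x0 + crop_w]
# With x0, y0 >= 0, x0 + crop_w <= image width and y0 + crop_h <= image
# height, the crop stays inside the source, so no black fill is introduced.
boxes = [(100, 80, 200, 160), (10, 10, 40, 40)]
print(crop_boxes(boxes, x0=50, y0=50, crop_w=300, crop_h=200))
# → [(50, 30, 150, 110)]  (the second box lies outside the window and is dropped)
```

Dropping boxes that fall outside the window matters as much as the crop itself, since an annotation pointing at an empty or off-image region is another candidate for bad loss values.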

By the way, it looks like pytorch-ssd already applies data augmentation to the training data.

So you may want to take into account the augmentation transforms it is already applying in addition to your own. Actually, it looks like it is already doing the random sample cropping that I mentioned above, as well as flipping (mirroring).
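For reference, mirroring an image also means mirroring its box coordinates. The sketch below shows the arithmetic for a horizontal flip; `hflip_box` is a hypothetical helper, not the pytorch-ssd implementation, and it treats coordinates as continuous pixel positions, so an annotation format with 1-based pixel indices may need an off-by-one adjustment.

```python
def hflip_box(box, img_width):
    """Mirror an (xmin, ymin, xmax, ymax) box about the vertical centre line."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge and vice versa
    return (img_width - xmax, ymin, img_width - xmin, ymax)


flipped = hflip_box((100, 80, 200, 160), img_width=640)
print(flipped)  # → (440, 80, 540, 160)
```

Flipping twice should give back the original box, which makes a handy self-check for any custom augmentation tool.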

nic_wren · February 1, 2021, 9:42pm

Thanks dusty. In my edited method I did crop to the edges of the original image, removing the black borders.
It’s odd that you say pytorch-ssd already augments the input data, because once I got my own augmentation working it really worked well at training my models from just a few original images. But thanks, now that I know what it does I can select some different augmentations to try and perhaps improve it further. It’s been a useful learning experience anyway. Thanks again.