• Hardware: A5000
• Network Type: Mask_rcnn
• TLT Version: nvidia/tao/tao-toolkit-tf: v3.22.05-tf1.15.5-py3
• Training spec file
tao_maskrcnn_02_09_24_train.txt (2.4 KB)
• How to reproduce the issue?
When I run the training, it exits with “NaN loss during training.” Refer to the training log for the complete console output.
train.log (19.9 KB)
I referred to the following thread, "Mask R-CNN stops abruptly while training using custom coco dataset - #13 by Morganh", and removed the “images” and “annotations” entries where “segmentation” was found to be in JSON format. There weren’t any empty “segmentation” entries.
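The filtering described above can be sketched roughly as follows. This is an illustration only, not the exact script used: it assumes a standard COCO layout, treats a "segmentation" that is a dict (rather than a list of polygons) as the JSON-format case, and drops every image that has at least one such annotation together with all of that image's annotations. The function name is hypothetical.

```python
def drop_non_polygon_segmentations(coco):
    """Return (filtered_coco, removed_image_ids) for a COCO-style dict.

    Assumption: polygon masks are stored as a non-empty list of coordinate
    lists, while any other form (for example a dict) is treated as unwanted.
    """
    bad_image_ids = set()
    for ann in coco.get("annotations", []):
        seg = ann.get("segmentation")
        # Flag annotations whose segmentation is missing, empty, or not a list.
        if not isinstance(seg, list) or len(seg) == 0:
            bad_image_ids.add(ann["image_id"])

    # Shallow copy so the input dict is left untouched.
    filtered = dict(coco)
    filtered["images"] = [
        img for img in coco.get("images", []) if img["id"] not in bad_image_ids
    ]
    filtered["annotations"] = [
        ann for ann in coco.get("annotations", [])
        if ann["image_id"] not in bad_image_ids
    ]
    return filtered, bad_image_ids
```

To apply it, load the annotation file with json.load, pass the resulting dict to the function, and write the filtered dict back with json.dump before regenerating the tfrecords.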
Additionally, this is the snippet from the console output when I generated the tfrecords using the tao mask_rcnn dataset_convert command:
INFO:tensorflow:writing to output path: /workspace/tao-experiments/dataset/coco_val_2017/tfrecords_maskrcnn/instances_val2017_car_truck_bus
INFO:tensorflow:writing to output path: /workspace/tao-experiments/dataset/coco_val_2017/tfrecords_maskrcnn/instances_val2017_car_truck_bus
INFO:tensorflow:Building bounding box index.
INFO:tensorflow:Building bounding box index.
INFO:tensorflow:0 images are missing bboxes.
INFO:tensorflow:0 images are missing bboxes.
INFO:tensorflow:On image 0 of 691
INFO:tensorflow:On image 0 of 691
INFO:tensorflow:On image 100 of 691
INFO:tensorflow:On image 100 of 691
INFO:tensorflow:On image 200 of 691
INFO:tensorflow:On image 200 of 691
INFO:tensorflow:On image 300 of 691
INFO:tensorflow:On image 300 of 691
INFO:tensorflow:On image 400 of 691
INFO:tensorflow:On image 400 of 691
INFO:tensorflow:On image 500 of 691
INFO:tensorflow:On image 500 of 691
INFO:tensorflow:On image 600 of 691
INFO:tensorflow:On image 600 of 691
INFO:tensorflow:Finished writing, skipped 0 annotations.
INFO:tensorflow:Finished writing, skipped 0 annotations.
You can find the message “0 images are missing bboxes.” in the output above.
I am also attaching the JSON for you to explore further.
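The bbox count can also be cross-checked directly against the annotation JSON. The sketch below is a hypothetical helper, not part of TAO: it assumes the standard COCO bbox layout [x, y, width, height], lists images that have no bbox annotation, and additionally flags boxes with non-positive width or height. That second check is an extra sanity test added here and is not something the converter output reports.

```python
def summarize_bboxes(coco):
    """Return (image_ids_without_bbox, annotation_ids_with_degenerate_bbox)."""
    image_ids = {img["id"] for img in coco.get("images", [])}
    images_with_bbox = set()
    degenerate = []

    for ann in coco.get("annotations", []):
        bbox = ann.get("bbox")
        # Skip annotations that carry no usable 4-value bbox.
        if not bbox or len(bbox) != 4:
            continue
        images_with_bbox.add(ann["image_id"])
        _, _, width, height = bbox
        # COCO boxes are [x, y, width, height]; zero or negative sizes are suspect.
        if width <= 0 or height <= 0:
            degenerate.append(ann["id"])

    missing = sorted(image_ids - images_with_bbox)
    return missing, degenerate
```

An empty first list would agree with the “0 images are missing bboxes.” message; any IDs in the second list would be worth inspecting by hand.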
instances_val2017_car_truck_bus.txt (3.7 MB)
Thanks