Training problems with the TAO toolkit using Mask R-CNN

Dear Sir or Madam,

I am currently training a model on sugarcane images collected in the field to detect diseases on the leaves. The requested details are as follows:

• Hardware: T4
• Network Type: Mask_rcnn
• TLT Version:
Configuration of the TAO Toolkit Instance

dockers:
nvidia/tao/tao-toolkit-tf:
v3.22.05-tf1.15.5-py3:
docker_registry: nvcr.io
tasks:
1. augment
2. bpnet
3. classification
4. dssd
5. faster_rcnn
6. emotionnet
7. efficientdet
8. fpenet
9. gazenet
10. gesturenet
11. heartratenet
12. lprnet
13. mask_rcnn
14. multitask_classification
15. retinanet
16. ssd
17. unet
18. yolo_v3
19. yolo_v4
20. yolo_v4_tiny
21. converter
v3.22.05-tf1.15.4-py3:
docker_registry: nvcr.io
tasks:
1. detectnet_v2
nvidia/tao/tao-toolkit-pyt:
v3.22.05-py3:
docker_registry: nvcr.io
tasks:
1. speech_to_text
2. speech_to_text_citrinet
3. speech_to_text_conformer
4. action_recognition
5. pointpillars
6. pose_classification
7. spectro_gen
8. vocoder
v3.21.11-py3:
docker_registry: nvcr.io
tasks:
1. text_classification
2. question_answering
3. token_classification
4. intent_slot_classification
5. punctuation_and_capitalization
nvidia/tao/tao-toolkit-lm:
v3.22.05-py3:
docker_registry: nvcr.io
tasks:
1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

• Training spec file: see attached maskrcnn_train_resnet50.txt (2.1 KB)
• How to reproduce the issue: see attached maskrcnn_sugarcane.ipynb (581.9 KB)

As you can see in the uploaded Jupyter notebook, running the training command fails with "AssertionError: Results do not correspond to current coco set".

This error occurs no matter how I reorganise my data. I have confirmed that the dataset is in the correct COCO format: I tested it in Detectron2, and both training and inference succeeded.
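A successful Detectron2 run suggests the JSON is well-formed, but it may not exercise every assumption the evaluation step relies on (this is an assumption, not something confirmed for TAO). A minimal standard-library sketch that checks the id-level consistency COCO evaluation depends on: unique integer image ids, and every annotation pointing at an existing image and category. The helper name `check_coco_ids` is hypothetical and not part of TAO or pycocotools.

```python
from collections import Counter


def check_coco_ids(coco: dict) -> list:
    """Return human-readable problems found in a parsed COCO-style annotation dict."""
    problems = []
    image_ids = [img["id"] for img in coco.get("images", [])]

    # Image ids must be unique, otherwise results cannot be matched back to images.
    dupes = [i for i, n in Counter(image_ids).items() if n > 1]
    if dupes:
        problems.append(f"duplicate image ids: {dupes}")

    # Every annotation should reference an existing image and category.
    known_images = set(image_ids)
    known_cats = {cat["id"] for cat in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        if ann["image_id"] not in known_images:
            problems.append(f"annotation {ann.get('id')}: unknown image_id {ann['image_id']!r}")
        if ann["category_id"] not in known_cats:
            problems.append(f"annotation {ann.get('id')}: unknown category_id {ann['category_id']!r}")

    # String ids such as "12" never compare equal to the integer 12.
    non_int = [i for i in image_ids if not isinstance(i, int)]
    if non_int:
        problems.append(f"non-integer image ids (first 5): {non_int[:5]}")
    return problems
```

For example, `check_coco_ids(json.load(open("val.json")))` (file name hypothetical) returns an empty list when none of these problems are present.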

I have also checked that the class names and their order in the training set match those in the validation set.
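Because COCO annotations refer to categories by `category_id`, what has to agree between the two files is arguably the id-to-name mapping rather than only the listed order. A minimal sketch that compares that mapping between the parsed training and validation JSON files; `compare_categories` is a hypothetical helper, and the class names in the usage note are illustrative only.

```python
def compare_categories(train: dict, val: dict) -> list:
    """List (id, name) category pairs that appear in only one of two COCO-style dicts."""

    def mapping(coco: dict) -> list:
        # Sort by id so the comparison reflects the id -> name mapping, not list order.
        return [(c["id"], c["name"]) for c in sorted(coco["categories"], key=lambda c: c["id"])]

    t, v = mapping(train), mapping(val)
    diffs = []
    if t != v:
        for pair in sorted(set(t) ^ set(v)):
            where = "train only" if pair in t else "val only"
            diffs.append(f"{where}: id={pair[0]} name={pair[1]!r}")
    return diffs
```

For example, if training has `{1: "rust", 2: "smut"}` and validation lists the same two names but with the ids swapped, the names still match while every (id, name) pair is reported as a difference.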

Could you please take a look? It would be great if I could get training to work with TAO.

Can you run the default notebook successfully?

For the error

AssertionError: Results do not correspond to current coco set

See the GitHub issue "AssertionError: Results do not correspond to current coco set" (Issue #1570, facebookresearch/detectron2): the image IDs in the predictions do not match the image IDs in the original annotation file.
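The check that raises this error can be reproduced outside TAO to see which ids are at fault. In pycocotools, `COCO.loadRes` asserts that the set of `image_id` values in the results is a subset of the image ids in the loaded annotation file. The sketch below assumes the detection results are available as a list of COCO-style dicts with an `image_id` key, and that `gt` is the parsed annotation file used for evaluation (presumably the one referenced by `val_json_file` in the spec); `unmatched_result_ids` is a hypothetical helper.

```python
def unmatched_result_ids(results: list, gt: dict) -> set:
    """Return image ids present in detection results but absent from the ground truth.

    Mirrors the subset check in pycocotools' COCO.loadRes, which raises
    'Results do not correspond to current coco set' when this set is non-empty.
    """
    result_ids = {r["image_id"] for r in results}
    gt_ids = {img["id"] for img in gt["images"]}
    # 1 and "1" are different ids, so a type change during conversion also shows up here.
    return result_ids - gt_ids
```

An empty set means the results would pass that assertion. For example, with ground-truth ids 1 and 2, a result carrying `image_id` `"2"` (a string) yields `{"2"}`.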

Thanks for your reply.

The default notebook was run successfully on the COCO dataset.

I had already looked at the link you provided, as well as a few others, before posting. I understand that it may be an image ID mismatch, but I have tried many different approaches and still get the same error. Could you suggest how to solve it?

Thanks.

Can you use only one TFRecord file and retry?
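When retrying with a single TFRecord, one quick sanity check is that the validation TFRecord holds as many records as the validation JSON lists images, assuming the converter writes one record per image. The sketch counts records by walking the TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC) in plain Python. It does not verify CRCs, and `count_tfrecords` is a hypothetical helper; with TensorFlow available, iterating `tf.data.TFRecordDataset` is the more robust option.

```python
import struct


def count_tfrecords(path: str) -> int:
    """Count records in a TFRecord file by walking its framing (CRCs are not checked)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                return count  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            length_crc = f.read(4)  # masked CRC of the length field (ignored)
            payload = f.read(length)  # typically a serialized tf.train.Example
            payload_crc = f.read(4)  # masked CRC of the payload (ignored)
            if len(length_crc) != 4 or len(payload) != length or len(payload_crc) != 4:
                raise ValueError(f"truncated record #{count}")
            count += 1
```

The result can then be compared with `len(json.load(open(val_json))["images"])`, where `val_json` is the validation annotation path.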

Also, how many classes are in your dataset? Did you set num_classes to (number of classes + 1)?
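The (classes + 1) value can be derived from the annotation file itself. A minimal sketch with the hypothetical helper `suggested_num_classes`. It additionally assumes, without confirmation from the TAO documentation, that category ids serve directly as class indices; in that case the largest id plus one must also fit, so it warns when the ids are not the contiguous range 1..N and returns the larger of the two values.

```python
import warnings


def suggested_num_classes(coco: dict) -> int:
    """Suggest num_classes: object categories + 1 for background."""
    cat_ids = sorted(c["id"] for c in coco["categories"])
    plus_one = len(cat_ids) + 1
    # Assumption: category ids act as class indices, so the largest id must fit too.
    by_max_id = max(cat_ids) + 1 if cat_ids else 1
    if cat_ids != list(range(1, len(cat_ids) + 1)):
        warnings.warn(
            f"category ids {cat_ids} are not 1..{len(cat_ids)}: "
            f"(classes + 1) = {plus_one} but (max id + 1) = {by_max_id}"
        )
    return max(plus_one, by_max_id)
```

For categories with ids 1, 2 and 3 it returns 4; for ids 1 and 5 it warns and returns 6.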

Yes, I will retry with a single TFRecord later. I have already set num_classes using the plus-one approach you mentioned.

Hello @JiahongZhao, do you still need support for this topic, or shall we close it?

Yes, I still need help with this topic. The problem has not been solved yet, and I will post an update on my progress later. Thanks.