Detectnet_v2 training failure using Tao Toolkit on DeepStream

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc): T1000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I am training the detectnet_v2 network, but the log shows that the network is not learning: the mAP remains at 0.0% across all 120 epochs. Below are the training log file and the .txt configuration file used for training.

log_treinamento_02_06.txt (452.0 KB)
detectnet_v2_treinamento_fragmaq_spec.txt (9.3 KB)

The resolution of the training images is 960 x 544. I already tried training at 640 x 640, but that didn’t work either.

Can you run evaluation again with a lower minimum_bounding_box_height, for example 20?
What is the average object size in the training images? Is it possible to share an example? Also, are the objects too small?
Also, can you share the log from when you generated the tfrecords files? Since you are training 5 classes and there are only 120 training images in total, I need to check how many samples each class has in the evaluation dataset and the training dataset.
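To get those per-class counts directly from the label files, a minimal sketch could look like this (assuming KITTI-format .txt labels, where the class name is the first field of each line; the path in the example is hypothetical):

```python
import os
from collections import Counter

def count_class_instances(label_dir):
    """Count object instances per class across KITTI-format label files."""
    counts = Counter()
    for fname in os.listdir(label_dir):
        if not fname.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, fname)) as f:
            for line in f:
                fields = line.split()
                if fields:  # first KITTI field is the class name
                    counts[fields[0].lower()] += 1
    return counts

# Example (path is hypothetical):
# print(count_class_instances("/home/renato/treinamento_feira_objects/labels"))
```

Running this over both the training and evaluation label folders shows immediately whether any of the 5 classes is under-represented.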

I managed to carry out the training with a 960 x 544 image dataset and with the following configuration:

detectnet_v2_treinamento_fragmaq_spec.txt (9.9 KB)

Soon after, I ran another training with a different dataset, keeping the same configuration parameters, and it didn’t work: every epoch returns 0 mAP for each object class. This is the training result:

log_train.txt (762.1 KB)

The dataset used for training has 950 training images and 424 test images, all at a resolution of 960 x 544. What do you think could be the cause? I already tried training with a pre-trained resnet10 network, but it didn’t give good results.

Could you set a lower minimum_bounding_box_height? For example, minimum_bounding_box_height: 4. Then run evaluation again.
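For reference, this parameter lives in the clustering_config for each target class in the postprocessing section of the training spec; a fragment could look like the one below (the class name and the other clustering values here are illustrative, not taken from your spec):

```
postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}
```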

Hello, I ran a new training with minimum_bounding_box_height = 4, and the training didn’t work, so I trained again with this parameter set to 2; the resulting log is in the .txt file. If you have any other parameters to change and test, let me know.

log_train_16_06.txt (635.1 KB)
detectnet_v2_treinamento_fragmaq_spec.txt (9.9 KB)

May I know the tfrecords logs? Also, could you please share several training images along with their label files? Thanks.

Below is the log of the conversion of the images to tfrecords, and a .zip file with the images and labels (in .json format):

log_dataset_convert.txt (190.9 KB)
train.zip (48.0 MB)

The images were resized to 960 x 544, along with their labels.
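As a side note, the label coordinates must be scaled by the same factors as the image, or the boxes will no longer match the objects. A minimal stdlib-only sketch of that scaling step (assuming KITTI-format labels, with the 2D bbox xmin, ymin, xmax, ymax in fields 5–8; the function name is mine, not from TAO):

```python
def scale_kitti_labels(label_text, orig_w, orig_h, target_w=960, target_h=544):
    """Scale KITTI bbox fields from the original image size to the target size."""
    sx, sy = target_w / orig_w, target_h / orig_h
    out_lines = []
    for line in label_text.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            # KITTI 2D bbox: xmin, ymin, xmax, ymax at 0-based indices 4-7
            for i, s in zip(range(4, 8), (sx, sy, sx, sy)):
                fields[i] = f"{float(fields[i]) * s:.2f}"
        out_lines.append(" ".join(fields))
    return "\n".join(out_lines)
```

For example, a box (100, 200, 300, 400) in a 1920 x 1088 image becomes (50, 100, 150, 200) after resizing to 960 x 544.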

Can you run evaluation with the same dataset used for training, to narrow down the issue?
I will also check further on my side.

Change to:

  validation_data_source: {
    tfrecords_path: "/home/renato/treinamento_feira_objects/dataset_convert/-fold-000-of-001-shard-*"
    image_directory_path: "/home/renato/treinamento_feira_objects"
  }

Also, please run the experiments in the 4.0.1 docker to narrow down:
docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash
Then run the training command: detectnet_v2 train xxx

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.