[DetectNet_v2] mAP 0% with custom dataset after full training – TAO Toolkit 5.5.0

train.log (730.8 KB)
experiment_spec.txt (5.7 KB)

Those are the results. Performance seems better, but it is still not very good: about 30 to 40%.

Yes, it is improving.
Please try training for more epochs.
You can also run more experiments with different learning rates.

train.log (1.6 MB)

I set 300 epochs and it made no difference.

{date 652025, time 135337, status S.txt (134.0 KB)

I had to restart after 2 epochs because it gave me an error.

Is it possible to share your training images for further checking? You can send them via private message to me.

Done. Waiting for your response.

Can you check whether the md5sums are correct?
After downloading, I get:

$ md5sum *
20a37b6c217b828eb6a7b4d6d915f1e9 Dataset_grande.rar
3a9e92a58ac508e5e2dfe5ec6b1edcf3 Dataset_pequeño.rar

Yes, they are correct.

I am getting the same problem. I have done 50+ training runs with TLT 2.0 and TAO 3.0, but with TAO 5.0 the same pipeline, with the same images and the same resnet18, does not seem to work. My first problem was that with val_split=14 the tool could not produce a uniform split across classes; I randomized the filenames and that issue was solved. During training, however, each validation run still reports that the validation data is uneven, even though the tfrecords conversion shows it is balanced. The same dataset with YOLOv5 gives me 93.2%.
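For anyone who wants to check class balance independently of the converter, here is a minimal sketch in Python. It is not how TAO splits the data internally; it only shuffles file names with a fixed seed, holds out a percentage for validation, and counts objects per class from KITTI-style label text. The function names `split_names` and `class_counts`, and the in-memory `labels` dictionary, are hypothetical and chosen for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def split_names(names: List[str], val_percent: float, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle file names reproducibly and return (train, val) lists."""
    shuffled = sorted(names)               # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)  # local RNG, does not touch global random state
    n_val = round(len(shuffled) * val_percent / 100.0)
    return shuffled[n_val:], shuffled[:n_val]


def class_counts(labels: Dict[str, str], names: List[str]) -> Counter:
    """Count objects per class; the class name is the first field of each KITTI line."""
    counts: Counter = Counter()
    for name in names:
        for line in labels[name].splitlines():
            fields = line.split()
            if fields:                     # skip blank lines
                counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical labels: 100 files, one object each, cycling through three classes.
    classes = ["car", "motorcycle", "van"]
    labels = {
        f"img_{i:03d}.txt": f"{classes[i % 3]} 0 0 0 10 10 50 50 0 0 0 0 0 0 0"
        for i in range(100)
    }
    train, val = split_names(list(labels), val_percent=14)
    print("train:", dict(class_counts(labels, train)))
    print("val:  ", dict(class_counts(labels, val)))
```

If a class is missing or badly under-represented in the validation counts, that would be worth fixing before comparing mAP across toolkit versions.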


Any update, morganh?

Could you please try the TAO 4.0 docker? Run docker pull nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5.
Then docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash.
Then run the training command detectnet_v2 train xxx .

Hey! Thanks for the response!

{date 6102025, time 82423, status S.txt (114.6 KB)

Those are the results. I used TAO 4.0.1.

The evaluation result is: “average_precision”: {“car”: 55.421, “motorcycle”: 0, “van”: 37.0911}.
Could you run evaluation against the training dataset to check the result?
Also, I found that your dataset consists of some images along with their augmented versions, and some are synthetic images.
Some labels also have questionable coordinate values. For example:

descarga (51)_aug2.jpg
motorcycle 0 0 0 90.81 143.81 747.84 473.36 0 0 0 0 0 0 0

These may also affect mAP.
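
A quick way to scan for suspicious labels is a small Python check like the sketch below. It assumes the standard 15-field KITTI layout, with the box stored as xmin, ymin, xmax, ymax in fields 5 to 8, and flags boxes that are degenerate, very small, or outside the image. The function name `check_kitti_line` and the `min_side` threshold are hypothetical, and the image sizes in the demo are placeholders because the real image dimensions are not given in this thread.

```python
from typing import List


def check_kitti_line(line: str, img_w: int, img_h: int, min_side: float = 1.0) -> List[str]:
    """Return a list of problems found in one KITTI label line (empty if none)."""
    fields = line.split()
    if len(fields) != 15:
        return [f"expected 15 fields, got {len(fields)}"]
    try:
        xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    except ValueError:
        return ["bbox fields are not numeric"]

    problems: List[str] = []
    if xmax <= xmin or ymax <= ymin:
        problems.append("degenerate box: max must exceed min")
    elif (xmax - xmin) < min_side or (ymax - ymin) < min_side:
        problems.append("box smaller than min_side")
    if xmin < 0 or ymin < 0 or xmax > img_w or ymax > img_h:
        problems.append("box outside image bounds")
    return problems


if __name__ == "__main__":
    line = "motorcycle 0 0 0 90.81 143.81 747.84 473.36 0 0 0 0 0 0 0"
    # Image sizes below are assumptions for illustration only.
    print(check_kitti_line(line, img_w=1280, img_h=720))
    print(check_kitti_line(line, img_w=640, img_h=480))
```

Running it over every label file together with the real image size (read with any image library) would list the files worth inspecting by hand.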

I will try it.