DetectNet V2 TAO 5.5 average_precision very low or zero

I tried modifying many of the params in the spec file, but I'm unable to get the person class's average precision above 2.5%, while the other classes stay at 0.
I analyzed the images a bit: a few of them have bounding boxes whose widths/heights are below 20 pixels (I have put 5 in my spec file). Please recommend what I should do; any help is appreciated.
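For context, the "5" mentioned above refers to the per-class box-size filter in the `evaluation_config` section of the DetectNet_v2 spec. A sketch of what lowering it to 5 might look like (the `person` key and the other thresholds are assumptions, not the actual spec):

```
# evaluation_config excerpt (sketch; values are illustrative assumptions)
evaluation_config {
  evaluation_box_config {
    key: "person"
    value {
      minimum_height: 5
      minimum_width: 5
      maximum_height: 9999
      maximum_width: 9999
    }
  }
}
```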

• Hardware 2X 3060 TI
• Network Type Detectnet_v2

Can you check if nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 works?
You can
$docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash
then, inside the container, run the command below.
# detectnet_v2 train xxx

Also, is it possible to share more of your dataset so I can reproduce the issue? Recently some topics report that mAP drops with custom datasets, while there is no issue with the default KITTI dataset mentioned in the notebook.
Thanks.

Hello @Morganh,
I will test out the detectnet_v2 train in a bit. Just a question: you want me to run the full command, right? detectnet_v2 train with the spec file, etc.
Please find the dataset attached, this is a small portion
training.zip (45.7 MB)

Yes, please fill in the full command in place of the xxx. But note that you can omit the tao prefix at the beginning.

Thanks. Will check also.

Hello, @Morganh. I tried with TAO 4.0.1 (downloaded the full TAO) and tested DetectNet_v2; I am still getting very similar results with low precision.

What is the latest result now?
Could you try changing:

  • batch_size:
    batch_size_per_gpu: 1 to batch_size_per_gpu: 4
  • a deeper backbone, for example, resnet50
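In spec-file terms, those two suggestions might look like the following (a sketch against the standard DetectNet_v2 spec layout; surrounding fields are omitted for brevity):

```
# training_config excerpt: raise the per-GPU batch size
training_config {
  batch_size_per_gpu: 4
}

# model_config excerpt: switch from resnet18 to a deeper backbone
model_config {
  arch: "resnet"
  num_layers: 50
}
```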

Okay, I found the issue, @Morganh. The problem was that the TFRecords did not contain any labels except the person label, and I didn't know how to make it work with the validation fold, so I used:
validation_data_source: {
  tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_val/*"
  image_directory_path: "/workspace/tao-experiments/data/testing"
}
I trained with resnet18 and got decent results, thank you.

Thanks for the info.

Yes, you can use it this way. It is mentioned in the doc as well.

@Morganh my images and labels are sequential, meaning person, person, person, fire, fire, bottle, bottle. Is there a way to make the TFRecords split random without using validation_data_source?

Also, I tested with the ONNX model and it worked fine, but when I exported to TRT format the inference was very bad. Am I right to assume that
!tao deploy detectnet_v2 gen_trt_engine \
  -m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.onnx \
  --data_type int8 \
  --batches 10 \
  --batch_size 4 \
  --max_batch_size 64 \
  --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
  --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
  -e $SPECS_DIR/detectnet_v2_retrain_resnet18_kitti.txt \
  --results_dir $USER_EXPERIMENT_DIR/experiment_dir_final \
  --verbose

gen_trt_engine takes sequential images for calibration and not random ones?
One final thing: how can I calculate the number of images used for calibration (10 batches)?
Thank you in advance.

Please try to use the entire training dataset for calibration.
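For the arithmetic: calibration sees roughly `batch_size * batches` images, so to cover the entire training set you can size `--batches` accordingly. A minimal sketch (the 6000-image training-set count is an assumption for illustration only):

```python
import math

def calibration_batches(num_images: int, batch_size: int) -> int:
    """Number of --batches needed so batch_size * batches covers all images."""
    return math.ceil(num_images / batch_size)

# With --batch_size 4 and --batches 10, calibration sees 4 * 10 = 40 images.
print(4 * 10)  # 40

# To calibrate on a hypothetical 6000-image training set with batch_size 4:
print(calibration_batches(6000, 4))  # 1500
```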

And how can I use the entire training set for calibration? I want to understand the math: when my batch_size is 4 and batches is 10, how many images does that give? 40?
Can you answer this too, please: "my images and labels are sequential, meaning person person person fire fire bottle bottle; is there a way to make the TFRecords split random without using validation_data_source?"

Yes.

Please refer to DetectNet_v2 - NVIDIA Docs. The partition mode can be set to random. Refer to the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/detectnet_v2 at main · NVIDIA/tao_tutorials · GitHub as well.
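Concretely, the TFRecords conversion spec (used by dataset_convert) supports a random split; a sketch with assumed paths and an assumed split percentage:

```
# dataset_convert spec excerpt (paths and val_split value are assumptions)
kitti_config {
  root_directory_path: "/workspace/tao-experiments/data/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  partition_mode: "random"   # shuffle instead of sequential partitioning
  num_partitions: 2
  val_split: 20              # 20% of images go to the validation fold
  num_shards: 10
}
```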