[DetectNet_v2] mAP 0% with custom dataset after full training – TAO Toolkit 5.5.0

Hi,

I’ve been trying to train a custom object detection model with DetectNet_v2 in TAO Toolkit 5.5.0, but after a full 80-epoch training run I always get 0.0% mAP in evaluation. I’ve double-checked everything (dataset, preprocessing, config files, input sizes, etc.) and even tried exporting and running inference with the trained model, but the outputs are still invalid. I have tried many different configurations, but nothing has worked.

I’m reaching out for support because I believe I’ve followed the documentation correctly and tried all the common fixes without success. Below is my system setup and attached are the full config files, logs, dataset examples, and TFRecords for full reproducibility.

System Setup

  • OS: Ubuntu 22.04.5 LTS
  • GPU: NVIDIA GeForce RTX 4070 SUPER (12GB)
  • CUDA: 12.9
  • Driver: 576.28
  • TAO Toolkit version: 5.5.0
  • Running environment: WSL2 with virtualenv and Docker
  • TAO invoked with: tao CLI inside TAO virtualenv

Problem Summary

  • Dataset based on the KITTI format, with 3 classes: car, motorcycle, and van (a sample label line is shown after this list).
  • All images resized to 480x272 (both multiples of 16) and in 3-channel RGB format.
  • TFRecords successfully created and verified.
  • Spec files follow the official documentation, using a ResNet backbone.
  • Training finishes normally with no errors.
  • However, evaluation returns 0.0 mAP for all classes, even after 80 epochs.
  • The exported ONNX model produces empty detections at inference.
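For reference, a typical line in my KITTI label files looks like the one below (the numeric values are illustrative); as far as I know, DetectNet_v2 only reads the class name and the four bbox coordinates (xmin, ymin, xmax, ymax):

car 0.00 0 0.00 112.50 45.20 208.90 130.75 0.00 0.00 0.00 0.00 0.00 0.00 0.00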

Help Requested

Could anyone from the community or NVIDIA team help me understand why I’m getting 0.0% mAP even though the training runs correctly and the dataset seems to be in order?

Am I missing something subtle in the specs or preprocessing? I’m happy to provide any additional detail or test any suggestions.

I can provide the dataset if you want.

Thanks in advance!

kitticonf.txt (532 Bytes)
modelconf.txt (5.6 KB)
convert.log (9.2 KB)
train.log (560.7 KB)

As you can see, only car reaches 17% AP. I don’t know what else I can do. Maybe more epochs?

Commands Used

These are the exact commands I used to prepare the dataset and start training:

Convert KITTI to TFRecords

tao model detectnet_v2 dataset_convert \
  -d /workspace/tao-experiments/vehdet_3/kitticonf.txt \
  -o /workspace/tao-experiments/vehdet_3/tfrecords/tfrecords \
  --gpus 1 \
  --num_processes 1 \
  --log_file /workspace/tao-experiments/vehdet_3/convert.log \
  --results_dir /workspace/tao-experiments/vehdet_3/results \
  -v
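
For context, kitticonf.txt follows the standard dataset_convert spec structure; a sketch of what it contains (the paths are from my setup and the exact values are illustrative, but the 2 partitions and 10 shards match the convert.log output quoted further below):

kitti_config {
  root_directory_path: "/workspace/tao-experiments/vehdet_3/data"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/vehdet_3/data"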

Train the model

tao model detectnet_v2 train \
  --gpus 1 \
  --num_processes 1 \
  -e /workspace/tao-experiments/vehdet_3/modelconf.txt \
  -r /workspace/tao-experiments/vehdet_3/results_train8 \
  -n vehdet3 \
  -v

train.log (737.0 KB)

This is another training log with everything the same except the epochs changed to 300. I saw that it didn’t change anything, so I stopped the training.

train.log (1.5 MB)
Another one, with 320 epochs.

The loss keeps decreasing, so that part seems normal.

Can you set minimum_bounding_box_height lower than 20? Since your model is set to
output_image_width: 480
output_image_height: 272

I am afraid the objects are small. So, if minimum_bounding_box_height is set to 20, most of the objects in the evaluation dataset will not be evaluated at all.

After modifying the spec file, you can run detectnet_v2 evaluate xxx directly. There is no need to train again.
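For reference, minimum_bounding_box_height lives under clustering_config in postprocessing_config and is set per class; there is also a related minimum_height filter in evaluation_config’s evaluation_box_config. A sketch of the postprocessing block (the value 4 is only an example):

postprocessing_config {
  target_class_config {
    key: "car"
    value {
      clustering_config {
        ...
        minimum_bounding_box_height: 4   # lowered from 20; example value
      }
    }
  }
}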

Hi again,

After applying your suggestions and modifying the evaluator section in the spec file, I re-ran the evaluation using:

tao model detectnet_v2 evaluate \
  -e /workspace/tao-experiments/vehdet_3/modelconf.txt \
  -m /workspace/tao-experiments/vehdet_3/results_train9/weights/vehdet3.hdf5 \
  --gpu_index 0 \
  --log_file /workspace/tao-experiments/vehdet_3/results_train9/evaluate.log \
  -v

However, the results in the new evaluate.log file still show only 6% mAP across all classes, the same as before.

evaluate.log (111.3 KB)

My concern is that even after modifying an important parameter (the evaluator settings), the model’s results do not improve at all. The training logs look correct, and I’ve confirmed that all images and labels are properly formatted and resized to meet the TAO requirements (480x272, RGB, KITTI format converted using the TAO tools).

In a previous test, I trained the model for 300 epochs without changing the configuration or any command, and it resulted in mAP = 0, not mAP = 6.
Thanks in advance for your help!

Do you mean you resized your training images to 480x272 offline before training? Can you check the resolution of the car/van objects? Are they too small? Also, what is the original resolution of the images?
Also, you did not set the correct pretrained model. For resnet50 (num_layers: 50), you need to download the resnet50 hdf5 file instead of pretrained_model_file: "/workspace/tao-experiments/vehdet_3/pretrained/resnet_18.hdf5"
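In other words, the model_config should look roughly like this (the resnet_50.hdf5 path is an assumption based on your folder layout):

model_config {
  pretrained_model_file: "/workspace/tao-experiments/vehdet_3/pretrained/resnet_50.hdf5"
  num_layers: 50
  ...
}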

Yes, I resized my training images to 480x272. I have several datasets: one is 1280x720, another 259x194, etc., so I decided to resize everything down to the smallest size. Here are some examples.
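By the way, for the offline resize, the KITTI boxes need to be scaled with the same factors as the image; this is roughly what I do (a minimal sketch; the paths and the assumption that each image has a matching label .txt are from my setup):

# Resize images to 480x272 and scale the KITTI bboxes by the same factors.
from pathlib import Path
from PIL import Image

TARGET_W, TARGET_H = 480, 272
img_dir = Path("images")
lbl_dir = Path("labels")

for img_path in img_dir.glob("*.png"):
    img = Image.open(img_path)
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(img_path)

    lbl_path = lbl_dir / (img_path.stem + ".txt")
    out_lines = []
    for line in lbl_path.read_text().splitlines():
        f = line.split()
        # KITTI bbox fields are columns 4..7: xmin, ymin, xmax, ymax
        f[4] = f"{float(f[4]) * sx:.2f}"
        f[5] = f"{float(f[5]) * sy:.2f}"
        f[6] = f"{float(f[6]) * sx:.2f}"
        f[7] = f"{float(f[7]) * sy:.2f}"
        out_lines.append(" ".join(f))
    lbl_path.write_text("\n".join(out_lines) + "\n")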



That’s true, it was a mistake to use resnet18. I just retrained my model with the correct pretrained model. I will share the results.

train.log (394.4 KB)
modelconf.txt (5.6 KB)

That’s the training config and the log.


And that’s an example of my images, with the dimensions that I use for training.

Any idea?

May I know why there are bounding boxes around the cars and license plates in your training images? Is that expected?

Yes. That’s because I use an existing car-only model to take snapshots for later use in training my own model. The bounding boxes in these photos don’t mean anything; they are just snapshots from an old ALPR system.

OK. Also, in your convert.log, there are actually no motorcycle objects in the evaluation dataset.
See the log below.

Wrote the following numbers of objects:
b'car': 300
b'van': 85

In training dataset, there are below objects.

Wrote the following numbers of objects:
b'van': 615
b'motorcycle': 777
b'car': 510

So, it is expected to get AP = 0 for motorcycle, because there are no motorcycles in the evaluation dataset.

I suggest running evaluation against the training dataset.

Also, you can set a lower batch size for training.

Also, I suggest training at a larger size to narrow this down:
output_image_width: 960
output_image_height: 544

Lastly, also modify the following:

  target_classes {
    name: "motorcycle"
    class_weight: 1.0
    coverage_foreground_weight: 0.05
    objectives {
      name: "cov"
      initial_weight: 1.0
      weight_target: 1.0
    }
    objectives {
      name: "bbox"
      initial_weight: 10.0
      weight_target: 1.0   # change to 10.0
    }
  }

I modified my modelconf.txt, changing the batch size to 4 and the output image size to 960x544.

Finally, I also modified weight_target in the motorcycle class.

This is the final version:
modelconf.txt (5.6 KB)

With the new dimensions, I recreated the TFRecords, and it’s the same: there are no motorcycles in the evaluation set. Why is that?

2025-06-03 08:18:26,759 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 1
2025-06-03 08:18:26,766 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 2
2025-06-03 08:18:26,772 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 3
2025-06-03 08:18:26,777 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 4
2025-06-03 08:18:26,783 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 5
2025-06-03 08:18:26,790 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 6
2025-06-03 08:18:26,795 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 7
2025-06-03 08:18:26,801 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 8
2025-06-03 08:18:26,807 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 0, shard 9
2025-06-03 08:18:26,814 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250:
Wrote the following numbers of objects:
b'car': 300
b'van': 85

2025-06-03 08:18:26,814 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 0
2025-06-03 08:18:26,836 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 1
2025-06-03 08:18:26,860 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 2
2025-06-03 08:18:26,883 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 3
2025-06-03 08:18:26,906 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 4
2025-06-03 08:18:26,929 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 5
2025-06-03 08:18:26,951 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 6
2025-06-03 08:18:26,974 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 7
2025-06-03 08:18:26,996 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 8
2025-06-03 08:18:27,017 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 166: Writing partition 1, shard 9
2025-06-03 08:18:27,039 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250:
Wrote the following numbers of objects:
b'van': 615
b'motorcycle': 777
b'car': 510

2025-06-03 08:18:27,039 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 89: Cumulative object statistics
2025-06-03 08:18:27,039 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.dataio.dataset_converter_lib 250:
Wrote the following numbers of objects:
b'car': 810
b'van': 700
b'motorcycle': 777
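
To double-check the class counts myself, I also count objects per class directly from the label files (a quick sketch; the labels path is from my setup, and the totals match the cumulative statistics above):

# Count KITTI objects per class across all label files.
from collections import Counter
from pathlib import Path

counts = Counter(
    line.split()[0]  # the first KITTI field is the class name
    for p in Path("labels").glob("*.txt")
    for line in p.read_text().splitlines()
    if line.strip()
)
print(counts)  # expect car: 810, van: 700, motorcycle: 777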

I just started the training, and after 20 epochs we have nothing… as you can see in the attached file.

train.log (243.8 KB)

I’m going to let the training finish completely and then upload the file again.

train.log (999.7 KB)
{date 6/3/2025, time 8:19:44, status ST.txt (54.1 KB)

Nothing. mAP = 0

That’s an example of the image:

Since your dataset has various resolutions and you have not resized the images to 960x544 offline yet, please add enable_auto_resize as below.

  preprocessing {
    ...
    enable_auto_resize: true
  }

Refer to DetectNet_v2 - NVIDIA Docs.

Also, to narrow this down, please run evaluation directly against the training dataset.
You can change validation_fold: 0 to the following:

validation_data_source: {
  tfrecords_path: "<path to TFRecords to validate on>/<validation tfrecords pattern>"
  image_directory_path: "<path to validation data source>"
}

I resized the dataset offline before training. So is it necessary to set enable_auto_resize?

Yes, please set it. Also, to narrow this down, please use resnet18 to train.
Download the hdf5 file from the link below:
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_detectnet_v2/version
Don’t forget to set num_layers: 18.

train.log (90.1 KB)

I don’t know why I get this error in the validation.

experiment_spec.txt (5.7 KB)

I just modified the model to 18 layers and enabled auto resize.

I just relaunched the training and it’s running.