DetectNetV2 Model on Custom KITTI Dataset (Single Class) Producing Invalid Detections – Need Help with Accuracy Improvement

• Hardware (L40)
• Network Type (Detectnet_v2)
• Toolkit_version (5.5.0)
• I trained a custom object detection model using NVIDIA TAO Toolkit’s DetectNetV2, following the official tutorial provided here:
👉 DetectNetV2 TAO Tutorial

Dataset:

Despite the large dataset and following the recommended training procedure, I’m seeing invalid or incorrect detections during inference — even when I set the confidence threshold as low as 0.0001.

Request:

I’d appreciate guidance on:

  • Whether this might be due to a need for hyperparameter tuning
  • If so, which parameters (e.g., learning rate, augmentation, batch size) I should focus on
  • Any known issues with single-class KITTI datasets and DetectNetV2
  • Suggestions for improving detection accuracy or stability

I’m also attaching my detectnet_v2_train_resnet18_kitti.txt spec file for reference. I am using the same spec file in the DeepStream pipeline.

Looking forward to your suggestions and support from the community. Thanks in advance!

Please add enable_auto_resize: true to preprocessing.

  preprocessing {
    output_image_width: 832
    output_image_height: 256
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
    enable_auto_resize: true
  }


(from DetectNet_v2 — TAO Toolkit documentation)

Also, set these to lower values (see the sketch below for where they go in the spec):
minimum_bounding_box_height: 4
minimum_height: 4
minimum_width: 4
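
For context, a minimal sketch of where these settings typically sit in a DetectNet_v2 spec: minimum_bounding_box_height goes under clustering_config inside postprocessing_config, while minimum_height/minimum_width go under evaluation_box_config inside evaluation_config. The class key (shown here as "head") and the other values are placeholders for your single class, not taken from your attached spec:

postprocessing_config {
  target_class_config {
    key: "head"
    value {
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}

evaluation_config {
  evaluation_box_config {
    key: "head"
    value {
      minimum_height: 4
      maximum_height: 9999
      minimum_width: 4
      maximum_width: 9999
    }
  }
}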

As a further experiment, you can run with the TAO 4.0 docker. Please pull it with docker pull nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 .
Then run docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash .
Inside the container, run the training command detectnet_v2 train xxx .
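
As a sketch, and assuming you mount your dataset and spec files into the container with -v when starting it, the in-container training command usually takes a form like the following (the paths and model name are placeholders; the key must match the one you trained with):

detectnet_v2 train \
  -e /workspace/specs/detectnet_v2_train_resnet18_kitti.txt \
  -r /workspace/experiment_dir_unpruned \
  -k my_custom_key \
  -n resnet18_detector \
  --gpus 1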

Hi @Morganh ,
Thank you for your suggestions. I made the recommended changes to the spec file, but I’m still encountering the same issues. Kindly see my updates and follow-up questions below:

1. Invalid Detections Persist After Applying Suggested Config Changes

I’ve updated the preprocessing and evaluation_box_config sections as recommended:

preprocessing {
    output_image_width: 832
    output_image_height: 272
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
    enable_auto_resize: true
}

evaluation_box_config {
    key: "head"
    value {
        minimum_height: 4
        maximum_height: 9999
        minimum_width: 4
        maximum_width: 9999
    }
}

Despite these updates, I’m still seeing invalid or incorrect detections during inference — even with a very low confidence threshold (e.g., 0.0001).


2. Compatibility and Model Conversion Issues with TAO Toolkit 4.0.1

I tried training the same setup using tao-toolkit:4.0.1-tf1.15.5. Here’s what I observed:

  • I used a .hdf5 pretrained model as a starting point, since the training outputs are still in .tlt format.
  • After training, I exported the .tlt to .etlt using this command:
tao detectnet_v2 export \
  -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
  -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
  -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
  -k my_custom_key \
  --data_type fp16 \
  --batch_size 8 \
  --gen_ds_config

However, I was unable to generate the engine file from the .etlt inside the DeepStream pipeline.
Then I converted it to a TensorRT engine with tao-deploy using:

tao-deploy detectnet_v2 gen_trt_engine \
  -m $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
  -k my_custom_key \
  --data_type int8 \
  --batches 10 \
  --batch_size 8 \
  --max_batch_size 64 \
  --engine_file $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.trt.int8 \
  --cal_cache_file $USER_EXPERIMENT_DIR/experiment_dir_final/calibration.bin \
  -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti.txt \
  --verbose

Issue: When I run this .trt engine with DeepStream, it results in a “core dumped” error.
My DeepStream config looks like this:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
model-engine-file=resnet18_detector.trt.int8
labelfile-path=labels_face.txt
batch-size=1
network-mode=1
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
infer-dims=3;272;832

[class-attrs-all]
nms-iou-threshold=0.4
pre-cluster-threshold=0.2
topk=50

Questions:

  • Can we use an .hdf5 pretrained model directly in TAO 4.0.x for training and exporting? Or is this incompatible with the current DetectNetV2 export pipeline?
  • Is there any additional step required when converting from .tlt to .etlt — or any known issues that could prevent engine file generation from .etlt in DeepStream?
  • Could you suggest any debug steps for the DeepStream crash when loading the .trt engine?

Any further guidance on debugging this or ensuring proper compatibility between TAO versions and DeepStream would be much appreciated.

Thank you again for your support!

What is the average resolution of the training images, and are the objects small? Sharing several images and their labels would help in understanding the issue.

No. For TAO 4.0.x, please use the .tlt version of the pretrained model from NGC.
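
For example, the pretrained backbone can be pulled with the NGC CLI roughly as follows; this assumes the standard pretrained_detectnet_v2 model on NGC, and the exact version tag that still ships a .tlt file should be verified on the model page:

ngc registry model list nvidia/tao/pretrained_detectnet_v2:*
ngc registry model download-version nvidia/tao/pretrained_detectnet_v2:resnet18 \
  --dest ./pretrained_resnet18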

Yes, exporting is required. You can run the export inside the 4.0.x docker.
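
As a sketch, inside the 4.0.x container the export is run without the tao launcher prefix; the paths below are placeholders and the key must be the one used for training:

detectnet_v2 export \
  -m /workspace/experiment_dir_unpruned/weights/resnet18_detector.tlt \
  -e /workspace/specs/detectnet_v2_train_resnet18_kitti.txt \
  -o /workspace/experiment_dir_final/resnet18_detector.etlt \
  -k my_custom_key \
  --data_type fp16 \
  --gen_ds_config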

In previous versions of deepstream_tao_apps, you can configure the .etlt model in the config file. For example, see https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/release/tao4.0_ds6.3ga and https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/release/tao4.0_ds6.3ga/configs/nvinfer/peoplenet_tao/config_infer_primary_peoplenet.txt#L16-L17.
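
For illustration, the relevant nvinfer [property] lines for loading the .etlt directly (so DeepStream builds and caches the engine itself) would look roughly like this; the file names and key below are placeholders based on this thread, so adjust them to your setup:

tlt-encoded-model=resnet18_detector.etlt
tlt-model-key=my_custom_key
int8-calib-file=calibration.bin
# network-mode=1 selects INT8; on first run nvinfer builds the engine from the .etlt and serializes it
network-mode=1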