Issues with training DetectNetv2

• Hardware: 8x NVIDIA A5000
• Network Type: Detectnet_v2
• TLT Version: format_version: 2.0 / toolkit_version: 3.22.05 / v3.21.11-py3
detectnet_v2_train_resnet18_kitti.txt (7.0 KB)

I’m trying to train a custom detection model with a resnet18 backbone for 4 classes: car, pickup, truck and tir (another truck type). The model does not seem to be working properly: I get multiple detections for the same object.
I also tried training with a batch size of 1, but the mAP was only 15-20%.

Is there anything that I’m doing wrong?

Are your training images all the same resolution?
If not, please set enable_auto_resize: true in the preprocessing section of the spec file.

I used a script to resize them. They are all 1280x720 and the labels are correct. Dataset size:
b’car’: 26098
b’truck’: 4378
b’tir’: 2842
b’pickup’: 765
b’truck2’: 2662
b’van’: 713
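
The counts above are skewed toward car, which may matter for the weaker classes. A minimal sketch that turns them into percentage shares is below; the `counts` dictionary simply copies the numbers listed above, and the variable names are only illustrative.

```python
# Object counts per label, copied from the dataset summary above.
counts = {
    "car": 26098,
    "truck": 4378,
    "tir": 2842,
    "pickup": 765,
    "truck2": 2662,
    "van": 713,
}

total = sum(counts.values())

# Percentage share of each label among all annotated objects.
shares = {name: 100.0 * n / total for name, n in counts.items()}

# Print labels from most to least frequent.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {counts[name]:6d} {share:6.2f}%")
```

With these numbers, car accounts for roughly 70% of all objects and pickup for about 2%.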

I had some warnings, but I removed all the objects that caused them. Training works fine with a batch size of 4 per GPU, min_learning_rate: 5e-06 and max_learning_rate: 5e-04, but I’m not happy with the results, even though the mAP is good.

Below are the results for the first training run. After retraining, the results improved, but I still get multiple detections for the same object. For example, running !tao detectnet_v2 inference gives two detections on the same truck, one labeled car and one labeled truck. This happens on several of the images I used for inference. I checked all the training images and they are correctly labeled. Any idea why this happens, or is there anything I can try?

Validation cost: 0.000286
Mean average_precision (in %): 74.7897

class name    average precision (in %)
car           78.9247
pickup        57.8026
tir           89.1348
truck         73.2968

Can you share the inference spec file? If possible, can you share an example to help understand the inference result better?

infer_config.txt (2.9 KB)

Below are the results and also the inference config. I used random images for inference to see the results.

I manually checked the dataset and no truck is labeled as a car. Are these results normal in this case?
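
If overlapping boxes with different labels on one object are the main problem, one possible workaround is to post-process the detections with a class-agnostic non-maximum suppression step that keeps only the highest-scoring box in each overlapping group. The sketch below is not part of the toolkit: the `(label, score, x1, y1, x2, y2)` tuple layout, the `cross_class_nms` helper and the 0.5 IoU threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed detection layout: (label, score, x1, y1, x2, y2) in pixels.
Detection = Tuple[str, float, float, float, float, float]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a[2:]
    bx1, by1, bx2, by2 = b[2:]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def cross_class_nms(dets: List[Detection], iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS that ignores class labels: the best-scoring box wins."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(det, k) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Made-up example: one truck detected twice (as truck and as car), plus a separate car.
detections = [
    ("car", 0.60, 100, 100, 400, 300),
    ("truck", 0.90, 105, 95, 410, 305),
    ("car", 0.80, 600, 200, 700, 260),
]
filtered = cross_class_nms(detections)
print([d[0] for d in filtered])
```

This hides the duplicate label but does not fix the underlying confusion between classes, so it is better treated as a diagnostic aid than as a solution.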

Could you try to run inference against some of the training images?
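
A small sketch for pulling a random subset of training images into a separate folder for that check is below. The `sample_images` helper, the file extensions and any directory paths passed to it are hypothetical; adjust them to the actual dataset layout.

```python
import random
import shutil
from pathlib import Path
from typing import List, Sequence


def sample_images(
    src_dir: str,
    dst_dir: str,
    n: int,
    seed: int = 0,
    exts: Sequence[str] = (".jpg", ".jpeg", ".png"),
) -> List[str]:
    """Copy up to n randomly chosen images from src_dir into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    # Sort first so that the same seed always selects the same files.
    images = sorted(p for p in src.iterdir() if p.suffix.lower() in exts)
    chosen = random.Random(seed).sample(images, min(n, len(images)))
    dst.mkdir(parents=True, exist_ok=True)
    for path in chosen:
        shutil.copy2(path, dst / path.name)
    return [path.name for path in chosen]
```

The resulting folder can then be used as the input directory for the inference run, and the predictions compared against the ground-truth labels of the same images.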

