Retrain the DetectNet_v2 model on custom data

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc): NVIDIA GeForce RTX 4070
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): Detectnet_v2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

I need to train the DetectNet_v2 model on a custom dataset. To start, here are my initial questions:

  1. Object Detection – KITTI Format:
    Data Annotation Format - NVIDIA Docs
    For DetectNet_v2, the train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly. Online resizing is supported for other detection model architectures:
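
If I understand this correctly, the offline step would look roughly like the sketch below (using Pillow; the target size 1248x384 and the paths are placeholders I chose, not values from the documentation):

from PIL import Image

# Hypothetical target training resolution; replace with the size used in the spec file.
TARGET_W, TARGET_H = 1248, 384

def resize_image_and_labels(image_path, label_path, out_image_path, out_label_path):
    """Resize one image to the training resolution and scale its KITTI boxes to match."""
    img = Image.open(image_path)
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H)).save(out_image_path)

    with open(label_path) as f, open(out_label_path, "w") as out:
        for line in f:
            fields = line.split()
            # In KITTI label files, fields 4-7 are xmin, ymin, xmax, ymax.
            xmin, ymin, xmax, ymax = map(float, fields[4:8])
            fields[4:8] = [f"{xmin * sx:.2f}", f"{ymin * sy:.2f}",
                           f"{xmax * sx:.2f}", f"{ymax * sy:.2f}"]
            out.write(" ".join(fields) + "\n")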

Q1. Is there a specific resolution for the images? I mean, do I need to resize the images to a specific size before annotating (labeling) them?

  2. Label Files:
    For detection the Toolkit only requires the class name and bbox coordinates fields to be populated. This is because the TAO training pipe supports training only for class and bbox coordinates. The remaining fields may be set to 0. Here is a sample file for a custom annotated dataset:
    car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
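
For my own dataset I am generating such lines with a small helper like the one below (a sketch only; the values reused here are the ones from the sample line above):

def kitti_line(class_name, xmin, ymin, xmax, ymax):
    """Build a KITTI label line with only the class name and bbox populated;
    the remaining fields (truncation, occlusion, alpha, 3D info) are left at 0."""
    return (f"{class_name} 0.00 0 0.00 "
            f"{xmin:.2f} {ymin:.2f} {xmax:.2f} {ymax:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00")

print(kitti_line("car", 587.01, 173.33, 614.12, 200.12))
# -> car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00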

Q2. Is the following interpretation correct?
(X_min, Y_min) = (587.01, 173.33)
(X_max, Y_max) = (614.12, 200.12)

Q3. Does the DetectNet_v2 train tool support training with rotated rectangle annotations?
[image]
Thank you in advance for guiding me through this task!


Regarding question 1 (Q1), please refer to DetectNet_v2 - NVIDIA Docs instead.

The train tool does not support training on images of multiple resolutions. However, the dataloader does support resizing images to the input resolution defined in the specification file. This can be enabled by setting the enable_auto_resize parameter to true in the augmentation_config module of the spec file.

You no longer need to resize the images offline. You can set the target image_height and image_width in the spec file and set enable_auto_resize to true.
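
For example, the relevant part of the spec could look like the snippet below. This is only a sketch: the exact field names and nesting (for instance output_image_width / output_image_height under preprocessing) can differ between TAO versions, so please check the DetectNet_v2 documentation for your release.

augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    output_image_channel: 3
    enable_auto_resize: true
  }
}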

Regarding question 2 (Q2): yes. Refer to Data Annotation Format - NVIDIA Docs.

Regarding question 3 (Q3): no, rotated rectangle annotations are not supported.

@Morganh, thanks for the reply.
I have managed to train DetectNet_v2 on my custom data, and I have some questions regarding the best model:
Q1. How do I set up WandB so that the X-axis shows epochs instead of steps?

Q2. The figure above is the result for detectnet_v2_retrain_resnet18_kitti. The best model is not the last one. How can I be sure that I exported the best model (mAP 95%) instead of the last one (mAP 93%)? There is only one file in the $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights folder.

Thank you in advance!

Currently, no. But you can convert steps to epochs based on the formula. Refer to Epoch and checkpoint number association formula for Detectnet_V2 not lining up - #11 by Morganh.
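
As a rough sketch, assuming one epoch corresponds to ceil(num_training_images / (batch_size_per_gpu * num_gpus)) steps (please verify the exact relation for your setup in the linked post), the conversion can be done like this:

import math

# Hypothetical values for illustration; substitute your own dataset and spec numbers.
num_training_images = 6434   # images in the training split
batch_size_per_gpu = 4       # batch_size_per_gpu from training_config
num_gpus = 1                 # GPUs used for training

steps_per_epoch = math.ceil(num_training_images / (batch_size_per_gpu * num_gpus))

def step_to_epoch(step):
    """Convert a global step reported by WandB into a (fractional) epoch index."""
    return step / steps_per_epoch

print(step_to_epoch(16085))  # roughly epoch 10 with the numbers above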

You can check the training log for the mAP of each saved model. The “only one” in the weights folder should be the best one. You can also run evaluation against it with tao model evaluate xxx to confirm.

Is it possible to set up WandB to show both the training loss and the validation loss, so I can check whether the model is overfitting? I can only see the training loss:

I would like something like this:

To log validation loss, you might need to implement custom logging within the model’s evaluation loop. TAO’s built-in logging primarily focuses on training metrics, so additional code may be required to capture and log validation metrics to WandB.

import wandb

# Assuming 'val_loss' is your validation loss metric
wandb.log({"Validation Loss": val_loss})

Is it possible for me to implement this additional code myself, to capture and log validation metrics to WandB inside TAO’s built-in logging (inside the container)?

Yes, it is possible. End users can modify any code inside the TAO container. You can log in to the container and modify the code, which is available under /usr/xxx.
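
As an illustration only, the kind of helper you could add near the evaluation code might look like the sketch below. The names log_validation_metrics, epoch, val_loss and val_map are placeholders, not actual TAO symbols; you will need to wire them to the variables used in the container’s evaluation loop.

import wandb

def log_validation_metrics(epoch, val_loss, val_map):
    """Send validation metrics to the same WandB run that logs the training loss."""
    if wandb.run is not None:  # only log if a WandB run has been initialised
        wandb.log({
            "epoch": epoch,
            "Validation Loss": val_loss,
            "Validation mAP": val_map,
        })

# Call log_validation_metrics(...) right after the periodic evaluation step so
# that the validation curve appears alongside the training loss in WandB.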