Retraining Detectnet_v2, Dashcamnet, with Custom Dataset, Inference Quality Issue

• Hardware:
Razer Blade 15 RTX 3080 Ti

• Network Type:
Detectnet_v2 (DashcamNet)

• TLT Version:
nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3

Hi, I am trying to retrain DashCamNet on my custom dataset because the pretrained model gives detection confidences of about 0.5, and I want to see whether I can raise them.

However, after retraining I am seeing very low inference confidence, around 0.02.

First of all, my source images have a resolution of 1920x1080.
I am using ros_deep_learning to run DashCamNet in a ROS environment.

I have a relatively small dataset of about 500 images, each with a single car in the FOV. (I am not interested in other classes.)
The training images are 960x544 .jpg files.
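If the 960x544 training images were produced by resizing the 1920x1080 originals, the labels have to be scaled by the same factors, and the vertical factor (544/1080) is not exactly 0.5. The following is a minimal sketch of that rescaling, assuming KITTI-format labels with the box in columns 4 to 7; the function name and the example label line are hypothetical and are not taken from the attached files.

```python
def rescale_kitti_line(line, src_size=(1920, 1080), dst_size=(960, 544)):
    """Scale the bbox of one KITTI label line from src_size to dst_size (width, height)."""
    fields = line.split()
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor (not exactly 0.5 here)
    # KITTI columns 4..7 hold left, top, right, bottom in pixels.
    left, top, right, bottom = (float(v) for v in fields[4:8])
    fields[4:8] = [
        f"{left * sx:.2f}",
        f"{top * sy:.2f}",
        f"{right * sx:.2f}",
        f"{bottom * sy:.2f}",
    ]
    return " ".join(fields)


# Hypothetical label line, for illustration only.
example = "car 0.00 0 0.00 800.00 400.00 1200.00 700.00 0 0 0 0 0 0 0"
print(rescale_kitti_line(example))
```

If the labels were drawn directly on the 960x544 images, no rescaling is needed.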


image0000.txt (79 Bytes)

I want to confirm that this pipeline works as intended before I create more data.

Let me tell you what I have done.

  1. I do dataset conversion
    detectnet_v2 dataset_convert -d /tlt_ws/src/model/configs/kitti_config_960x544.txt -o /tlt_ws/src/model/converted_960x544/tf
    kitti_config_960x544.txt (238 Bytes)

  2. I train the pretrained DashCamNet
    detectnet_v2 train -e /tlt_ws/src/model/configs/training_config_960x544.txt -r ./demo -k tlt_encode --gpus 1
    training_config_960x544.txt (5.4 KB)

(The unpruned DashCamNet model was obtained from: DashCamNet | NVIDIA NGC)

  3. I evaluate the .tlt files and pick the one with the highest average precision (which is model.step-7700.tlt in my case)
    detectnet_v2 evaluate -e demo/experiment_spec.txt -m ./demo/model.step-7700.tlt -k tlt_encode
    experiment_spec.txt (6.2 KB)

Validation cost: 0.000251
Mean average_precision (in %): 22.7273

class name    average precision (in %)
bicycle       0
car           90.9091
person        0
road_sign     0

Median Inference Time: 0.011957
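The mean average precision reported above is consistent with an unweighted average over all four classes in the spec, so the three classes with no training data pull it down to about a quarter of the car AP. A short check of that arithmetic, using only the numbers printed above:

```python
# Per-class AP values copied from the evaluation output above.
per_class_ap = {"bicycle": 0.0, "car": 90.9091, "person": 0.0, "road_sign": 0.0}

# Unweighted mean over all classes listed in the spec.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"{mean_ap:.4f}")
```

The per-class car AP is therefore the number to watch here, not the mean.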

  4. I prune the selected .tlt while checking the average precision

detectnet_v2 prune -m demo/model.step-7700.tlt -o ./demo_pruned/pruned08.tlt -pth 0.08 -k tlt_encode

-pth 0.08 seems to maintain the average precision while the file size drops from 46.5 MB to 8.7 MB.

If I evaluate the pruned .tlt file again using
detectnet_v2 evaluate -e demo/experiment_spec.txt -m ./demo_pruned/pruned08.tlt -k tlt_encode

Validation cost: 0.000251
Mean average_precision (in %): 22.7273

class name    average precision (in %)
bicycle       0
car           90.9091
person        0
road_sign     0

Median Inference Time: 0.006338

  5. I retrain the pruned .tlt

detectnet_v2 train -e ./configs/training_config_demo_960x544.txt -r ./demo_pruned -k tlt_encode --gpus 1

training_config_demo_960x544.txt (5.4 KB)

  6. I evaluate the retrained .tlt files and pick the one with the highest average precision.
    detectnet_v2 evaluate -e demo/experiment_spec.txt -m ./demo_pruned/model.step-9900.tlt -k tlt_encode
    Validation cost: 0.000094
    Mean average_precision (in %): 21.3554

class name    average precision (in %)
bicycle       0
car           85.4216
person        0
road_sign     0

Median Inference Time: 0.012592

  7. I export the .tlt file to .etlt

detectnet_v2 export -m demo_pruned/model.step-9900.tlt -k tlt_encode -e configs/training_config_pruned_960x544.txt -o export_960x544/pruned_retrained_960x544.etlt

  8. I convert the .etlt to .engine

./tao-converter -d 3,544,960 -k tlt_encode -o output_cov/Sigmoid,output_bbox/BiasAdd pruned_retrained_960x544.etlt

  9. Inference result with a 0.01 threshold

This is confusing, because if I convert the vanilla DashCamNet to an .engine file and launch it with the roslaunch file, it works as intended.

However, if I switch to the custom-trained model, it does not work as intended, even though the average precision from evaluation is high.

I believe I am clearly doing something wrong.
Suspicious things:

  1. I did not specify data_type int8 anywhere?
  2. I did not use calibration.tensor or .bin anywhere?
  3. The resolution is off?
    If I want to run inference on 1920x1080 images, should I train at 1920x1088 instead of 960x544, using a dataset with a resolution of 1920x1080?
  4. The dataset is too small?
    My ambition is to keep the original inference quality of DashCamNet while adapting it to my specific environment.
    But can a dataset that is too small degrade both?
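On the resolution question: DetectNet_v2 expects input width and height that are multiples of 16, which is why 1080 becomes 1088 while 960x544 is already valid. A small sketch of that rounding (the helper name is hypothetical):

```python
def round_up_to_multiple(value, base=16):
    """Return the smallest multiple of `base` that is >= value."""
    return ((value + base - 1) // base) * base


for width, height in [(960, 544), (1920, 1080)]:
    padded = (round_up_to_multiple(width), round_up_to_multiple(height))
    print((width, height), "->", padded)
```

Either resolution can be used as the network input size, as long as the same dimensions are used consistently for training, export, and the `-d` argument of tao-converter.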

This might be too wordy, but I tried…
Please feel free to let me know if you think anything else might be one of the factors.
Thanks for the attention!

Please check if the custom trained model (.tlt) works as intended.
If yes, then please check if fp32 tensorrt engine or fp16 tensorrt engine works as intended.

Thanks for the reply!

  1. Please check if the custom trained model (.tlt) works as intended.
  • How do I check that the custom .tlt is working as intended?
    I ran detectnet_v2 evaluate to check the average precision, and it shows acceptable precision, 0.8~0.9 on car.
  • Or are you suggesting that I check the custom .tlt by some other means?
  • I am not sure how else to verify that the .tlt works as intended. Can you please let me know (or point me to a reference)?
  2. If yes, then please check if fp32 tensorrt engine or fp16 tensorrt engine works as intended.
  • It looks like I have been using fp32 until now, because I was converting without ‘-t int8’ and fp32 is apparently the default.

  • Now, I believe I am converting the model into int8, fp16, fp32 properly.
    ./tao-converter -k tlt_encode -d 3,544,960 -o output_cov/Sigmoid,output_bbox/BiasAdd pruned_retrained_960x544.etlt -t int8 -c dashcamnet_int8.txt
    dashcamnet_int8.txt (4.0 KB)

  • I tried the int8, fp16, and fp32 .engine files.
    All three show pretty much the same result.

Please run detectnet_v2 inference. You can refer to the notebook or user guide.

Also, from your inference result of the TensorRT engine, the car is well detected. May I know what result you expect?

  1. Please run detectnet_v2 inference. You can refer to the notebook or user guide.

Thanks for letting me know.
I ran inference on the training dataset with a confidence threshold of 0.5.
The result is still very low quality, even with the detectnet_v2 inference command, and it is similar to what I get when running the .engine with ROS.

Only 20% of the images got a bounding box, at a confidence of around 0.5.


image0000.txt (89 Bytes)

The other 80% had no bounding box. (I believe those detections are at about 0.1 confidence.)
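To put a number on how many images receive a box at a given threshold, the label files written by detectnet_v2 inference can be counted. This is a minimal sketch under a stated assumption: each output label line is taken to be KITTI-style with the class name first and the confidence as a 16th, final field. The function name is hypothetical, and the field layout should be checked against the actual output files.

```python
from pathlib import Path


def fraction_detected(label_dir, class_name="car", threshold=0.5):
    """Fraction of label files containing at least one class_name box at or above threshold."""
    files = sorted(Path(label_dir).glob("*.txt"))
    if not files:
        return 0.0
    hits = 0
    for path in files:
        for line in path.read_text().splitlines():
            fields = line.split()
            # Assumption: 16 fields per line, class name first, confidence last.
            if (
                len(fields) >= 16
                and fields[0] == class_name
                and float(fields[-1]) >= threshold
            ):
                hits += 1
                break  # one qualifying box is enough for this image
    return hits / len(files)
```

Calling it with the inference output label directory at thresholds such as 0.5 and 0.1 shows whether the missing boxes are low-confidence detections or absent entirely.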

inferencer_config.txt (1.6 KB)

  2. Also, from your inference result of the TensorRT engine, the car is well detected. May I know what result you expect?

Sorry that I did not make it clear.
As you can see in the GIFs below, the bbox flickers and has low confidence (under 0.1).
This is dramatically worse than the plain pretrained DashCamNet.

custom trained dashcamnet:
low_int8-2023-10-12_17.45.17

plain dashcamnet:
low_int8-2023-10-12_18.03.14

So did my custom training completely ruin the pretrained DashCamNet?
If so, is it because of my training config, or because of the small dataset?
I am not sure why…

Thanks for the help.

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks

It seems that the plain DashCamNet already meets your requirement. For DashCamNet retraining, yes, please add more data from the expected scenario. You can train on the car class only. After training, you can run detectnet_v2 inference to check the inference result.
