• Hardware:
Razer Blade 15 RTX 3080 Ti
• Network Type:
Detectnet_v2 (DashcamNet)
• TLT Version:
nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.4-py3
Hi, I am trying to retrain DashCamNet with my custom dataset because I am seeing confidences around 0.5 and want to see if I can boost them.
However, with the retrained model I am seeing very low inference confidence, around 0.02.
First of all, my camera images have a resolution of 1920x1080.
I am using ros_deep_learning to run DashCamNet in a ROS environment.
I have a relatively small dataset ~ 500 images with a single car in the FOV. (I am not interested in other classes.)
The training images are 960x544 .jpg files.
image0000.txt (79 Bytes)
I am trying to confirm this works as intended before creating more data.
Let me walk through what I have done.
- I convert the dataset:
detectnet_v2 dataset_convert -d /tlt_ws/src/model/configs/kitti_config_960x544.txt -o /tlt_ws/src/model/converted_960x544/tf
kitti_config_960x544.txt (238 Bytes)
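For reference, my kitti_config_960x544.txt follows the standard DetectNet_v2 dataset_convert spec layout; a sketch with placeholder paths (not my exact attached file):

```
kitti_config {
  root_directory_path: "/tlt_ws/src/model/dataset_960x544"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 10
}
image_directory_path: "/tlt_ws/src/model/dataset_960x544"
```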
- I train the pretrained DashCamNet:
detectnet_v2 train -e /tlt_ws/src/model/configs/training_config_960x544.txt -r ./demo -k tlt_encode --gpus 1
training_config_960x544.txt (5.4 KB)
(Unpruned DashCamNet Acquired: DashCamNet | NVIDIA NGC)
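The relevant parts of my training spec follow the standard DetectNet_v2 layout; a sketch with placeholder paths and values (not my exact 5.4 KB file):

```
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    output_image_channel: 3
  }
}
model_config {
  pretrained_model_file: "/tlt_ws/src/model/pretrained/resnet18_dashcamnet.tlt"
  arch: "resnet"
  num_layers: 18
}
dataset_config {
  target_class_mapping {
    key: "car"
    value: "car"
  }
}
```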
- I evaluate the .tlt files and pick the one with the highest average precision (which is model.step-7700.tlt in my case)
detectnet_v2 evaluate -e demo/experiment_spec.txt -m ./demo/model.step-7700.tlt -k tlt_encode
experiment_spec.txt (6.2 KB)
Validation cost: 0.000251
Mean average_precision (in %): 22.7273
class name average precision (in %)
bicycle 0
car 90.9091
person 0
road_sign 0
Median Inference Time: 0.011957
- I prune the acquired .tlt while checking the average precision:
detectnet_v2 prune -m demo/model.step-7700.tlt -o ./demo_pruned/pruned08.tlt -pth 0.08 -k tlt_encode
-pth 0.08 seems to maintain the average precision while the file size drops from 46.5 MB to 8.7 MB.
If I evaluate the pruned .tlt again using
detectnet_v2 evaluate -e demo/experiment_spec.txt -m ./demo_pruned/pruned08.tlt -k tlt_encode
Validation cost: 0.000251
Mean average_precision (in %): 22.7273
class name average precision (in %)
bicycle 0
car 90.9091
person 0
road_sign 0
Median Inference Time: 0.006338
- I retrain the pruned .tlt
detectnet_v2 train -e ./configs/training_config_demo_960x544.txt -r ./demo_pruned -k tlt_encode --gpus 1
training_config_demo_960x544.txt (5.4 KB)
- I evaluate the retrained .tlt files and pick the one with the highest average precision:
detectnet_v2 evaluate -e demo/experiment_spec.txt -m ./demo_pruned/model.step-9900.tlt -k tlt_encode
Validation cost: 0.000094
Mean average_precision (in %): 21.3554
class name average precision (in %)
bicycle 0
car 85.4216
person 0
road_sign 0
Median Inference Time: 0.012592
- I export the .tlt file to .etlt
detectnet_v2 export -m demo_pruned/model.step-9900.tlt -k tlt_encode -e configs/training_config_pruned_960x544.txt -o export_960x544/pruned_retrained_960x544.etlt
- I convert .etlt to .engine
./tao-converter -d 3,544,960 -k tlt_encode -o output_cov/Sigmoid,output_bbox/BiasAdd pruned_retrained_960x544.etlt
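One thing I notice here: with no -t flag, tao-converter builds an FP32 engine by default, while the stock DashCamNet deployable on NGC ships with an INT8 calibration file. A sketch of both explicit variants (calibration.bin is a placeholder name, not something I have actually run):

```shell
# FP16 engine -- no calibration file needed (sketch, not yet tried)
./tao-converter -d 3,544,960 -k tlt_encode \
  -o output_cov/Sigmoid,output_bbox/BiasAdd \
  -t fp16 pruned_retrained_960x544.etlt

# INT8 engine -- needs a calibration cache produced during export
./tao-converter -d 3,544,960 -k tlt_encode \
  -o output_cov/Sigmoid,output_bbox/BiasAdd \
  -t int8 -c calibration.bin pruned_retrained_960x544.etlt
```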
- Inference result with a 0.01 threshold
This is confusing, because if I convert the vanilla DashCamNet to an .engine file and launch it with the roslaunch file, it works as intended.
However, if I switch to the custom-trained model, it does not work as intended, even though the average precision is high enough during evaluation.
I believe I am clearly doing something wrong.
Suspicious things:
- I did not specify data_type int8 anywhere?
- I did not use a calibration .tensor or .bin file anywhere?
- The resolution is off? If I want to run inference on 1920x1080 images, should I be training at 1920x1088 instead of 960x544, with the dataset at 1920x1080 resolution?
- The dataset is too small? My goal is to keep the original inference quality of DashCamNet while adapting it to my specific environment, but could too small a dataset ruin both?
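On the resolution point, one concrete detail: DetectNet_v2 input dimensions must be multiples of 16, which is why 1920x1080 would round up to 1920x1088. A quick check of that rounding rule in plain Python (nothing TAO-specific):

```python
def valid_input_dim(x: int, multiple: int = 16) -> int:
    """Round a dimension up to the nearest multiple (DetectNet_v2 requires multiples of 16)."""
    return ((x + multiple - 1) // multiple) * multiple

# Camera resolution 1920x1080 -> nearest valid model input
print(valid_input_dim(1920), valid_input_dim(1080))  # 1920 1088

# The 960x544 training resolution is already valid
print(valid_input_dim(960), valid_input_dim(544))    # 960 544
```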
This might be too wordy, but I tried to be thorough.
Please feel free to let me know if you think anything, however random, might be a factor.
Thanks for the attention!