TLT YOLOv4 (CSPDarknet53) - TensorRT INT8 model gives wrong predictions (0 mAP)

Tao-converter is just a renaming of tlt-converter.
A 0 mAP INT8 model usually points to a problem with cal.bin. How many images did you use to generate cal.bin?

Thanks for getting back quickly.

We used 2784 images to generate cal.bin
(batches=348 * batch_size=8)

Our training set has 27858 images, so we used nearly 10% of the training data (randomly sampled) for calibration.
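For reference, cal.bin is produced during export, and the calibration set size is batches × batch_size (348 × 8 = 2784 here). The export call was roughly of the following form; the key and paths are placeholders, and the other flags follow the yolo_v4 export usage shown later in this thread:

yolo_v4 export -k <key> -m <trained_model>.tlt -e spec.txt -o <model>.etlt --data_type int8 --batches 348 --batch_size 8 --cal_cache_file export/cal.bin --cal_data_file export/cal.tensorfile --cal_image_dir <calibration_image_dir>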

Could you please try to add more images?

We tried using 30% of the training data for calibration (8352 images), but we got similar outcomes with YOLOv4 (CSPDarknet53) on both GPUs we tested:

  • Quadro RTX 4000 - INT8 precision: PASCAL10 mAP@0.5 7.86%

  • GTX 1060 - INT8 precision: PASCAL10 mAP@0.5 0%

Based on your previous suggestion, I tried using the default converter that comes with the TAO docker instead of the stand-alone tao-converter, but got the same results (0 mAP).

>> tao info:
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

However, we could get good results in INT8 precision when we performed QAT-enabled training: on both GPUs we got an mAP of around 82% on our test set in INT8 precision. Based on the documentation and our previous TensorRT implementations outside TAO, QAT is optional and we should still get reasonable performance by performing calibration. So we are wondering what the cause of this is.

As mentioned before, we got good results with the YOLOv4 (ResNet18) backbone in INT8 precision, even with only 10% of the data used for calibration. Also, YOLOv4 (CSPDarknet53) works fine in the other modes (FP16/FP32).
What do you think is the cause of this INT8 issue with the CSPDarknet53 backbone? Would it be beneficial to report this as an issue?

According to your latest comment, for YOLOv4 (CSPDarknet53):

  1. If you trained a model with QAT enabled, the mAP is around 82%. You get this value while running tlt-evaluate against the TRT INT8 engine, right?
  2. If you trained a model without QAT enabled, the mAP is 0?

Hi @Morganh, thanks for getting back to me.

For YOLOv4 (CSPDarknet53):

If you trained a model with QAT enabled, the mAP is around 82%.
If you trained a model without QAT enabled, the mAP is 0?

Yes, but note that I only have this problem in TensorRT INT8 precision. In FP32/FP16, both get around 82%-84% accuracy.

You get this value while running tlt-evaluate against the TRT INT8 engine, right?

No. I am using a Python script to load and run the model and do the pre/post-processing. I have verified that the script gives the same results as tao evaluate with the .tlt model, but I have not tested with tao evaluate + the INT8 .engine file.
I also used the following reference for pre/post-processing:
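For illustration only (this is not the reference mentioned above): a minimal sketch of the kind of script described, assuming TensorRT 7.x/8.x with pycuda, an explicit-batch engine with static shapes, and the BatchedNMS-style output head that TAO YOLOv4 exports typically produce. The engine file name, tensor layout and preprocessing details are assumptions and must match the actual training/evaluation spec.

# Minimal sketch: load a TensorRT .engine file and run one preprocessed image.
# Assumptions (see lead-in): TensorRT 7.x/8.x binding API, pycuda installed,
# explicit-batch engine, BatchedNMS outputs (num detections, boxes, scores, classes).
import numpy as np
import pycuda.autoinit  # noqa: F401 - initializes a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(engine_path):
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def allocate_buffers(engine):
    # One pinned host buffer and one device buffer per binding, in binding order.
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i))  # assumes static shapes
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host_mem = cuda.pagelocked_empty(size, dtype)
        dev_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(dev_mem))
        (inputs if engine.binding_is_input(i) else outputs).append((host_mem, dev_mem))
    return inputs, outputs, bindings, stream

def infer(context, inputs, outputs, bindings, stream, image_chw):
    # image_chw: image already preprocessed exactly as in training/evaluation.
    np.copyto(inputs[0][0], image_chw.ravel())
    cuda.memcpy_htod_async(inputs[0][1], inputs[0][0], stream)
    # For an implicit-batch engine, use context.execute_async(batch_size=...,
    # bindings=bindings, stream_handle=stream.handle) instead.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host_mem, dev_mem in outputs:
        cuda.memcpy_dtoh_async(host_mem, dev_mem, stream)
    stream.synchronize()
    # With BatchedNMS the outputs are: num detections, nmsed boxes, nmsed scores,
    # nmsed classes; post-processing scales the boxes back to the original image.
    return [host_mem for host_mem, _ in outputs]

engine = load_engine("384_1248_int8.engine")  # placeholder path
context = engine.create_execution_context()
inputs, outputs, bindings, stream = allocate_buffers(engine)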

@Morganh I ran tests with tao evaluate + .engine for each engine:
I still got 0 mAP for the model trained without QAT in INT8 precision, but the FP32 engine converted from the same model achieved 83% mAP.

The model trained with QAT achieved 81.2% mAP in INT8 precision with tao evaluate + .engine.

So, the culprit may be training without QAT.

So, the culprit may be training without QAT

  • As mentioned before, we got good results with the YOLOv4 (ResNet18) backbone in INT8 precision, even with only 10% of the data used for calibration (without QAT).
  • Based on the documentation and our previous TensorRT implementations outside TAO (without QAT), QAT is optional and we should still get reasonable performance by performing calibration.

Based on our experience, this seems specific to YOLOv4 with the CSPDarknet53 backbone in INT8. Would it be beneficial to report this as an issue?

Could you add the “--force_ptq” flag when you export the .tlt model, and then retry?

--force_ptq : Flag to force post training quantization for QAT models.

Also, I will try to reproduce your result with the public KITTI dataset.


Could you add the “--force_ptq” flag when you export the .tlt model, and then retry?
Could you please explain how this may help? And how do we know which models we need to apply it to and which not? Thanks.

It is for a DLA-specific case.
See DetectNet_v2 — TAO Toolkit 3.22.05 documentation:
However, the current version of QAT doesn't natively support DLA INT8 deployment on Jetson. To deploy this model on Jetson with DLA INT8, use the --force_ptq flag to use TensorRT post-training quantization to generate the calibration cache file.

And from https://developer.nvidia.com/blog/improving-int8-accuracy-using-quantization-aware-training-and-tao-toolkit/:
To deploy this model with the DLA, you must generate the calibration cache file using PTQ on the QAT-trained .tlt model file. You can do this by setting the force_ptq flag over the command line when running export.
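In other words, when it does apply (a QAT-trained model targeted at DLA INT8), --force_ptq is simply appended to the export call. A hedged sketch reusing the export flags already shown in this thread; the key is the one used elsewhere here, and the file names and paths are placeholders:

yolo_v4 export -k nvidia_tlt -m <qat_model>.tlt -e spec.txt -o <model>.etlt --data_type int8 --batch_size 8 --batches 10 --cal_cache_file export/cal.bin --cal_data_file export/cal.tensorfile --cal_image_dir <image_dir> --force_ptq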

We are not using DLA, and we are having the issue with the model trained without QAT (not the QAT-enabled one). So, as I understand it, we don't need to use --force_ptq. Is that correct? Please let me know. Thanks a lot for the quick response.

Yes, you can ignore “force_ptq”.


Hi,
I cannot reproduce 0 mAP against the TRT INT8 engine. You can try with my steps.
My steps:

  • Run a training with the cspdarknet19 backbone (I forgot to set it to 53, I will try that later) on the KITTI dataset.
    Only run for 10 epochs, then get the .tlt model.
  • Generate the .etlt model and the TRT INT8 engine:

yolo_v4 export -k nvidia_tlt -m epoch_010.tlt -e spec.txt --engine_file 384_1248.engine --data_type int8 --batch_size 8 --batches 10 --cal_cache_file export/cal.bin --cal_data_file export/cal.tensorfile --cal_image_dir /kitti_path/training/image_2 -o 384_1248.etlt

  • Run evaluation:

yolo_v4 evaluate -e spec.txt -m 384_1248.engine

With the cspdarknet53 backbone, there is also no issue.

Thanks a lot.

Sure, I will try it and let you know.

Can you please let me know what mAP you got on the test set?

About 60%; I only trained for 10 epochs on the public KITTI dataset.

In this setup, you are using the .engine file generated while running yolo_v4 export, which is specific to the machine that ran the training and export.

I want to take the .etlt file (384_1248.etlt in the above experiment) to another machine, convert it to a .engine file using tao-converter, and then use it for inference. That is where I am facing the issue.
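For reference, the conversion on the second machine looks roughly like the sketch below. This is a hedged example, not the exact command: the key matches the one used earlier in this thread, while the input tensor name, the optimization-profile shapes and the file names are assumptions based on typical TAO YOLOv4 usage, so check tao-converter -h and the export log for your version. The cal.bin produced during export has to be copied to the target machine together with the .etlt file for INT8 conversion.

tao-converter -k nvidia_tlt -p Input,1x3x384x1248,8x3x384x1248,16x3x384x1248 -t int8 -c cal.bin -e 384_1248_int8.engine 384_1248.etlt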