INT8 Yolo model conversion led to accuracy drop in deepstream

Hi there,
As stated here, I was able to calibrate and generate an INT8 engine in the YOLO example. However, the performance (mAP) of the INT8 model dropped by about 7-15% compared with the FP32 model. Is this normal? How can I improve it?

My setup is the following:
Jetson Xavier
DeepStream 5.0
JetPack 4.4
TensorRT 7.1.3
CUDA 10.2


Some accuracy drop is possible when inferencing with INT8.
The amount depends on the calibration and the model’s properties.

But 7-15% seems too much.
Would you mind sharing the original model file (e.g. .onnx, .pb, or .caffemodel) with us,
as well as the data and source you used for generating the calibration cache?
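For context, INT8 inference maps each tensor to 8-bit integers through a per-tensor scale chosen during calibration, and a poorly chosen scale is exactly what costs accuracy. Below is a minimal pure-Python sketch of symmetric INT8 quantization to illustrate the idea; it is not the actual TensorRT implementation, and the activation values are made up.

```python
# Minimal sketch of symmetric INT8 quantization (illustrative only,
# not TensorRT's actual code path).

def quantize(values, scale):
    """Map float values to int8 range [-127, 127] using a per-tensor scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Map int8 values back to floats."""
    return [q * scale for q in qvalues]

# Calibration's job is choosing the scale. The simplest choice is max-abs
# ("min-max"); entropy calibration instead minimizes KL divergence over an
# activation histogram, which usually preserves accuracy better.
activations = [0.01, -0.5, 0.73, 1.2, -1.19, 0.002]
scale = max(abs(v) for v in activations) / 127.0

restored = dequantize(quantize(activations, scale), scale)
max_err = max(abs(a - r) for a, r in zip(activations, restored))
print(max_err)  # rounding error, bounded by ~scale/2 for unclipped values
```

A scale distorted by outliers (or computed for the wrong tensor, as discussed later in this thread) makes this rounding error much larger, which is where the mAP loss comes from.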



Thanks for the swift response. The original model files I used were a Darknet weights file and a cfg file. As for calibration, I first selected 200 random images from the training set as the calibration dataset, and then tried using the entire training set instead. The latter option gave better performance.
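The random-subset selection step can be sketched as follows; the paths and helper name here are hypothetical, not from the actual pipeline.

```python
import random

def pick_calib_images(image_paths, n=200, seed=0):
    """Pick a reproducible random subset of training images for calibration.

    A fixed seed keeps the calibration cache reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(image_paths, min(n, len(image_paths)))

# Hypothetical usage: in practice the paths would come from the real
# training set on disk.
training_set = [f"train/img_{i:05d}.jpg" for i in range(1000)]
calib_set = pick_calib_images(training_set, n=200)
print(len(calib_set))  # 200
```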

I uploaded model files and a small subset of the training set.

Thanks in advance.


Thanks for sharing the data with us.

Have you tried the model with the TensorRT API directly?
If not, would you mind giving it a try?

This will help us determine whether the issue comes from DeepStream or from TensorRT.


I tried. The two methods give very similar results, with a mAP difference of less than 0.5%.


Thanks for sharing the subset with us.
Could you also share the source you used for generating the calibration file with us?


Here is the tensorrt API source, and the Deepstream source.


We are checking this issue internally.
We will share an update with you later.


Thanks. Looking forward to your update.


We checked the calibration cache shared in this comment.

In general, TensorRT merges several layers together for acceleration (e.g. conv + scale + activation).
However, the layers in your cache file are calibrated without merging.

We are not sure whether this causes the unexpected accuracy drop.
Would you mind trying the calibration tool shared in the GitHub repository below again?

We have verified that the cache files in that repository produce correct detections.
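For reference, a TensorRT calibration cache is a plain-text file: a version header on the first line, then one `tensor_name: hex` entry per line, where the hex field is the IEEE-754 bit pattern of the float32 scale. A small sketch to decode the entries (the header string and sample entry follow the format seen in this thread; treat them as examples):

```python
import struct

def parse_calib_cache(text):
    """Parse a TensorRT calibration cache into {tensor_name: float_scale}.

    The first line is a version header (e.g. 'TRT-7103-EntropyCalibration2');
    each following line is 'name: hex', hex being the raw float32 bits.
    """
    scales = {}
    for line in text.splitlines()[1:]:
        if ":" not in line:
            continue
        name, hexval = line.rsplit(":", 1)
        # "!f" unpacks the 4 bytes as a big-endian IEEE-754 float32.
        scales[name.strip()] = struct.unpack("!f", bytes.fromhex(hexval.strip()))[0]
    return scales

# Example using the entry quoted later in this thread.
cache = "TRT-7103-EntropyCalibration2\ndata: 3c010a14\n"
print(parse_calib_cache(cache))  # -> {'data': ~0.00788}
```

Inspecting the decoded names makes it easy to see whether a cache was produced against fused or unfused layers: the entry names must match the tensors of the network the engine builder actually sees.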


Thanks for your feedback!

I’ll give it a try.

After using the cache file generated from the recommended repo in the DS yolo app, the inference speed dropped significantly, to about 8 fps.

It seems that TensorRT did not know how to perform INT8 quantization based on the given calibration cache, so it ended up building an FP32 or FP16 engine.

I might have misunderstood some of your statements. So when you say

did you test it out in the DeepStream yolo-app?

Thanks again for your help.


Could you share your detailed procedure with us?

INT8 is a mode configured by the user.
The model will run inference in INT8 mode if the configuration and calibration cache are provided correctly.


## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
int8-calib-file=[the cache file generated above]


I followed demo #5 to create an ONNX file, and demo #6 to calibrate and generate a calibration cache. Then I used that cache as int8-calib-file in the DeepStream yolo-app.


Could you share the .cfg, .weights, .onnx, and the corresponding cache file with us?

Additionally, we tested the default YOLOv3 Tiny model cache
and got the expected output result.

Please check whether this also works on your side.

Here are the files.

When I used the caches generated by tensorrt-demo within that repo, they all worked fine. But when I moved the cache into DeepStream, I got the following:

ERROR: [TRT]: Calibration failure occurred with no scaling factors detected. This could be due to no int8 calibrator or insufficient custom scales for network layers. Please see int8 sample to setup calibration correctly.
ERROR: [TRT]: Builder failed while configuring INT8 mode.
Building engine failed!

I also tried the yolov3-tiny cache as you suggested, and the same thing happened: it only works within the given repo and cannot be transferred to DeepStream. The error is the same as the one above.
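The "no scaling factors detected" error usually means the tensor names in the cache do not match the tensors of the network that DeepStream builds. Assuming the plain `name: hexscale` cache format, one quick way to spot such a mismatch is to diff the name sets of a working and a failing cache; the cache contents below are hypothetical examples.

```python
def cache_tensor_names(text):
    """Collect the tensor names from a TensorRT calibration cache
    ('name: hexscale' per line, version header on the first line)."""
    names = set()
    for line in text.splitlines()[1:]:
        if ":" in line:
            names.add(line.rsplit(":", 1)[0].strip())
    return names

# Hypothetical caches: one as produced by the tensorrt-demo repo, one as a
# DeepStream-built engine would expect (input tensor named 'data').
repo_cache = ("TRT-7103-EntropyCalibration2\n"
              "000_net: 3c010a14\n"
              "001_convolutional: 3d8d6666\n")
ds_cache = ("TRT-7103-EntropyCalibration2\n"
            "data: 3c010a14\n"
            "001_convolutional: 3d8d6666\n")

only_repo = cache_tensor_names(repo_cache) - cache_tensor_names(ds_cache)
print(only_repo)  # {'000_net'}
```

Any name present in one set but not the other is a tensor for which the builder finds no scale, which matches the error message above.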


Thanks for sharing the model and cache.

We can reproduce this issue internally and are checking it.
We will get back to you later.


We changed the layer name 000_net to data in calib_yolov3-int8-608.bin:

data: 3c010a14

DeepStream can then run the model with the cache successfully.
Could you also give it a try?
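The rename above can be scripted; this is a small sketch (the file name comes from this thread, the helper name is made up), assuming the plain-text `name: hexscale` cache format.

```python
def rename_cache_entry(path, old_name, new_name):
    """Rename one tensor entry in a TensorRT calibration cache, in place.

    Only the name before the ':' is changed; the hex scale is kept as-is.
    """
    with open(path) as f:
        lines = f.read().splitlines()
    out = []
    for line in lines:
        if ":" in line and line.rsplit(":", 1)[0].strip() == old_name:
            line = new_name + ":" + line.rsplit(":", 1)[1]
        out.append(line)
    with open(path, "w") as f:
        f.write("\n".join(out) + "\n")

# Example usage:
# rename_cache_entry("calib_yolov3-int8-608.bin", "000_net", "data")
```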


Of course. I’ll work on it as soon as I can.


It did work. However, the accuracy only improved by about 1.2 percentage points, which means the INT8 quantization still causes about a 6-point accuracy drop. Is there any other way to improve this?