deepstream yolo-app fp16

Hi, I want to know how the weights are quantized to FP16 and INT8 in DeepStream’s yolo-app. Is this done only by TensorRT (as described in s7310-8-bit-inference-with-tensorrt.pdf), or by some other method?
I also have a question: does this quantization method count as post-training quantization?
Sorry, my English is poor.



The quantization follows the IEEE 754 floating point standard (2008), which defines half-precision numbers as follows:

  • Sign: 1 bit
  • Exponent width: 5 bits
  • Significand precision: 11 bits (10 explicitly stored)

You can find the float2half and half2float functions in cuDNN v7.5.
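To make the bit layout above concrete, here is a rough, runnable Python sketch of what such a conversion does. These `float2half`/`half2float` functions are illustrative reimplementations, not the cuDNN ones, and they truncate (round toward zero) instead of rounding to nearest even:

```python
import struct

def float2half(f: float) -> int:
    """Convert a float to its IEEE 754 binary16 bit pattern (sketch:
    truncates instead of rounding to nearest even)."""
    bits = struct.unpack(">I", struct.pack(">f", f))[0]  # binary32 bits
    sign = (bits >> 16) & 0x8000            # sign bit moved to half position
    exp = (bits >> 23) & 0xFF               # binary32 biased exponent
    frac = bits & 0x7FFFFF                  # 23-bit fraction

    if exp == 0xFF:                         # Inf or NaN
        return sign | 0x7C00 | (0x200 if frac else 0)
    e = exp - 127 + 15                      # rebias: bias 127 -> bias 15
    if e >= 0x1F:                           # overflow -> infinity
        return sign | 0x7C00
    if e <= 0:                              # subnormal or underflow to zero
        if e < -10:
            return sign
        frac |= 0x800000                    # restore implicit leading 1
        return sign | (frac >> (14 - e))    # shift into 10-bit subnormal frac
    return sign | (e << 10) | (frac >> 13)  # normal: keep top 10 fraction bits

def half2float(h: int) -> float:
    """Convert an IEEE 754 binary16 bit pattern back to a float."""
    sign = (h >> 15) & 1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0:                            # zero / subnormal
        val = frac / 1024.0 * 2.0 ** -14
    elif exp == 0x1F:                       # Inf / NaN
        val = float("inf") if frac == 0 else float("nan")
    else:                                   # normal: 1.frac * 2^(exp - 15)
        val = (1.0 + frac / 1024.0) * 2.0 ** (exp - 15)
    return -val if sign else val
```

For example, `float2half(1.0)` yields `0x3C00` (sign 0, exponent 15, fraction 0), and values above the binary16 maximum of 65504 saturate to infinity.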


Hi, I found that there is a sample for running YOLOv3 inference with INT8 weights. If I change the YOLOv3 model (add a max-pooling layer or delete some layers), how can I run inference with my own model (already trained with Darknet) in INT8?

Hope to get your help, thanks.


First, you will need an INT8 calibration cache for your customized model.
A calibration sample can be found in /usr/src/tensorrt/samples/sampleINT8.
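Conceptually, what entropy calibration computes is a per-tensor scale: it histograms activation magnitudes over calibration data, then searches for the saturation threshold whose clipped, requantized distribution minimizes KL divergence from the original, as described in the s7310 GTC talk. Below is a toy, runnable sketch of that idea; it is an illustration only, not TensorRT's actual implementation, and all names are made up:

```python
import math

def entropy_calibrate(activations, num_bins=2048, num_quant_levels=128):
    """Toy sketch of KL-divergence (entropy) calibration: returns an
    INT8 scale factor for the given activation samples."""
    max_val = max(abs(a) for a in activations)
    hist = [0] * num_bins
    for a in activations:
        b = min(int(abs(a) / max_val * num_bins), num_bins - 1)
        hist[b] += 1

    def kl(p, q):
        # KL divergence, skipping empty bins
        return sum(pi * math.log(pi / qi)
                   for pi, qi in zip(p, q) if pi > 0 and qi > 0)

    best_t, best_kl = num_bins, float("inf")
    for t in range(num_quant_levels, num_bins + 1):
        # reference distribution: clip outliers into the last kept bin
        p = hist[:t]
        p[-1] += sum(hist[t:])
        # candidate: requantize the first t bins into 128 levels, expand back
        q = [0.0] * t
        for level in range(num_quant_levels):
            lo = level * t // num_quant_levels
            hi = (level + 1) * t // num_quant_levels
            mass = sum(hist[lo:hi])
            nonzero = sum(1 for i in range(lo, hi) if hist[i] > 0)
            if nonzero:
                for i in range(lo, hi):
                    if hist[i] > 0:
                        q[i] = mass / nonzero
        tot_p, tot_q = sum(p), sum(q)
        if tot_q == 0:
            continue
        d = kl([x / tot_p for x in p], [x / tot_q for x in q])
        if d < best_kl:
            best_kl, best_t = d, t

    threshold = (best_t + 0.5) / num_bins * max_val
    return threshold / 127.0  # INT8 scale factor
```

The calibration cache file that TensorRT writes stores, per tensor, the scale found by a search of this kind, so calibration only has to run once per model.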

Then, update the config file like this:

## 0=FP32, 1=INT8, 2=FP16 mode
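In the standard DeepStream gst-nvinfer config format, that comment accompanies the `network-mode` key; selecting INT8 would look like the following (the `int8-calib-file` key name and table filename are assumptions based on the usual yolo-app config layout, so check them against your own config file):

```
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
int8-calib-file=yolov3-calibration.table
```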


Hi AastaLLL,

Thanks for your help.
Would you be able to guide me on how to do this? Is there a sample that creates a calibration table for YOLOv3 (possibly with added layers) on Xavier? The sample you pointed to works on MNIST, and I am not sure how to adapt it to YOLOv3 (possibly with added layers) on Xavier. A sample using the DeepStream package would also be great.
I have already run my model (trained with Darknet) on Xavier in FP16 mode.
I found the link you gave, but some of the linked pages cannot be found (404 page not found).
And I still don’t know how to get an INT8 calibration cache for my model.
Can you guide me on how to do this? Thanks for your help.


You will need to implement the IInt8EntropyCalibrator2 interface to calibrate a customized model.
Please check /usr/src/tensorrt/samples/sampleINT8 for more information.