Hi all,
How can I avoid implicit quantization in TensorRT to run a pre-quantized model? Is there an API for this in TensorRT 5.1.6?
Hi,
Do you mean quantization between different precisions?
If so, you can keep the nodes in FP32 to avoid quantization entirely.
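A minimal sketch (C++ builder API; the helper name is ours, and `builder`/`network` are assumed to already hold the parsed model) of how a layer can be pinned to FP32 so it is left unquantized:

```cpp
#include <string>
#include "NvInfer.h"

// Keep one named layer in FP32 while the rest of the engine may still be
// built in a lower precision.
void keepLayerInFP32(nvinfer1::IBuilder* builder,
                     nvinfer1::INetworkDefinition* network,
                     const char* layerName)
{
    builder->setStrictTypeConstraints(true);  // honor per-layer precision requests
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        if (std::string(layer->getName()) == layerName)
        {
            layer->setPrecision(nvinfer1::DataType::kFLOAT);      // compute in FP32
            layer->setOutputType(0, nvinfer1::DataType::kFLOAT);  // keep the output in FP32
        }
    }
}
```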
Thanks.
Hi,
My model is an INT8 model and I want to run it in INT8 mode. Is that possible?
Thanks
Hi,
You will still need a quantization function to convert the input data from FP32 into INT8.
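For illustration, a minimal sketch of the symmetric per-tensor quantization step this refers to (the calibrated absolute maximum `amax` is an assumed input):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Map one FP32 value onto the INT8 grid using a per-tensor scale.
int8_t quantizeToInt8(float x, float amax)
{
    float scale = amax / 127.0f;                 // per-tensor scale
    float q = std::round(x / scale);             // nearest integer step
    q = std::max(-127.0f, std::min(127.0f, q));  // clamp to the symmetric INT8 range
    return static_cast<int8_t>(q);
}
```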
Thanks.
Hi,
I have an already quantized (INT8) Caffe model. Can the Caffe parser read the model in 8-bit, and can I then run it in INT8 mode with the TensorRT framework?
Thanks
Hi,
It may not be supported; it’s recommended to give it a try first.
The main reason is that the quantization function may differ between TensorRT and Caffe.
Although you already have an INT8 model, you still need to convert the input data into INT8 precision.
So it’s important that the quantization of your model and of your input data is identical, or can be aligned through a calibration cache.
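If you already have a compatible calibration table, one way to reuse it is a calibrator that only serves the cache, so TensorRT applies those scales instead of recalibrating. A minimal sketch (the file name `calibration.cache` is an assumption):

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>
#include "NvInfer.h"

// Calibrator that supplies no batches and relies entirely on an existing cache.
class CacheOnlyCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    int getBatchSize() const override { return 1; }

    // No calibration data: returning false tells TensorRT to use the cache instead.
    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        return false;
    }

    const void* readCalibrationCache(std::size_t& length) override
    {
        std::ifstream file("calibration.cache", std::ios::binary);
        mCache.assign(std::istreambuf_iterator<char>(file),
                      std::istreambuf_iterator<char>());
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    void writeCalibrationCache(const void* cache, std::size_t length) override
    {
        std::ofstream file("calibration.cache", std::ios::binary);
        file.write(static_cast<const char*>(cache), length);
    }

private:
    std::vector<char> mCache;
};
```

The calibrator would then be passed to the builder with setInt8Mode(true) and setInt8Calibrator(&calibrator).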
Thanks.
Hi Aastalll,
Sorry for the late reply.
I got very bad results.
Accuracy of the floating-point model: 72%.
Our own quantization: 69%.
On Jetson AGX Xavier, running in INT8 mode with Entropy-2 calibration, not even 30%.
Thanks.
Hi,
Let me put it this way: I have the Q formats for each layer’s input, output, and weights. Is it possible to specify these Q formats ourselves so that TensorRT uses them when quantizing the model?
Thanks,
Kalyan.
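A rough sketch of this idea using TensorRT’s per-tensor dynamic-range API (the dynamic ranges mentioned below), assuming `ranges` maps each tensor name to the absolute maximum implied by its Q format:

```cpp
#include <map>
#include <string>
#include "NvInfer.h"

// Supply explicit per-tensor ranges instead of running calibration.
void setCustomRanges(nvinfer1::IBuilder* builder,
                     nvinfer1::INetworkDefinition* network,
                     const std::map<std::string, float>& ranges)
{
    builder->setInt8Mode(true);
    builder->setInt8Calibrator(nullptr);  // no calibrator: use the explicit ranges

    // Network inputs need a range as well.
    for (int i = 0; i < network->getNbInputs(); ++i)
    {
        nvinfer1::ITensor* t = network->getInput(i);
        auto it = ranges.find(t->getName());
        if (it != ranges.end())
            t->setDynamicRange(-it->second, it->second);
    }

    // Ranges for every layer output.
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
        {
            nvinfer1::ITensor* t = layer->getOutput(j);
            auto it = ranges.find(t->getName());
            if (it != ranges.end())
                t->setDynamicRange(-it->second, it->second);
        }
    }
}
```

With a range set on every tensor, the builder can generate the INT8 engine without a calibration step.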
Hello,
Please reply to the questions.
How can I read the quantized weights?
I don’t see any change in the outputs when I change the calibration set, supply dynamic ranges, or change the values in the calibration file; I get the same output values every time.
Thanks,
kalyan.
Hi,
Are you looking for an API to set up the model weights on your own?
If yes, please check this page:
https://github.com/NVIDIA/TensorRT/blob/release/6.0/samples/opensource/sampleMNISTAPI/sampleMNISTAPI.cpp
It’s recommended to first check whether both the input and the weights of your model are in INT8 format.
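Along the lines of sampleMNISTAPI, a rough sketch of adding a layer whose weights you supply yourself (the function name, output-map count, and weight buffers are placeholders; the builder normally consumes FP32 weight buffers and converts them itself when building an INT8 engine):

```cpp
#include <cstdint>
#include "NvInfer.h"

// Add a convolution whose kernel and bias values come from your own buffers,
// as sampleMNISTAPI does for every layer of its network.
nvinfer1::ILayer* addConvWithOwnWeights(nvinfer1::INetworkDefinition* network,
                                        nvinfer1::ITensor* input,
                                        const float* kernelData, int64_t kernelCount,
                                        const float* biasData, int64_t biasCount)
{
    nvinfer1::Weights kernel{nvinfer1::DataType::kFLOAT, kernelData, kernelCount};
    nvinfer1::Weights bias{nvinfer1::DataType::kFLOAT, biasData, biasCount};

    // 20 output maps and a 5x5 kernel are example values only.
    return network->addConvolution(*input, 20, nvinfer1::DimsHW{5, 5}, kernel, bias);
}
```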
Thanks.