Are all layers quantized to int8?

We have recently been studying quantization, and we've found many kinds of implementations. However, most of them do not really quantize every layer, i.e., some layers still use float operations only.
What we are curious about is: what do you mean by "use calibration to produce a quantized model"? Do you produce a model in which all weights are int8, along with some "scales"?
Thank you!!
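For reference, the usual "int8 weights plus a scale" scheme is symmetric linear quantization: each float tensor is stored as int8 values together with one float scale that maps them back. A minimal numpy sketch (function names are mine, and real toolkits add per-channel scales and activation calibration on top of this):

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Symmetric per-tensor quantization: int8 values plus one float scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.abs(w).max() / qmax          # maps the largest |w| to 127
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor: w ~ q * scale."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, scale = quantize_symmetric(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale / 2
```

So the answer to "all weights int8 plus some scales?" is essentially yes for the quantized layers: the int8 tensor and the scale together represent the original weights, and the rounding error is at most half a quantization step.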

I am also doing quantization. I set some convolution layers to fp32, but only if I keep most of them in fp32 do I get the same score as the original model, and in that case the speed is the same as the fp32 model.
I am looking for the answer!
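One way to pick which convolution layers to leave in fp32, instead of keeping most of them, is to rank layers by how much error int8 rounding introduces into their weights and only exempt the worst ones. A rough numpy sketch (the layer names and shapes are made up; in practice the weights come from your trained model, and activation sensitivity matters too):

```python
import numpy as np

def quant_error(w, num_bits=8):
    """Relative error from a symmetric int8 round-trip of one weight tensor."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    w_hat = np.round(w / scale) * scale
    return np.linalg.norm(w - w_hat) / np.linalg.norm(w)

# Hypothetical per-layer weights, just for illustration.
layers = {"conv1": np.random.randn(32, 3, 3, 3),
          "conv2": np.random.randn(64, 32, 3, 3)}

# Keep only the most error-sensitive layers in fp32, quantize the rest.
errors = {name: quant_error(w) for name, w in layers.items()}
for name, err in sorted(errors.items(), key=lambda kv: -kv[1]):
    print(name, "relative error:", err)
```

If accuracy only recovers when nearly every layer is fp32, per-tensor scaling may simply be too coarse for that model, and per-channel scales or retraining are the usual next steps.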

Hello! I am also studying quantization.
The TensorRT presentation slides are not detailed enough.
The NCNN open-source code shows that the Conv and FC layers need to be quantized.
A conv layer's weight scale can be calculated at run time, so we don't need to record it. (Does every channel have its own scale?)
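Since the weights are static, their scales can indeed be recomputed from the tensor itself at load time rather than stored, and one scale per output channel is a common refinement over a single per-tensor scale. A sketch, assuming the usual (out_channels, in_channels, kH, kW) layout (the function name is mine):

```python
import numpy as np

def per_channel_scales(w, num_bits=8):
    """One scale per output channel, computed from the weights themselves.
    Assumed layout: (out_channels, in_channels, kH, kW)."""
    qmax = 2 ** (num_bits - 1) - 1
    # max |w| over everything except the output-channel axis
    return np.abs(w).reshape(w.shape[0], -1).max(axis=1) / qmax

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
scales = per_channel_scales(w)
print(scales.shape)  # (64,) -- one scale per output channel
```

Activation scales are different: they depend on the input data, which is why calibration is needed and why they end up recorded in the calibration table.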
Looking at the calibration table TensorRT generates: the reshape layer, softmax, plugin layers, etc. each have their own scale. I'm curious about the mechanism for those layers and the meaning of the 8-digit hexadecimal values.
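On the 8-digit hex values: they are commonly understood to be the raw IEEE-754 bit pattern of the float32 scale for that tensor, so you can decode one with `struct` (the byte order here is an assumption, and the example hex string is made up, not taken from a real table):

```python
import struct

def decode_scale(hex_str):
    """Interpret an 8-hex-digit string as the big-endian bit pattern of a float32."""
    return struct.unpack(">f", bytes.fromhex(hex_str))[0]

# "3c010204" is just an illustrative bit pattern, not from a real calibration table.
print(decode_scale("3c010204"))
```

A quick sanity check is that "3f800000" decodes to exactly 1.0, which is the float32 bit pattern for one.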
I have seen another implementation as well, where the weights are also quantized (and need to be recorded).
Google's TensorFlow now has its own int8 tool in TensorFlow Lite, and it requires retraining (worth studying).
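The retraining flow (quantization-aware training) works by inserting "fake quantization" ops that round tensors to the int8 grid during the forward pass, so the network learns weights that tolerate the error. A minimal numpy sketch of just the forward part (the name is mine, and real QAT also needs a straight-through gradient estimator for the backward pass):

```python
import numpy as np

def fake_quant(x, num_bits=8):
    """Round-trip x through the int8 grid but keep the result in float,
    so downstream layers see quantization error during training."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

x = np.random.randn(4, 16).astype(np.float32)
y = fake_quant(x)
print("distinct levels:", np.unique(y).size)  # at most 2 * qmax + 2 values
```

Because the model trains against this error, QAT usually recovers more accuracy than post-training calibration alone, at the cost of needing the training pipeline.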