What's the default quantization mode for TensorRT PTQ?

According to TensorRT's documentation, TensorRT supports only symmetric, uniform quantization, which means the quantization zero-point should always be 0.
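As a rough illustration of what "symmetric, uniform" means, here is a minimal Python sketch of symmetric INT8 quantization. It assumes the common convention scale = amax / 127 with integers clamped to [-127, 127]. The function names are hypothetical, and this is not TensorRT's implementation.

```python
def symmetric_scale(amax: float) -> float:
    """Scale for symmetric INT8 quantization; the zero-point is implicitly 0."""
    return amax / 127.0


def quantize(x: float, scale: float) -> int:
    # Round to nearest (Python's round uses ties-to-even), then clamp
    # to the symmetric integer range [-127, 127].
    q = round(x / scale)
    return max(-127, min(127, q))


def dequantize(q: int, scale: float) -> float:
    # No zero-point term: integer 0 always maps back to real 0.0.
    return q * scale


scale = symmetric_scale(5.6845)
q_zero = quantize(0.0, scale)     # 0
q_max = quantize(5.6845, scale)   # 127
q_min = quantize(-5.6845, scale)  # -127
```

Because there is no zero-point, the real value 0.0 is always exactly representable and the integer grid is centered on it.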

However, when I manually set the dynamic range (e.g. (0, 5.6845)) for network layers, the verbose logs show that TensorRT computes a scale and a non-zero zero-point. Does TensorRT therefore support asymmetric uniform quantization, which would contradict the documentation?
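To show why a non-zero zero-point would point to an asymmetric scheme, the sketch below computes quantization parameters for the range (0, 5.6845) in two ways. The first is a symmetric scheme that, as assumed here, uses amax = max(|min|, |max|) and a fixed zero-point of 0. The second is a generic asymmetric (affine) scheme over a signed 8-bit range, shown only for contrast. The function names and the affine formula are illustrative assumptions, not TensorRT API.

```python
def symmetric_params(lo: float, hi: float) -> tuple[float, int]:
    # Symmetric: only the larger absolute bound matters; zero-point is fixed at 0.
    amax = max(abs(lo), abs(hi))
    return amax / 127.0, 0


def affine_params(lo: float, hi: float,
                  qmin: int = -128, qmax: int = 127) -> tuple[float, int]:
    # Asymmetric (affine): map the whole [lo, hi] interval onto [qmin, qmax].
    # Widen the range to include 0 so that 0.0 is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


sym_scale, sym_zp = symmetric_params(0.0, 5.6845)  # zero-point 0
aff_scale, aff_zp = affine_params(0.0, 5.6845)     # zero-point -128
```

With the symmetric scheme, a one-sided range such as (0, 5.6845) uses only the non-negative half of the INT8 grid; the affine scheme uses the whole grid at the cost of a non-zero zero-point.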

Also, are weights quantized per channel by default in PTQ, and can the user configure per-tensor quantization instead?
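For the per-channel question, the difference between the two granularities can be sketched as follows. A per-tensor scheme derives one symmetric scale from the absolute maximum over the whole weight tensor, while a per-channel scheme derives one scale per output channel. The nested-list layout (one inner list per output channel), the example weights, and the function names are hypothetical. The sketch does not show how TensorRT chooses weight scales internally.

```python
def per_tensor_scale(weights: list[list[float]]) -> float:
    # One scale for the whole tensor, from the global absolute maximum.
    amax = max(abs(w) for channel in weights for w in channel)
    return amax / 127.0


def per_channel_scales(weights: list[list[float]]) -> list[float]:
    # One scale per output channel, each from that channel's absolute maximum.
    return [max(abs(w) for w in channel) / 127.0 for channel in weights]


# Hypothetical weights: channel 0 holds small values, channel 1 large ones.
weights = [[0.01, -0.02, 0.015], [1.5, -2.54, 0.7]]

t_scale = per_tensor_scale(weights)
c_scales = per_channel_scales(weights)

# Quantizing channel 0's largest-magnitude weight, -0.02:
q_tensor = round(-0.02 / t_scale)       # -1: almost no resolution left
q_channel = round(-0.02 / c_scales[0])  # -127: full INT8 range used
```

A single per-tensor scale is dominated by the largest channel, so channels with small weights collapse onto a few integer levels; per-channel scales avoid that.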

@401616764,

We only support symmetric quantization. In PTQ, TRT sets the weight scales itself, so the user cannot control how weights are quantized.

Thank you.