How to use DLA gracefully with Int8 in TRT8

Hi

After reading the topic "Explicit vs Implicit Quantization", I think explicit quantization is better than implicit quantization. But I also found a statement in the TRT8 documentation that says "DLA does not support Explicit Quantization". Does this mean that int8 inference acceleration on DLA is only possible with implicit quantization?

Hi,
Please check the below links, as they might answer your concerns.
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_topic
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#dla_layers
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#restrictions-with-dla
Thanks!

Hi,
In implicit quantization mode, only a single dynamic range can be set on an ITensor. Does this mean that TRT can't use DLA with per-channel quantization? And can I simulate per-channel quantization with an IScaleLayer?
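For reference, the effect being asked about, one scale value per channel (what IScaleLayer applies in its channel-wise mode), can be sketched in NumPy. This is illustrative only, not TensorRT code; the shapes and scale values are made up:

```python
import numpy as np

# Illustrative sketch of a per-channel scale on an NCHW tensor:
# one scale value per channel, broadcast across H and W, which is
# the arithmetic a channel-wise scale layer performs.

x = np.ones((1, 3, 2, 2), dtype=np.float32)          # NCHW activation (made-up shape)
per_channel_scale = np.array([0.5, 1.0, 2.0], dtype=np.float32)  # one scale per channel

# Broadcast the per-channel scale over the spatial dimensions.
y = x * per_channel_scale.reshape(1, 3, 1, 1)

print(y[0, 0, 0, 0], y[0, 1, 0, 0], y[0, 2, 0, 0])   # 0.5 1.0 2.0
```

Whether building this by hand matches what TRT's calibrated int8 path does internally is a separate question, answered below.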

@spolisetty please help me

Hi,

If using calibration, TensorRT only supports PTQ (per-tensor quantization): a single scale per activation tensor and per-channel scales for weights. For operations such as conv, deconv, and fc, TRT computes per-channel kernel scales from the single input activation scale, the per-channel weight scales, and the single output activation scale.
If using QDQ ops, TRT supports both PTQ and PCQ (per-channel quantization).
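The scale computation described above for conv/deconv/fc can be sketched in NumPy. This is an illustrative model of the arithmetic, not TensorRT API; the scale values are made up:

```python
import numpy as np

# Sketch of how per-channel kernel scales fall out of a single input
# activation scale, per-channel weight scales, and a single output
# activation scale (the calibrated-int8 case described above).

s_in = 0.05                          # single (per-tensor) input activation scale
s_w = np.array([0.01, 0.02, 0.04])   # per-channel weight scales (3 output channels)
s_out = 0.1                          # single (per-tensor) output activation scale

# Requantization multiplier for each output channel's int32 accumulator:
# real_value ≈ multiplier * int32_accumulator.
multipliers = s_in * s_w / s_out

print(multipliers)                   # one multiplier per output channel
```

Running this prints `[0.005 0.01  0.02 ]`: even though the activations carry only a single scale each, the kernel still gets one effective scale per output channel via the weights.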

Also, DLA does not support QDQ ops yet.
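Putting the two points together: int8 on DLA currently goes through the calibrated (implicitly quantized) path. A minimal trtexec invocation for that, assuming a hypothetical model file and an existing calibration cache, would look like:

```shell
# Sketch: build an int8 engine targeting DLA from a calibrated ONNX model.
# model.onnx and calibration.cache are placeholder file names.
trtexec --onnx=model.onnx \
        --int8 \
        --calib=calibration.cache \
        --useDLACore=0 \
        --allowGPUFallback \
        --saveEngine=model_dla_int8.engine
```

`--allowGPUFallback` lets layers that DLA cannot run fall back to the GPU instead of failing the build.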

Thank you.
