After reading the topic "Explicit vs Implicit Quantization", I think explicit quantization is better than implicit quantization. But I also found a statement in the TRT 8 docs that says "DLA does not support Explicit Quantization". Does that mean int8 inference acceleration with DLA is only possible with implicit quantization?
Hi,
In implicit precision mode, only a single dynamic range can be set on an ITensor. Does that mean TRT can't use DLA with per-channel quantization? And can I simulate per-channel quantization with an IScaleLayer?
If using calibration, TensorRT only supports per-tensor quantization (PTQ): a single scale for each activation tensor, with per-channel scales for weights. For operations such as conv, deconv, and fc, TRT computes per-channel kernel scales from the single input-activation scale, the per-channel weight scales, and the single output-activation scale.
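To make that concrete, here is a minimal sketch (plain Python, not the TensorRT API; the function name and values are illustrative) of how a single input scale, per-channel weight scales, and a single output scale combine into one requantization scale per output channel:

```python
def per_channel_kernel_scales(input_scale, weight_scales, output_scale):
    # One combined (requantization) scale per output channel:
    # int32 accumulator * (s_in * s_w[c]) must be rescaled to the
    # single output-activation scale s_out.
    return [input_scale * w / output_scale for w in weight_scales]

# Each output channel ends up with its own effective scale,
# even though activations use a single scale on each side.
print(per_channel_kernel_scales(0.5, [0.25, 0.5, 1.0], 0.125))
```

The point is that per-channel weight scales are folded into the kernel, so the activation tensors themselves still carry only one scale each, matching the single-dynamic-range restriction of implicit mode.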
If using QDQ ops, TRT does support both PTQ and PCQ (per-channel quantization).
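For intuition on what per-channel Q/DQ ops express, here is a rough sketch (plain Python, illustrative names, not the TensorRT API) of a per-channel fake-quantize round trip, where each channel (row) is quantized with its own scale, clamped to the int8 range, and dequantized:

```python
def fake_quant_per_channel(x_rows, scales, qmin=-128, qmax=127):
    # Quantize each channel (row) with its own scale, clamp to the
    # int8 range, then dequantize back to float.
    out = []
    for row, s in zip(x_rows, scales):
        q = [min(max(round(v / s), qmin), qmax) for v in row]
        out.append([qi * s for qi in q])
    return out

# Second channel has a larger scale, so 300.0 saturates at 127 * 2.0.
print(fake_quant_per_channel([[0.5, -1.0], [10.0, 300.0]], [0.5, 2.0]))
```

With QDQ ops these per-channel scales are carried explicitly in the graph, which is why explicit quantization can express PCQ while calibration-based implicit mode cannot.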