Hi, I’ve been looking for a way to quantize my DNN model with my own int8 weight and activation quantization method and then run inference with TensorRT.
I can train the model with quantization-aware training using some quantization method, and I can also save the trained low-precision weights. At inference time, however, I have no idea how to apply the same activation quantization method that was used during training, instead of TensorRT’s built-in method…
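For concreteness, by “int8 activation quantization” I mean something along the lines of the usual symmetric per-tensor scheme sketched below (a generic illustration with a max-abs scale, not my exact method):

```python
import numpy as np

def int8_quantize(x, scale):
    """Symmetric per-tensor int8 quantization: q = clip(round(x / scale), -128, 127)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def int8_dequantize(q, scale):
    """Map int8 codes back to float: x_hat = q * scale."""
    return q.astype(np.float32) * scale

# Toy activation tensor; scale chosen from the max absolute value.
x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
scale = np.abs(x).max() / 127.0
q = int8_quantize(x, scale)
x_hat = int8_dequantize(q, scale)
```

During training the fake-quantized activations follow this scheme, so at inference I would need TensorRT to use the same scales rather than ones it derives itself.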
So my question is: how can I implement a custom activation quantization method for TensorRT inference?
Thank you.