Description
I want to use the setDynamicRange() interface to achieve QAT quantization in TensorRT. I set the default QAT config in PyTorch and obtained a QAT-trained model. I multiply the scale parameter of each tensor by 127 and pass the result to tensor->setDynamicRange(), and then call builder->setInt8Mode(true) and builder->setStrictTypeConstraints(true) to ensure that TensorRT uses int8 mode. However, when I tested with a model containing only one conv layer, I found that this layer seemed to fall back to FP32.
Environment
TensorRT Version: 5.1.3
GPU Type: RTX 2060
Nvidia Driver Version:
CUDA Version: 10.0
CUDNN Version: 7.5
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4.0
Baremetal or Container (if container which image + tag):
Relevant Files
Steps To Reproduce
You can see from the first picture that I set the scale parameter correctly, and from the second picture that this layer is running at higher precision. I think that if this layer were running in int8 mode, it should output this line of information:
The output of this model differs from the PyTorch QAT result but matches the FP32 result, which also supports my conclusion.
Is there any way I can avoid this fallback?