My PyTorch/ONNX model has a uint8-to-fp32 cast layer that divides by 255, applied directly to the input tensor. When I convert the ONNX model to TensorRT INT8, I get the following warning:
“Missing scale and zero-point for tensor input, expect fall back to non-int8 implementation for any layer consuming or producing given tensor”
For INT8, should I remove the cast layer before exporting the ONNX model, or does TensorRT handle it itself? What is the recommended approach for best INT8 performance?