In TRT inference, the input is FP32 and the weights are in FP16 mode. So in the matrix calculation, is the input converted to FP16? If so, the input is modified and the accuracy will definitely be different.
If FP16 is not enabled, all calculations will be done in FP32. But if you enable FP16, TRT will choose the data type based on which path is faster.
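To give a rough feel for the accuracy concern raised in the question, here is a minimal NumPy sketch (not TensorRT code) that casts an FP32 input and FP32 weights to FP16 and compares the matrix product against the all-FP32 result. The shapes, the random data, and the variable names are assumptions made for illustration, and NumPy's FP16 arithmetic is not necessarily what a TensorRT FP16 kernel does internally, so treat the numbers as indicative only.

```python
import numpy as np

# Assumed toy data: a small FP32 input and FP32 "reference" weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256)).astype(np.float32)
w = rng.standard_normal((256, 8)).astype(np.float32)

# Reference result with everything kept in FP32.
ref = x @ w

# Cast both operands to FP16, multiply, then cast back to FP32 to compare.
x16 = x.astype(np.float16)
w16 = w.astype(np.float16)
out16 = (x16 @ w16).astype(np.float32)

# Rounding error introduced by the cast alone, and error in the final output.
cast_err = float(np.max(np.abs(x - x16.astype(np.float32))))
out_err = float(np.max(np.abs(ref - out16)))

print(f"max input rounding error: {cast_err:.6f}")
print(f"max output difference:    {out_err:.6f}")
```

Both errors are nonzero, so a result computed in FP16 will generally not match the FP32 result bit for bit. Whether the difference matters depends on the network and has to be checked against your own accuracy metric.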
Hi, AakankshaS, thanks for the reply. What does the “path” contain?
We choose a combination of datatype/format for each tensor and a kernel for each layer, picking the combination that has the best performance for this network.
So the “path” basically refers to that combination.
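The idea of picking the fastest combination can be sketched with a toy model. This is not TensorRT's actual selection logic, which is internal to the builder. The layer names, kernel names, timings, and the `reformat_cost` parameter below are all hypothetical, and the search is a plain brute force over every per-layer choice.

```python
from itertools import product

# Hypothetical measured timings in ms for each (dtype, kernel) choice per layer.
TIMINGS = {
    "conv1":   {("fp32", "k_a"): 1.2, ("fp16", "k_b"): 0.7},
    "softmax": {("fp32", "k_c"): 0.2, ("fp16", "k_d"): 0.4},
    "fc1":     {("fp32", "k_e"): 0.9, ("fp16", "k_f"): 0.5},
}
LAYER_ORDER = ["conv1", "softmax", "fc1"]


def best_path(timings, layer_order, reformat_cost):
    """Return (total_ms, choices) for the fastest per-layer combination.

    reformat_cost is a hypothetical penalty (ms) added whenever two
    consecutive layers use different dtypes and a conversion is needed.
    """
    best_total, best_choices = None, None
    options = [list(timings[name].items()) for name in layer_order]
    for combo in product(*options):
        choices = [choice for choice, _ in combo]
        total = sum(ms for _, ms in combo)
        # Charge a conversion wherever the dtype changes between layers.
        for prev, cur in zip(choices, choices[1:]):
            if prev[0] != cur[0]:
                total += reformat_cost
        if best_total is None or total < best_total:
            best_total, best_choices = total, choices
    return best_total, best_choices


total, choices = best_path(TIMINGS, LAYER_ORDER, reformat_cost=0.3)
print(total, choices)
```

With these made-up numbers the all-FP16 combination wins when a conversion costs 0.3 ms, while with `reformat_cost=0.0` the middle layer stays in FP32 because its FP32 kernel is faster. That is the sense in which enabling FP16 allows FP16 but does not force every layer to use it.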