How does TRT multiply FP16 weights with an FP32 input?

zanbf1123 · December 16, 2020, 2:50am

During TRT inference, the input is FP32 and the weights are in FP16 mode. In the matrix calculation, is the input converted to FP16? If so, the input is modified, so the accuracy will definitely be different.
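
To make the accuracy concern concrete, here is a minimal sketch in plain Python (not TensorRT code) of what rounding an FP32 value to FP16 does to it. The helper names `to_fp16` and `to_fp32` and the sample values are only for illustration; the rounding uses the standard library's half-precision `struct` format.

```python
import struct

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Illustrative input values (assumed, not taken from any real network).
samples = [0.1, 1.0001, 3.14159, 1000.5]

rel_errors = []
for x in samples:
    x32 = to_fp32(x)    # what an FP32 input tensor would hold
    x16 = to_fp16(x32)  # what is left after a cast to FP16
    rel_err = abs(x16 - x32) / abs(x32)
    rel_errors.append(rel_err)
    print(f"fp32={x32!r}  fp16={x16!r}  rel_err={rel_err:.2e}")
```

FP16 keeps an 11-bit significand, so for values in its normal range the relative rounding error of a single cast is at most 2^-11 (about 4.9e-4). Whether that matters depends on the network.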

AakankshaS · December 16, 2020, 7:15am

Hi @zanbf1123,
If FP16 is not enabled, all calculations are done in FP32. If you enable FP16, TRT will choose the precision based on which path is faster.

Thanks!

zanbf1123 · December 16, 2020, 8:09am

Hi AakankshaS, thanks for the reply. What does a “path” consist of?

AakankshaS · December 16, 2020, 8:25am

Hi @zanbf1123,
We choose a combination of data type/format for each tensor and a kernel for each layer, picking the combination that gives the best performance for the network.
So “path” basically refers to that combination.

Thanks!
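
For readers who want a picture of what "choosing the fastest path" could mean, here is a toy sketch in plain Python. It is not how TensorRT is implemented. The layer names, kernel names, timings and the `REFORMAT_COST_MS` conversion cost are all made-up assumptions. It only illustrates the idea of picking one (precision, kernel) option per layer so that the total time is lowest.

```python
from itertools import product

# Hypothetical timings in ms for each (precision, kernel) option per layer.
# Every name and number here is invented for illustration.
layer_timings = {
    "conv1": {("fp32", "kernel_a"): 0.40, ("fp16", "kernel_b"): 0.22},
    "fc1": {("fp32", "kernel_c"): 0.10, ("fp16", "kernel_d"): 0.15},
}

# Assumed cost of converting a tensor between precisions.
REFORMAT_COST_MS = 0.03

def best_path(layers, timings):
    """Try every combination of per-layer options and return the cheapest."""
    best = None
    for combo in product(*(timings[name].items() for name in layers)):
        total = sum(time_ms for _, time_ms in combo)
        # Charge a conversion whenever adjacent layers differ in precision.
        for (cfg_a, _), (cfg_b, _) in zip(combo, combo[1:]):
            if cfg_a[0] != cfg_b[0]:
                total += REFORMAT_COST_MS
        if best is None or total < best[0]:
            best = (total, [cfg for cfg, _ in combo])
    return best

total_ms, path = best_path(["conv1", "fc1"], layer_timings)
print(f"{total_ms:.2f} ms", path)
```

With these invented numbers the cheapest combination is mixed: FP16 for `conv1` and FP32 for `fc1`, even after paying the conversion cost. That matches the point above: enabling FP16 allows FP16 kernels but does not force every layer to use them.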