Hi, during convolution inference both my input data and my weights are in fp16. Does fp32 appear anywhere inside the convolution computation? If so, how is the result converted (truncated) from fp32 back to fp16?
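To make the question concrete, here is a minimal sketch of the two behaviours I am trying to tell apart. It is plain Python that emulates the rounding with the standard `struct` module, not the actual kernel code, and the helper names (`to_fp16`, `to_fp32`, `dot_fp16_accumulate`, `dot_fp32_accumulate`) are my own. I do not know which of the two, if either, the real conv kernels follow.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_fp16_accumulate(a, b):
    """Hypothetical case A: every product and every partial sum is rounded to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc

def dot_fp32_accumulate(a, b):
    """Hypothetical case B: partial sums are kept in fp32, one final rounding to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + x * y)
    return to_fp16(acc)

ones = [1.0] * 4096  # 1.0 is exactly representable in fp16
print(dot_fp16_accumulate(ones, ones))  # 2048.0 (the fp16 sum stops growing at 2048)
print(dot_fp32_accumulate(ones, ones))  # 4096.0
```

With fp16 partial sums the result stalls at 2048, because the spacing between fp16 values there is 2 and adding 1 rounds back down. With fp32 partial sums and a single final conversion the result is exact. This difference is why I want to know where the conversion to fp16 happens.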
As shown in the attached figure, I used a profiling tool and saw that the conv operation calls these kernel functions. Does this mean that the fp16 convolution involves a truncation from fp32 to fp16? If so, how is the truncation done, and in which compute node does it occur? Are the products accumulated in fp32 and only truncated to fp16 at the end? Looking forward to your answer, thank you!