I am using nvprof to profile inference of my TensorRT FP16 model.

With metrics such as single_precision_fu_utilization (passed via the --metrics flag), we can see which function units each kernel uses.
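A typical invocation looks like the following. This is only a sketch: trtexec and the engine file name are placeholders for whichever binary and model you actually profile, and the metric names are the per-kernel utilization metrics nvprof reports on Volta/Turing GPUs.

```
# Profile a TensorRT engine and report per-kernel function-unit utilization.
# trtexec / model.engine are example placeholders.
nvprof --metrics tensor_precision_fu_utilization,tensor_int_fu_utilization,single_precision_fu_utilization,half_precision_fu_utilization \
    trtexec --loadEngine=model.engine --fp16
```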

For example:

The following output shows that the kernel function cuInt8::nhwc8Tonchhw2 uses FP32 cores but no FP16 cores:

Kernel: void cuInt8::nhwc8Tonchhw2<int=32, int=16, int=2>(__half const *, cuInt8::nhwc8Tonchhw2<int=32, int=16, int=2>*, int, int, int, int, int, int)

```
Invocations  Metric Name                        Metric Description                           Min       Max       Avg
6            tensor_precision_fu_utilization    Tensor-Precision Function Unit Utilization   Idle (0)  Idle (0)  Idle (0)
6            tensor_int_fu_utilization          Tensor-Int Function Unit Utilization         Idle (0)  Idle (0)  Idle (0)
6            single_precision_fu_utilization    Single-Precision Function Unit Utilization   Low (2)   Low (3)   Low (2)
6            half_precision_fu_utilization      Half-Precision Function Unit Utilization     Idle (0)  Idle (0)  Idle (0)
```

My questions are:

1. I'm doing FP16 inference, so there should not be any INT8 operations. However, cuInt8::nhwc8Tonchhw2 appears, and it looks like an INT8 function. Why is it being launched?

2. Is there any documentation where I can look up what kernel functions such as cuInt8::nhwc8Tonchhw2 are doing?

Thanks in advance!