After profiling my kernel in Nsight Compute, the Compute Workload Analysis section says that about 7% of Pipe Utilization comes from FP64 instructions. I made sure my code doesn’t use any doubles, all numerical constants have an ‘f’ suffix to avoid implicit conversion, etc. Then I looked carefully through the SASS code and I don’t see any DADD, DMUL, or DFMA instructions, or anything else that looks like an FP64 instruction.
Is there a way to find out which instructions use the FP64 pipeline? Is the FP64 utilization something I can reduce further, and if so, how?
The code is compiled with CUDA Toolkit 12.6 and runs on an RTX 2000 Ada Generation GPU.
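For context, this is roughly the pattern I follow to keep everything in single precision (a minimal sketch, not my actual kernel; the names here are illustrative):

```cuda
// Illustrative only: an unsuffixed literal like 0.5 is a double, so
// "x * 0.5" would promote the expression to FP64 and emit DMUL/DFMA in SASS.
// With the 'f' suffix, the arithmetic stays entirely in FP32.
__global__ void scale_fp32(const float* __restrict__ in,
                           float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * 0.5f + 1.0f;   // all constants suffixed with 'f'
    }
}
```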
Compute Workload Analysis says that about 7% of Pipe Utilization are FP64 instructions
To clarify, it’s not saying that 7% of the overall GPU’s pipe utilization comes from FP64; rather, the FP64 pipeline itself is 7% utilized. The utilizations in this chart don’t need to sum to 100%.
Another instruction that can be executed by this pipeline is, for example, DMMA.
Can you share screenshots of both charts in the Compute Workload Analysis section, as well as the Instruction Statistics chart, here?
OK, I see. It sounds like there is nothing I can do to reduce it further. I transfer data to the GPU as uint8 and get results back as int16, which saves a lot of data transfer. But this means I need to cast to float32 and back to int16 on the GPU. I didn’t expect this conversion to utilize the FP64 pipe, but it’s good to know.
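For anyone finding this later, the conversion path in question looks roughly like this (a minimal sketch under my setup; the kernel body and names are illustrative, not my production code):

```cuda
#include <cstdint>

// Sketch of the conversion path: uint8 input is widened to FP32 for the
// arithmetic, and the result is narrowed to int16 for the copy back.
// The int<->float conversions (I2F/F2I in SASS) seem to be what shows up
// on the FP64 pipe here, not any double-precision arithmetic.
__global__ void process(const uint8_t* __restrict__ in,
                        int16_t* __restrict__ out,
                        float gain, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = static_cast<float>(in[i]);                // uint8 -> float32 (I2F)
        float y = x * gain - 128.0f;                        // FP32 math only
        out[i] = static_cast<int16_t>(__float2int_rn(y));   // float32 -> int (F2I)
    }
}
```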
Thanks for the support.