Background: I am counting the FLOPs of my application on the GPU. I assume the CUDA cores also perform integer operations. I know the metric for counting single-precision floating-point operations is flop_count_sp. What is the metric name for measuring the total number of integer arithmetic instructions?
Also, does an integer add have the same latency as a single-precision floating-point (SPFP) add? Where can I find this latency information?
The nvprof profiler metrics reference is here:
The metric you are looking for is probably inst_integer, which counts the integer instructions executed by non-predicated threads.
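To check what these metrics actually count, it can help to profile a kernel whose floating-point work is known in advance. The program below is a minimal sketch under my own assumptions: the kernel `fp32_add_chain`, the helper `expected_flop_count_sp`, and the sizes are all hypothetical. Every thread executes exactly `kAdds` dependent FP32 adds, so flop_count_sp should report `kAdds * kThreads * kBlocks`. inst_integer will still be nonzero, because the index and address computations compile to integer instructions.

```cuda
// Hypothetical calibration program: each thread performs exactly kAdds
// dependent FP32 adds, so flop_count_sp should equal kAdds * total threads.
// Build and profile (assumes nvprof supports metric collection on your GPU):
//   nvcc -O3 -o calibrate calibrate.cu
//   nvprof --metrics flop_count_sp,inst_integer ./calibrate
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kAdds    = 256;  // FP32 adds per thread (arbitrary choice)
constexpr int kThreads = 256;  // threads per block
constexpr int kBlocks  = 64;   // blocks in the grid

// Expected flop_count_sp for one launch: each FP32 add counts as 1 flop.
constexpr long long expected_flop_count_sp(int adds, int threads, int blocks) {
    return static_cast<long long>(adds) * threads * blocks;
}

__global__ void fp32_add_chain(const float *a, const float *b, float *out) {
    // Index arithmetic compiles to integer instructions (counted by inst_integer).
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float x = a[idx];
    float y = b[idx];
    // Each add depends on the previous result. FP addition is not associative,
    // so by default the compiler cannot fold this chain into a multiply.
#pragma unroll
    for (int i = 0; i < kAdds; ++i) {
        x = x + y;
    }
    out[idx] = x;
}

int main() {
    const int n = kThreads * kBlocks;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *out = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&out, bytes);
    cudaMemset(a, 0, bytes);  // input values do not affect the operation count
    cudaMemset(b, 0, bytes);

    fp32_add_chain<<<kBlocks, kThreads>>>(a, b, out);
    cudaError_t err = cudaGetLastError();        // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("expected flop_count_sp = %lld\n",
                expected_flop_count_sp(kAdds, kThreads, kBlocks));

    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    return 0;
}
```

The exact inst_integer value depends on what the compiler emits for indexing and loads/stores, so treat it as something to compare across versions of your own kernel rather than something to predict.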
As far as I am aware, NVIDIA does not publish instruction latencies anywhere.
You can get an estimate of the relative throughput (not latency) of some instructions from Table 2 in the CUDA programming guide: