Measuring integer instructions with nvprof

Background: I am counting the FLOPs of my application on a GPU. I assume CUDA cores also perform integer operations. I know the metric for counting single-precision floating-point operations is flop_count_sp. What is the metric name for the total number of integer arithmetic instructions?

Also, does an integer add have the same latency as a single-precision floating-point add? Where can I find that latency information?

Thanks,
M.

The nvprof profiler metrics reference is here:

https://docs.nvidia.com/cuda/profiler-users-guide/index.html#metrics-reference

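The metric you are looking for may be inst_integer, which the reference describes as the number of integer instructions executed by non-predicated threads.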
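To try the metric out, here is a minimal sketch of a test program, assuming a device and toolkit version that nvprof still supports. The kernel name mixed_ops and its arithmetic are made up for illustration; only the metric names inst_integer and flop_count_sp come from this thread. Expect the integer count to include index and address arithmetic generated by the compiler, not only the integer math written in the source.

```cuda
// Build:   nvcc -o mixed_ops mixed_ops.cu
// Profile: nvprof --metrics inst_integer,flop_count_sp ./mixed_ops
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that mixes integer and single-precision arithmetic.
__global__ void mixed_ops(const int *in_i, const float *in_f,
                          int *out_i, float *out_f, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // integer index math
    if (idx < n) {
        out_i[idx] = in_i[idx] * 3 + idx;        // integer multiply and add
        out_f[idx] = in_f[idx] * 2.0f + 1.0f;    // float multiply and add (may fuse into one FMA)
    }
}

int main()
{
    const int n = 1 << 20;
    int *in_i, *out_i;
    float *in_f, *out_f;

    // Managed memory keeps the host code short; explicit copies work as well.
    cudaMallocManaged(&in_i, n * sizeof(int));
    cudaMallocManaged(&out_i, n * sizeof(int));
    cudaMallocManaged(&in_f, n * sizeof(float));
    cudaMallocManaged(&out_f, n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        in_i[i] = i;
        in_f[i] = 1.0f;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    mixed_ops<<<blocks, threads>>>(in_i, in_f, out_i, out_f, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("out_i[10] = %d, out_f[10] = %.1f\n", out_i[10], out_f[10]);

    cudaFree(in_i);
    cudaFree(out_i);
    cudaFree(in_f);
    cudaFree(out_f);
    return 0;
}
```

Comparing the two reported counts for this kernel gives a feel for how much integer overhead surrounds each floating-point operation.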

As far as I am aware, NVIDIA does not publish instruction latencies anywhere.
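Since no official numbers are available, latency is usually estimated with a microbenchmark. The following is a rough sketch of one common approach, not an NVIDIA-documented method: a single thread runs a chain of adds in which every add depends on the previous result, timed with clock64(). The names dependent_adds, cycles_per_add, and ITERS are invented for this example. The result includes loop overhead, and the compiler may reorder or simplify the code, so it is worth checking the generated SASS (for example with cuobjdump -sass) before trusting the numbers.

```cuda
// Build: nvcc -o add_latency add_latency.cu
#include <cstdio>
#include <cuda_runtime.h>

constexpr int ITERS = 4096;  // loop iterations; each performs two dependent adds

// Hypothetical kernel: x and y feed each other, so every add must wait
// for the previous one, and the chain has no simple closed form for the
// compiler to substitute.
template <typename T>
__global__ void dependent_adds(T x, T y, T *result, long long *cycles)
{
    long long start = clock64();
    #pragma unroll 16
    for (int i = 0; i < ITERS; ++i) {
        x += y;  // depends on the previous y
        y += x;  // depends on the x just computed
    }
    long long stop = clock64();

    *result = x + y;          // keep the chain live so it is not eliminated
    *cycles = stop - start;
}

// Runs the chain on one thread (no latency hiding) and returns the
// approximate number of cycles per dependent add.
template <typename T>
double cycles_per_add()
{
    T *d_result;
    long long *d_cycles;
    cudaMalloc(&d_result, sizeof(T));
    cudaMalloc(&d_cycles, sizeof(long long));

    // Zero inputs are passed at run time so values stay finite while the
    // compiler still cannot precompute the chain.
    dependent_adds<T><<<1, 1>>>(T(0), T(0), d_result, d_cycles);  // warm-up
    dependent_adds<T><<<1, 1>>>(T(0), T(0), d_result, d_cycles);  // measured run

    long long cycles = 0;
    cudaMemcpy(&cycles, d_cycles, sizeof(cycles), cudaMemcpyDeviceToHost);

    cudaFree(d_result);
    cudaFree(d_cycles);
    return static_cast<double>(cycles) / (2.0 * ITERS);
}

int main()
{
    printf("int   add: ~%.2f cycles per dependent add\n",
           cycles_per_add<unsigned int>());
    printf("float add: ~%.2f cycles per dependent add\n",
           cycles_per_add<float>());
    return 0;
}
```

The figures vary by architecture and compiler version, so treat them as estimates for the one device they were measured on.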

You can get an estimate of the relative throughput of various instructions by looking at table 2 (arithmetic instruction throughput) in the programming guide:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions