Measuring integer instructions with nvprof

Background: I am counting the FLOPs of my application on a GPU. I assume CUDA cores also perform integer operations. I know the metric for counting single-precision floating-point operations is flop_count_sp. What is the metric name for the total number of integer arithmetic instructions?

Also, does an integer add have the same latency as a single-precision floating-point add? Where can I find this latency information?

Thanks,
M.

The nvprof profiler metrics reference is here:

Profiler :: CUDA Toolkit Documentation

The metric you are looking for may be inst_integer.
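As a rough sketch of how the two metrics could be collected together, here is a small test program. The file name mixed_ops.cu, the kernel name and the problem size are made up for illustration, and this assumes a GPU and toolkit version on which nvprof metric collection is available.

```cuda
// mixed_ops.cu: hypothetical file name, for illustration only.
// Build:   nvcc -o mixed_ops mixed_ops.cu
// Profile: nvprof --metrics inst_integer,flop_count_sp ./mixed_ops
#include <cstdio>
#include <cuda_runtime.h>

__global__ void mixed_ops(const float *in, float *out, int n)
{
    // The index computation is done with integer instructions.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // One single-precision multiply and one add; the compiler may fuse
        // them into a single FMA instruction.
        out[i] = in[i] * 2.0f + 1.0f;
    }
}

int main()
{
    const int n = 1 << 20;
    float *in = nullptr, *out = nullptr;

    // Managed memory keeps the sketch short; explicit copies work as well.
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    mixed_ops<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("out[1] = %f\n", out[1]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Expect the integer count to be higher than the source code suggests, since the compiler also generates integer instructions for things like address calculation.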

As far as I am aware, NVIDIA does not publish instruction latencies anywhere.

You can get an estimate of the relative throughput of some instructions by looking at table 2 in the programming guide:

Programming Guide :: CUDA Toolkit Documentation