Background: I am counting the FLOPs of my application on a GPU. I assume CUDA cores also perform integer operations. I know the metric for counting single-precision floating-point operations is flop_count_sp. What is the metric name for the total number of integer arithmetic instructions?
Also, does an integer add have the same latency as a single-precision floating-point (SPFP) add? Where can I find this latency information?