I am new to using the CUDA profiler and the documentation is as minimal as it can be. I am trying to make sense of the instruction counter.
From previous posts I have learned the following, and I would like confirmation:
The profiler data covers only one multiprocessor and is not representative of the entire GPU.
The instruction count is given per block.
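If the counter really does cover only one multiprocessor, a rough whole-grid figure could be estimated by scaling it up. This is a minimal sketch that assumes every block executes roughly the same number of instructions and that you know how many blocks ran on the profiled multiprocessor; the function and parameter names are hypothetical and not part of any profiler API.

```python
def estimate_total_instructions(per_sm_count: int, total_blocks: int,
                                blocks_on_profiled_sm: int) -> float:
    """Scale a counter sampled on one multiprocessor up to the whole grid.

    Assumption: all blocks execute about the same number of instructions,
    so the per-block average on the profiled SM holds for every block.
    """
    if blocks_on_profiled_sm <= 0:
        raise ValueError("blocks_on_profiled_sm must be positive")
    # Average instructions per block on the profiled multiprocessor.
    per_block = per_sm_count / blocks_on_profiled_sm
    # Extrapolate that average to every block in the grid.
    return per_block * total_blocks


# Example: 12000 counted on one SM that ran 4 of 120 blocks.
print(estimate_total_instructions(12000, 120, 4))
```

The estimate breaks down when blocks take different code paths (for example, data-dependent branches), since the profiled multiprocessor's blocks may then not be typical.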
I would also like to know which of the following is correct:
instruction count = integer ops + floating-point ops?
instruction count = arithmetic ops?
instruction count = arithmetic ops + branches + loads + stores + …?