Hi,
I am new to the CUDA profiler, and the documentation is minimal. I am trying to make sense of the instruction counter.
From previous posts I gathered the following and would like confirmation:
- The profiler data is for one multiprocessor and is not representative of the entire processor.
- Instruction count is for every block.
- I would also like to know whether instruction count = integer ops + floating-point instructions?
- Is instruction count = arithmetic ops, or is instruction count = arithmetic ops + branches + loads + stores + …?
Thanks!!!