nvprof -- cf_executed and inst_control

I ran the profiler against my application and it came back with the following metrics:

Metric        Description                          Value
cf_issued     Issued Control-Flow Instructions     8,388,480
cf_executed   Executed Control-Flow Instructions   8,388,480
inst_control  Control-Flow Instructions            134,215,680

inst_control is exactly 16x cf_issued (and cf_executed), and exactly 2x the number of threads I launched. There is clearly some relationship between these numbers, but the documentation doesn't explain it (or at least I haven't found the explanation yet).
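A quick sanity check of those two ratios, using the three profiler values and the thread count of 67,107,840 given in the post. This only confirms the arithmetic; it says nothing yet about what the counters mean.

```python
# Values copied from the nvprof output in the post.
CF_ISSUED = 8_388_480
CF_EXECUTED = 8_388_480
INST_CONTROL = 134_215_680
THREADS = 67_107_840  # total threads launched, as stated in the post


def ratios():
    """Return (inst_control / cf_executed, inst_control / threads) as integers."""
    # Both divisions are exact for these numbers, so integer division loses nothing.
    assert INST_CONTROL % CF_EXECUTED == 0
    assert INST_CONTROL % THREADS == 0
    return INST_CONTROL // CF_EXECUTED, INST_CONTROL // THREADS


print(ratios())  # (16, 2)
```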

Can anyone tell me what, specifically, these metrics are counting and how they arrive at the values?

edit: To clarify, I know they count control-flow instructions; what I want to understand is how they arrive at the values displayed.

edit 2: I think the GPU issued 4 control-flow instructions per warp: 67,107,840 threads / 32 threads per warp = 2,097,120 warps, and 2,097,120 x 4 = 8,388,480. But then why would inst_control (which I'm assuming is the total) be only 2x the number of threads? Wouldn't it be 4x?
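The reasoning in edit 2 can be written out as a small calculation. It assumes a warp size of 32 and that every warp is full (67,107,840 is divisible by 32, but that alone does not guarantee no partial warps if the block size is not a multiple of 32). The variable `full_warp_thread_count` is a hypothetical: what inst_control would be if all 32 lanes counted for every control-flow instruction.

```python
WARP_SIZE = 32             # warp size on NVIDIA GPUs
THREADS = 67_107_840       # threads launched (from the post)
CF_EXECUTED = 8_388_480    # from nvprof
INST_CONTROL = 134_215_680 # from nvprof

warps = THREADS // WARP_SIZE          # assumes no partially filled warps
cf_per_warp = CF_EXECUTED // warps    # control-flow instructions per warp

# Hypothetical thread-level total if every lane were active for every
# control-flow instruction: this is the "4x threads" expectation.
full_warp_thread_count = cf_per_warp * THREADS

print(warps, cf_per_warp, full_warp_thread_count, INST_CONTROL // THREADS)
# 2097120 4 268431360 2
```

The gap between 268,431,360 (4x threads) and the reported 134,215,680 (2x threads) is exactly the discrepancy the question is about.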

cross post:


@Robert_Crovella, the link you posted no longer seems to work. Is there somewhere else I can find a better explanation of these metrics?
The definitions of these metrics aren't clear to me:
cf_executed: Number of executed control-flow instructions
inst_control: Number of control-flow instructions executed by non-predicated threads (jump, branch, etc.)
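One reading of these two definitions, offered as an assumption rather than a confirmed description of the hardware counters: cf_executed is counted once per warp-level execution, while inst_control is counted once per non-predicated thread that executes the instruction. Under that reading, inst_control / cf_executed is the average number of active, non-predicated threads per executed control-flow instruction. The per-warp lane mixes below are purely hypothetical; they each cover the 4 instructions per warp implied by 8,388,480 / (67,107,840 / 32), and show that different mixes give identical totals, so the counters alone cannot distinguish them.

```python
WARP_SIZE = 32
CF_EXECUTED = 8_388_480     # assumed: one count per warp-level execution
INST_CONTROL = 134_215_680  # assumed: one count per non-predicated thread

# Average active lanes per executed control-flow instruction, and the
# fraction of a full warp that represents.
avg_active = INST_CONTROL / CF_EXECUTED
utilization = avg_active / WARP_SIZE

# Hypothetical per-warp active-lane counts for the 4 control-flow
# instructions each warp executes. Both sum to 64 lanes per warp.
candidates = {
    "uniform_half_warp": [16, 16, 16, 16],
    "mixed": [32, 16, 8, 8],
}
for name, lanes in candidates.items():
    per_thread = sum(lanes) / WARP_SIZE  # inst_control contribution per thread
    print(name, sum(lanes) / len(lanes), per_thread)

print(avg_active, utilization)  # 16.0 0.5
```

Both hypothetical mixes print an average of 16.0 lanes and 2.0 per thread, matching the measured 2x; seeing which pattern actually occurs would require looking at the kernel's SASS or at finer-grained metrics.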

There is also another post that refers to this question.