Understanding difference between instructions issued 1 and instructions issued 2 in computeprof (CUD

Hi everyone,

I’m trying to understand the difference between some of the stats categories in computeprof, specifically between instructions issued 1 and instructions issued 2. Unfortunately, I can’t find anything definitive on this in the documentation (searching for them on Google isn’t turning anything good up either). What’s the difference between the 2 of them?

Any help that can point me in the right direction on this would be greatly appreciated.

Thanks,
Matt

The end of my subject was supposed to say “CUDA 5.0” … sorry about that.

Matt

I guess one is for single issue and the other is for dual issue. Here is a relevant cut-and-paste from CUPTI:

Event# 26
Id = 17
Name = inst_issued1_0
Shortdesc = instructions issued1_0
Longdesc = Number of single instruction issued per cycle in pipeline 0.
Category = CUPTI_EVENT_CATEGORY_INSTRUCTION

Event# 27
Id = 18
Name = inst_issued2_0
Shortdesc = instructions issued2_0
Longdesc = Number of dual instructions issued per cycle in pipeline 0.
Category = CUPTI_EVENT_CATEGORY_INSTRUCTION

Thanks for the response! With the dual issue, that represents issuing instructions from 2 different warps, right?

Also, is there a link to the CUPTI version you’re looking at? I tried pulling it up and searching through it for this, but never found anything like what you pasted.

Thanks,
Matt

The inst_issued1 and inst_issued2 counters are per SM warp scheduler; however, the profiler reports the value for the full SM.

inst_issued2 is the number of times a warp scheduler issued a pair of instructions for a warp.
inst_issued1 is the number of times a warp scheduler issued a single instruction for a warp.

Warp schedulers in compute capability 2.1 - 3.5 devices can dual-issue instructions for a single warp per cycle.

General Relationships
inst_executed <= inst_issued
inst_issued == 2 * inst_issued2 + inst_issued1 (each dual issue counts as two instructions)

For almost all applications
inst_issued1 > inst_issued2
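
As a rough illustration of how the two counters combine, here is a minimal C++ sketch. The struct and function names are hypothetical (they are not part of CUPTI or the profiler), and the instruction total assumes that each inst_issued2 event contributes two instructions, which follows from the definitions above.

```cpp
#include <cstdint>

// Hypothetical container for the two raw counter values read from the profiler.
struct IssueCounters {
    std::uint64_t inst_issued1;  // times a scheduler issued a single instruction
    std::uint64_t inst_issued2;  // times a scheduler issued a pair of instructions
};

// Number of issue events, regardless of whether one or two instructions went out.
std::uint64_t issue_slots(const IssueCounters& c) {
    return c.inst_issued1 + c.inst_issued2;
}

// Assumption: every dual-issue event contributes two instructions to the total.
std::uint64_t total_instructions(const IssueCounters& c) {
    return c.inst_issued1 + 2 * c.inst_issued2;
}

// Fraction of issue events that were dual issues; 0.0 if nothing was issued.
double dual_issue_fraction(const IssueCounters& c) {
    const std::uint64_t slots = issue_slots(c);
    return slots == 0 ? 0.0
                      : static_cast<double>(c.inst_issued2) / static_cast<double>(slots);
}
```

With made-up values of inst_issued1 = 750 and inst_issued2 = 250, this gives 1000 issue events, 1250 instructions, and a dual-issue fraction of 0.25.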

CUDA Profiler Documentation

Sounds great, thanks Greg! I’d found the second 2 references you linked, but both of those didn’t really explain the inst_issued1/2 differentiation. The nvprof query and your explanation helped fill the gap there.

Matt

I wonder what can limit the number of dual-issued instructions. I’m trying to optimize a (long) sequence of FFMA instructions and I can only get to inst_issued2 ~ inst_issued1 (necessary for peak F32 throughput) if two of the input registers are the same, which is not as useful as the general case.

Is Kepler register-bandwidth starved?
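
For context, the "inst_issued2 ~ inst_issued1 is necessary for peak F32 throughput" point can be checked with a bit of arithmetic. The sketch below assumes a Kepler SMX with 192 FP32 lanes, 4 warp schedulers, and a warp size of 32. These figures are not stated in the thread, and the constant and function names are purely illustrative.

```cpp
// Assumed Kepler SMX figures; not taken from the thread.
constexpr int kFp32LanesPerSm = 192;
constexpr int kWarpSize = 32;
constexpr int kSchedulersPerSm = 4;

// Warp-wide FFMA instructions the SM must issue per cycle to keep every FP32 lane busy.
constexpr double warp_insts_per_cycle() {
    return static_cast<double>(kFp32LanesPerSm) / kWarpSize;  // 192 / 32 = 6
}

// The same demand divided evenly among the warp schedulers.
constexpr double insts_per_scheduler_per_cycle() {
    return warp_insts_per_cycle() / kSchedulersPerSm;  // 6 / 4 = 1.5
}

// A scheduler that issues every cycle, with a fraction f of its issue events
// being dual issues, delivers 1 + f instructions per cycle. Solving
// 1 + f = 1.5 gives the dual-issue fraction needed for peak throughput.
constexpr double required_dual_fraction() {
    return insts_per_scheduler_per_cycle() - 1.0;  // 0.5
}
```

Under these assumptions, half of all issue events have to be dual issues, which is the same as saying inst_issued2 has to roughly equal inst_issued1.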