bug in CUPTI - occupancy on Kepler is 2x off

When reading "achieved_occupancy" using CUPTI on Kepler, the numbers are 2x off.

For example, if running "callback_metric" CUPTI example on GTX680 (8 multiprocessors) with kernel launch parameters <<<8,256>>> I get "Metric achieved_occupancy = 0.247419". However, occupancy should be 256/2048 = 0.125. Same effect on K20c with 13 blocks.

Theoretic occupancy is calculated assuming that there are enough blocks available to “fill” all the SMs. It provides an upper bound on how many blocks of a kernel can execute simultaneously on each SM. You can find more information about theoretic occupancy in the CUDA Best Practices Guide at docs.nvidia.com.

Theoretic occupany is limited by threads per block, registers per thread, and shared-memory usage per block. Assuming that register and shared memory usage are low, your kernel would have a theoretical occupancy of 100% because 8 blocks could execute on each SM ((8 * 256) / 2056) = 1.0.

Achieved occupany is an actual measure of the occupancy achieved across all SMs on the GPU. It is always bounded by theoretic occupancy but can be lower when all SMs are not equally busy over the entire duration of the kernel.

If you take the kernel from the cupti_metric example and run it through nvprof or nvvp you can see both the theoretic and achieved occupancies (you can’t run the example directly because it calls the cupti API directly and nvprof and nvvp don’t support apps that call cupti directly). You can also try using the occupancy calculator that is included in the toolkit. It is a spreadsheet that lets you experiment with different kernel parameters.

I’m sorry, I didn’t see your reply - notifications used to come by email, but something happened and I don’t get them anymore.

In my example I have 8 blocks in total and there are only 8 SMs. Therefore, theoretically I should get 1 block per SM, and this expectation is confirmed by CUPTI - the example reports a nonzero number for every SM. (If each of those 8 numbers is a number for a different SM.) The theoretical occupancy is therefore 12.5% - but CUPTI reports 25%. Bug?

I also run the code through the visual profiler as you suggested - it shows 12.4% occupancy, same as the theoretical estimate. NVVP doesn’t seem to have this bug.