Question about calculating occupancy

Hi everyone,

The NVIDIA CUDA C Programming Best Practices Guide defines occupancy as follows: occupancy is the ratio of the number of active warps per multiprocessor to the maximum number of possible active warps.
In my experiment, I launch 55 thread blocks, each of size 256 * 1. I used the CUDA Visual Profiler to analyze my program, and it reports:

grid size x: 55
block size x: 256
sm cta launched: 2
(“sm cta launched” stands for “Number of threads blocks launched on a multiprocessor”)

My device is a Tesla C1060, whose maximum number of active warps per multiprocessor is 32.
I’m really confused about the occupancy result here. Since every MP has 2 blocks launched, the number of active warps per MP should be 2 * 256 / 32 = 16.
According to the definition of occupancy, it should then be 16/32 = 0.5. So why does the profiler report 1 here? Could anyone help me explain it? Thanks!

Not sure about your occupancy number, but this might help.
Profiling is done on only one of the MPs; to be specific, MP0. So all the numbers you see at the end of profiling come from that one MP, even though the kernel may have executed on all the MPs of your GPU.
Ref: /cudaprof/doc/cudaprof.html

Have you used the Occupancy Calculator?

Otherwise, we’d need to know more about your kernel (e.g. register usage).
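
To show why register usage matters, here is a rough Python sketch of the kind of estimate the Occupancy Calculator makes. It assumes the per-multiprocessor limits of compute capability 1.3 (the Tesla C1060) and ignores register and shared-memory allocation granularity, so treat it as an approximation and not as the calculator's exact method. The function names and the register counts in the usage note (16 and 32 per thread) are hypothetical, since the kernel's real usage isn't given in this thread. As far as I know, the profiler's occupancy figure is this kind of resource-based estimate and not a count of the blocks that actually landed on the MP, which could explain a reported 1 when only 2 blocks were launched there.

```python
# Per-multiprocessor limits assumed for compute capability 1.3 (Tesla C1060).
WARP_SIZE = 32
MAX_WARPS_PER_SM = 32
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 16384
SHARED_MEM_PER_SM = 16384  # bytes


def warps_per_block(threads_per_block):
    """Number of warps needed for one block (ceiling division)."""
    return -(-threads_per_block // WARP_SIZE)


def estimate_occupancy(threads_per_block, regs_per_thread, smem_per_block=0):
    """Resource-limited occupancy estimate; ignores allocation granularity."""
    wpb = warps_per_block(threads_per_block)
    # Each resource caps how many blocks can be resident on one MP.
    limits = [MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // wpb]
    if regs_per_thread > 0:
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    resident_blocks = min(limits)
    return resident_blocks * wpb / MAX_WARPS_PER_SM


def observed_occupancy(blocks_on_sm, threads_per_block):
    """Ratio computed from the blocks actually launched on one MP."""
    return blocks_on_sm * warps_per_block(threads_per_block) / MAX_WARPS_PER_SM
```

With these assumptions, observed_occupancy(2, 256) gives the 0.5 computed in the question, estimate_occupancy(256, 16) gives 1.0 (four 256-thread blocks fit), and estimate_occupancy(256, 32) drops to 0.5 because registers then allow only two resident blocks.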