Hi everyone,
Occupancy is defined in the NVIDIA CUDA C Programming Best Practices Guide like this: "Occupancy is the ratio of the number of active warps per multiprocessor to the maximum number of possible active warps."
In my experiment, I launch 55 thread blocks, each of size 256 * 1. I used the CUDA Visual Profiler to analyze the run of my program, and it reports:
grid size x: 55
block size x: 256
sm cta launched: 2
occupancy: 1
("sm cta launched" stands for "number of thread blocks launched on a multiprocessor")
My device is a Tesla C1060, for which the maximum number of active warps per multiprocessor is 32.
I'm really confused by the occupancy result here. Since every multiprocessor has 2 blocks launched, the number of active warps per multiprocessor should be 2 * 256 / 32 = 16.
According to the definition of occupancy, it should then be 16 / 32 = 0.5. So why is it 1 here? Could anyone help me explain it? Thanks!
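To make the arithmetic in the question concrete, here is a minimal host-side sketch of the calculation. It only restates the definition quoted from the guide; it makes no CUDA runtime calls and does not reproduce how the profiler computes its figure. The function names `warps_per_block` and `occupancy` are made up for illustration, and the warp size of 32 and the limit of 32 warps per multiprocessor are the values stated in the post.

```cpp
// Host-side arithmetic only; nothing here calls the CUDA runtime.
// Function names are illustrative, not part of any NVIDIA API.

// Number of warps one block needs, rounding up for a partial warp.
constexpr int warps_per_block(int threads_per_block, int warp_size) {
    return (threads_per_block + warp_size - 1) / warp_size;
}

// Occupancy per the quoted definition:
// active warps per multiprocessor / maximum active warps per multiprocessor.
constexpr double occupancy(int blocks_per_sm, int threads_per_block,
                           int warp_size, int max_warps_per_sm) {
    return static_cast<double>(
               blocks_per_sm * warps_per_block(threads_per_block, warp_size)) /
           max_warps_per_sm;
}
```

With the numbers from the post, `warps_per_block(256, 32)` is 8 and `occupancy(2, 256, 32, 32)` is 0.5, which is the value I expected the profiler to show.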