Hi everyone,

Occupancy is defined in the “NVIDIA CUDA C Programming Best Practices Guide” like this: Occupancy is the ratio of the number of active warps per multiprocessor to the maximum number of possible active warps.

In my experiment, I launch 55 thread blocks, each of size 256 * 1. I used the CUDA Visual Profiler to analyze my program, and it reports:

grid size x:55

block size x: 256

sm sta launched: 2

occupancy:1

(“sm sta launched” stands for “Number of threads blocks launched on a multiprocessor”)

My device is a Tesla C1060, for which the maximum number of active warps per multiprocessor is 32.

I’m really confused by the occupancy value here. Since every MP has 2 blocks launched, the number of active warps per MP should be 2 * 256 / 32 = 16.

According to the definition of occupancy, it should be 16/32 = 0.5. So why is it 1 here? Could anyone help explain this? Thanks!
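
For reference, here is the calculation I am doing, written as a small host-side C++ sketch. The function names `warps_per_block` and `expected_occupancy` are made up for illustration and are not part of any CUDA API. It assumes a warp size of 32 and takes the 32-warp limit per multiprocessor quoted above as an input. It only reproduces my expected figure, not whatever the profiler computes internally.

```cpp
#include <cassert>

// Warps needed by one block: threads per block divided by the warp size,
// rounded up (a partially filled warp still occupies a warp slot).
constexpr int warps_per_block(int threads_per_block, int warp_size = 32) {
    return (threads_per_block + warp_size - 1) / warp_size;
}

// Occupancy as defined in the guide:
//   active warps per multiprocessor / maximum active warps per multiprocessor.
// blocks_per_sm is the number of blocks resident on one multiprocessor
// (here taken from the profiler's per-multiprocessor block count).
inline double expected_occupancy(int blocks_per_sm,
                                 int threads_per_block,
                                 int max_warps_per_sm,
                                 int warp_size = 32) {
    assert(max_warps_per_sm > 0 && warp_size > 0);
    const int active_warps =
        blocks_per_sm * warps_per_block(threads_per_block, warp_size);
    return static_cast<double>(active_warps) / max_warps_per_sm;
}
```

With my numbers, `warps_per_block(256)` is 8, and `expected_occupancy(2, 256, 32)` is 16 / 32 = 0.5, which is the value I expected to see instead of 1.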