CUDA Visual Profiler Vista

Tonas · June 22, 2009, 3:30pm

Hi guys!

I’m getting some strange results from the “Analyse Occupancy” feature of the CUDA Visual Profiler. I made a test where the kernel had 3 blocks of 1 thread each, and the “Analyse Occupancy” result was this:

Occupancy analysis for kernel ‘kernelTest’ for device ‘Session52 : Device0’ :
Kernel details : Grid size: 3 x 1, Block size: 1 x 1 x 1
Register Ratio = 0.25 ( 2048 / 8192 ) [2 registers per thread]
Shared Memory Ratio = 0.25 ( 4096 / 16384 ) [20 bytes per Block]
Active Blocks per SM = 8 : 8
Active threads per SM = 8 : 768
Occupancy = 0.333333 ( 8 / 24 )
Occupancy limiting factor = Block-Size

It’s a little strange, because I launched the kernel with a (3,1) configuration, as the kernel details say… so why are there 8 active blocks and 8 active threads? Well, 8 threads makes sense, since that’s 8 blocks of size 1… but why 8 blocks and not 3? Thanks in advance!

mikekis · September 10, 2009, 6:56pm

I think it is pointless to profile a program with fewer blocks than multiprocessors, because the profiler measures all parameters on a single multiprocessor and then multiplies them by the number of multiprocessors.

LSChien · September 11, 2009, 1:24am

Definition: occupancy is the ratio of the number of active warps per multiprocessor to the maximum number of possible active warps.

So to calculate occupancy, you don’t need grid information.
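Written as a formula, the definition makes this visible: the grid size never appears. This is a simplified sketch, and the symbols are introduced here for illustration: T is threads per block, warps are 32 threads, W_max = 24 and B_max = 8 are the per-SM limits reported in the profiler output, R_SM = 8192 and S_SM = 16384 come from the ratio lines, and R_block and S_block are the per-block allocations. For this kernel those work out to 2048 / 8 = 256 registers and 4096 / 8 = 512 bytes, taken by dividing the profiler's totals by its 8 active blocks.

```latex
\begin{aligned}
\text{occupancy} &= \frac{W_{\text{active}}}{W_{\max}}, \qquad
W_{\text{active}} = B_{\text{active}} \left\lceil \frac{T}{32} \right\rceil \\
B_{\text{active}} &= \min\left( B_{\max},\ \left\lfloor \frac{W_{\max}}{\lceil T/32 \rceil} \right\rfloor,\ \left\lfloor \frac{R_{\text{SM}}}{R_{\text{block}}} \right\rfloor,\ \left\lfloor \frac{S_{\text{SM}}}{S_{\text{block}}} \right\rfloor \right) \\
\text{For } T = 1:\quad B_{\text{active}} &= \min\left(8,\ \left\lfloor \tfrac{24}{1} \right\rfloor,\ \left\lfloor \tfrac{8192}{256} \right\rfloor,\ \left\lfloor \tfrac{16384}{512} \right\rfloor\right) = \min(8, 24, 32, 32) = 8 \\
W_{\text{active}} &= 8 \cdot 1 = 8, \qquad \text{occupancy} = \frac{8}{24} \approx 0.333
\end{aligned}
```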

From the CUDA_Occupancy_calculator, you only need to input:

(1) threads per block
(2) registers per thread
(3) shared memory per block (bytes)

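According to your settings, you have 8 active blocks per SM. That 8 is the per-SM block limit, shown as the second number in “Active Blocks per SM = 8 : 8”. Occupancy describes how many blocks could be resident on one SM if the grid supplied enough of them, so it does not depend on your grid of 3 blocks.

Although each block has only 1 thread, the GPU uses a warp as its hardware unit, not a single thread.

A block with 1 thread → the block occupies one warp (only thread 0 of that warp is active).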

So you have 8 (blocks/SM) x 1 (warp/block) = 8 (warps/SM).

Occupancy = 8 (warps/SM) / 24 (maximum warps/SM) = 8/24 = 0.333333