I’ve gotten some strange results from “Analyse Occupancy” in the CUDA Visual Profiler. I ran a test in which the kernel had 3 blocks of 1 thread each, and the “Analyse Occupancy” result was:
Occupancy analysis for kernel ‘kernelTest’ for device ‘Session52 : Device0’ :
Kernel details : Grid size: 3 x 1, Block size: 1 x 1 x 1
Register Ratio = 0.25 ( 2048 / 8192 ) [2 registers per thread]
Shared Memory Ratio = 0.25 ( 4096 / 16384 ) [20 bytes per Block]
Active Blocks per SM = 8 : 8
Active threads per SM = 8 : 768
Occupancy = 0.333333 ( 8 / 24 )
Occupancy limiting factor = Block-Size
It’s a little awkward, because I launched the kernel with a (3,1) configuration, as the kernel details confirm. So why are there 8 active blocks and 8 active threads? The 8 threads make sense given 8 blocks of size 1, but why 8 blocks and not 3? Thanks in advance!
I think it is pointless to profile a program that launches fewer blocks than there are multiprocessors, because the profiler measures every parameter for a single multiprocessor and then multiplies the result by the number of multiprocessors.