I’m working on the occupancy of my kernel. I got a theoretical occupancy of 12.5%, but I think I’m missing one parameter. Indeed, I don’t use shared memory, and the register count is not that high.
What kind of factor could hurt occupancy this badly?
NEW CUDA CODE

```
ptxas info : Compiling entry function '_Z17mygetRSS_ITM_SRTMPdP7s_blockS_PvmP11s_host_infoP10s_devparamP13s_device_data' for 'sm_52'
ptxas info : Function properties for _Z17mygetRSS_ITM_SRTMPdP7s_blockS_PvmP11s_host_infoP10s_devparamP13s_device_data
    8 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 384 bytes cmem, 892 bytes cmem
=> launch advice <<< 5860, 256 >>>
Number of sm = 24. Max active blocks = 1. Theoretical occupancy: 0.125000
```
With my Titan X I should get a theoretical occupancy of about 50%. With the old version of my kernel, at 125 registers per thread, I was at 25%. I rewrote all the code, and the occupancy dropped even though the register count is far lower.
The big difference is that I now compile each .cu into a .o and then link them into a static library. So I use a .hcu header with prototypes of the __device__ functions. Before the new code, the source files included each other to produce one huge .cu.
OLD CUDA CODE

```
ptxas info : Compiling entry function '_Z15getRSS_ITM_SRTMPdS_S_dd8AreaTypeddiddiiS_P9prop_typeP10propv_typeP10propa_typeS_S_ddddiiiddiiiiiiS_ddiiS_Piddd' for 'sm_52'
ptxas info : Function properties for _Z15getRSS_ITM_SRTMPdS_S_dd8AreaTypeddiddiiS_P9prop_typeP10propv_typeP10propa_typeS_S_ddddiiiddiiiiiiS_ddiiS_Piddd
    48 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 124 registers, 624 bytes cmem, 1632 bytes cmem
=> launch advice <<< 2930, 512 >>>
Number of sm = 24. Max active blocks = 1. Theoretical occupancy: 0.250000
```
The old code does not perform better than the improved code, but the new code is really different. So I really care about this occupancy: I’d like to hit around 40% to benchmark whether there is a performance gain.