Occupancy doesn't tally with calculator

Tigga · January 17, 2009, 2:16am

First off - I’m using version 2.0 on a Tesla C870 (CC 1.0).

I have a kernel using up 47 registers. It’s currently suffering from low occupancy and I’m fairly sure increasing the occupancy will make it zoom a bit. Shared memory is not a problem - only 568 bytes per block.

First job was to stick in -maxrregcount=42. This cuts the reg count without dipping into local memory… probably at the cost of some computational time. I then go to the occupancy calculator and notice that if I set my block size at 96, I should be able to get two blocks per MP - expected occupancy 25%. Excellent.

I do this, and then run the profiler. Occupancy 12.5%. Curses. Slowly lowering the maximum number of registers I’m letting it use starts dipping into lmem. When I get it down to 32 I suddenly get the higher occupancy I was expecting earlier (25%). Speed seems about comparable to 16.7% occupancy with 128 threads per block and 47 registers. Increasing occupancy has clearly given me a hefty chunk of speed to counteract that huge amount of local memory I’m now using.

I’m baffled by this outcome. At 32 registers per thread I should be only using 6144 of my 8192 registers. At 42 I should be using all of my 8192 registers (both sums from occupancy calculator… 2 * 42 * 96 = 8064 is a smidgen lower than 8192… but there seems to be a slightly more complicated way to work it out than that).

Why am I not getting 25% occupancy with 42 registers?

E.D_Riedijk · January 17, 2009, 4:01pm

When I fill in 96 threads and 42 registers in the occupancy calculator I get 13% occupancy…

This is the formula calculating the amount of registers required by 1 block:
=CEILING(MyWarpsPerBlock*2; 4)*16*MyRegCount

So an odd number of warps per block needs an extra warp's worth of registers (and thus is better to avoid), and the minimum is 2 warps' worth of registers.
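
To make the rounding concrete, here is a minimal Python sketch of that spreadsheet formula. The function name `regs_per_block` is mine, not something from the calculator, and I am reading the formula as CEILING(warps*2; 4) times 16 times the register count.

```python
import math

def regs_per_block(warps_per_block: int, regs_per_thread: int) -> int:
    """Registers reserved for one block, following the spreadsheet formula
    =CEILING(MyWarpsPerBlock*2; 4)*16*MyRegCount (hypothetical helper name)."""
    # Double the warp count, round up to a multiple of 4,
    # then scale by 16 threads and the per-thread register count.
    rounded = math.ceil(warps_per_block * 2 / 4) * 4
    return rounded * 16 * regs_per_thread

# Print the allocation for 1 to 6 warps per block at 42 registers per thread.
for warps in range(1, 7):
    print(warps, regs_per_block(warps, 42))
```

With 42 registers, 3 warps (96 threads) reserve as many registers as 4 warps would: 5376 per block, so two blocks would need 10752 and do not fit in 8192.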

If you use 64 or 192 threads per block, you will get the 25% occupancy (3 or 1 blocks per MP, respectively).
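
For anyone who wants to check other combinations, below is a rough Python sketch of the register-limited occupancy sum for a CC 1.0 part. It assumes the CC 1.0 per-MP limits (8192 registers, 24 warps, 8 blocks) and ignores shared memory, which at 568 bytes per block is not the limit here. The `occupancy` helper is only an illustration, not the calculator's own code.

```python
import math

# Per-multiprocessor limits assumed for compute capability 1.0.
REGS_PER_MP = 8192
MAX_WARPS_PER_MP = 24
MAX_BLOCKS_PER_MP = 8
WARP_SIZE = 32

def occupancy(threads_per_block: int, regs_per_thread: int):
    """Return (blocks per MP, occupancy as a fraction); shared memory is ignored."""
    warps = math.ceil(threads_per_block / WARP_SIZE)
    # Register allocation per block: CEILING(warps*2; 4) * 16 * regs_per_thread.
    regs_per_block = math.ceil(warps * 2 / 4) * 4 * 16 * regs_per_thread
    blocks = min(REGS_PER_MP // regs_per_block,
                 MAX_WARPS_PER_MP // warps,
                 MAX_BLOCKS_PER_MP)
    return blocks, blocks * warps / MAX_WARPS_PER_MP

# The configurations discussed in this thread.
for threads, regs in [(96, 42), (96, 32), (64, 42), (192, 42), (128, 47)]:
    blocks, occ = occupancy(threads, regs)
    print(f"{threads} threads, {regs} regs: {blocks} block(s), {occ:.1%}")
```

This gives 12.5% for 96 threads at 42 registers, and 25% for 96 threads at 32 registers or for 64 or 192 threads at 42 registers, which matches what the profiler reported.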

Tigga · January 17, 2009, 4:51pm

Righto - I’ll try 64 and 192 and see which is better. Ta.

What version of the calculator are you using? The latest version I can find is 1.4… but given yours seems to be working correctly I imagine there is a newer one.

E.D_Riedijk · January 17, 2009, 6:03pm

This calculation has been there since v1.0 I believe. I am using the one delivered with 2.1 beta, dated 21-06-2008
