Occupancy calculation checks out, but still getting an 'out of resource' error.

There is a small problem that has been troubling one of my code bases. The code functions fine, but my understanding is lacking, so I thought I would finally try to sort this 1.5-year-old problem out.

I have had some CUDA code for computing magnetic and electric fields for a while.

The ptxas output is as follows:

ptxas info : Compiling entry function ‘_Z17integrateParticleP6float4S0_fj’
ptxas info : Used 47 registers, 56+28 bytes lmem, 32+28 bytes smem, 24 bytes cmem[0], 160 bytes cmem[1]

So taking floor(8192/47) = 174, and rounding down to a whole number of 32-thread warps, I should be able to run this code with a block size of 160 threads; however, 128 is the maximum size I can actually launch.
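To make the numbers concrete, here is the back-of-the-envelope arithmetic I am doing, as a small C sketch. The 8192 registers per multiprocessor is the figure the occupancy calculator lists for my compute 1.x card, so treat it as an assumption:

[codebox]#include <stdio.h>

int main(void)
{
    /* Assumptions: 8192 registers per multiprocessor (compute 1.0/1.1 card),
       47 registers per thread as reported by ptxas, 32 threads per warp. */
    const int regs_per_sm     = 8192;
    const int regs_per_thread = 47;
    const int warp_size       = 32;

    /* Naive limit: how many threads' worth of registers fit on one SM. */
    int max_threads = regs_per_sm / regs_per_thread;          /* 174 */

    /* Round down to a whole number of warps. */
    int max_block = (max_threads / warp_size) * warp_size;    /* 160 */

    printf("naive max threads = %d, rounded down to warps = %d\n",
           max_threads, max_block);
    return 0;
}[/codebox]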

Could someone please explain this to me? My shared memory usage is also fine, so I'm really lost on the 'justification'.

Thanks for the help,

I happened to download the occupancy calculator again and noticed that with a block size of 160 threads, my register count is 9216. Could someone explain why it's not simply 160*47 = 7520?

Thanks,

I believe it has to do with the instruction unit. Try to keep your thread count a multiple of 2, but most of all a multiple of 64 (registers get allocated in blocks of 64 threads at a time).

For what you're asking, I think you'd do best to find 5 registers you aren't using much, move them into local memory, and run 192 threads with 42 registers: 192*42 = 8064, so this will work with 192 threads.
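To show what I mean by blocks of 64 threads, here is a small sketch of how I believe the occupancy calculator arrives at its per-block register count on compute 1.0/1.1 cards. The 64-thread warp allocation granularity and the 256-register allocation unit are values I'm reading off the calculator spreadsheet, so treat them as assumptions rather than something from the manual:

[codebox]#include <stdio.h>

/* Round x up to the nearest multiple of m. */
static int round_up(int x, int m) { return ((x + m - 1) / m) * m; }

/* Registers reserved for one block, as I believe the occupancy calculator
   computes it for compute 1.0/1.1:
   - threads are allocated two warps (64 threads) at a time,
   - the product is then rounded up to a 256-register allocation unit. */
static int regs_per_block(int threads, int regs_per_thread)
{
    return round_up(round_up(threads, 64) * regs_per_thread, 256);
}

int main(void)
{
    const int regs_per_sm = 8192;   /* assumption: compute 1.0/1.1 register file */

    printf("160 threads x 47 regs -> %d (fits: %d)\n",
           regs_per_block(160, 47), regs_per_block(160, 47) <= regs_per_sm); /* 9216, no  */
    printf("128 threads x 47 regs -> %d (fits: %d)\n",
           regs_per_block(128, 47), regs_per_block(128, 47) <= regs_per_sm); /* 6144, yes */
    printf("192 threads x 42 regs -> %d (fits: %d)\n",
           regs_per_block(192, 42), regs_per_block(192, 42) <= regs_per_sm); /* 8192, yes */
    return 0;
}[/codebox]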

Even with a wait of 400-600 cycles every time you access local memory, with 192 threads, and as long as you're not using __syncthreads() too much, it will be fine and you'll see almost no performance loss.

Your only other alternative is 128 threads.

Edit: actually, you might see a bit of speed loss with only 192 threads. Is it possible to use shared memory for those 5 registers?
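Something like this is what I have in mind for moving a value out of a register and into shared memory. It's only a sketch: the kernel body, the scratch array, and the names are made up, not your actual integrateParticle code:

[codebox]// Sketch only: the kernel and names below are invented for illustration.
// The point is that a value which would otherwise occupy a register (or
// spill to local memory) can live in shared memory instead, indexed by
// the thread's own slot.
#define BLOCK_SIZE 192

__global__ void integrateSketch(float4 *pos, float4 *vel, float dt, unsigned int n)
{
    // One shared-memory slot per thread stands in for one register per thread.
    __shared__ float scratch[BLOCK_SIZE];

    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Instead of: float tmp = vel[i].x * dt;   (one more register)
    scratch[threadIdx.x] = vel[i].x * dt;

    // Use it later exactly as the register would have been used.
    pos[i].x += scratch[threadIdx.x];
}[/codebox]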

Where does it state this in the CUDA manual? I can see the full-warp logic of 160 threads, because there are 32 threads in a warp, but I don't see anywhere that 64 would be an issue. Could you explain this more? Thanks!

PS: I could for sure use a little shared memory; thanks for the input.

NVIDIA_CUDA_ProgrammingGuide_2.3.pdf, section 5.1.2.6 Registers:

[codebox]5.1.2.6 Registers

Generally, accessing a register is zero extra clock cycles per instruction, but delays may occur due to register read-after-write dependencies and register memory bank conflicts.

The delays introduced by read-after-write dependencies can be ignored as soon as there are at least 192 active threads per multiprocessor to hide them.

The compiler and thread scheduler schedule the instructions as optimally as possible to avoid register memory bank conflicts. They achieve best results when the number of threads per block is a multiple of 64. Other than following this rule, an application has no direct control over these bank conflicts. In particular, there is no need to pack data into float4 or int4 types.[/codebox]

You're welcome :)