Kernel function size limit? How many lines or variables are allowed?

demaxism · November 15, 2007, 4:39am

Hello, I've run into a problem that may be caused by running out of registers.
I tried to write a kernel function of more than 1000 lines, with over one hundred variables in it, and then NVCC reports "Olimit was exceeded on function my_func; will not perform function-scope optimization" … "ran out of registers in predicate" …
By the way, I didn't use any type qualifiers for any variable. Does that mean all the variables are stored in registers?

Is there any specification of how long a kernel function can be?

ding

AndreiB · November 15, 2007, 6:26am

The maximum kernel size is around 2 million hardware instructions (according to the Programming Manual). However, you'll run out of registers much earlier (which is exactly what happened to you).

You should try to redesign your kernel so that it uses fewer registers (the compiler always tries to use registers where possible). This can be done by using shared or local (slow!) memory, or by splitting a big kernel into several smaller ones.
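A minimal sketch of the last suggestion, splitting one big kernel into two smaller ones that hand an intermediate result to each other through global memory. The names `stage1_math`, `stage2_math`, `stage1`, `stage2` and `run_split` are hypothetical stand-ins for the real calculation; how many registers this actually saves depends on the code and the compiler version.

```cuda
#include <cuda_runtime.h>

// Hypothetical first half of a long calculation.
__device__ float stage1_math(float x) {
    return x * x + 2.0f * x + 1.0f;
}

// Hypothetical second half of the same calculation.
__device__ float stage2_math(float y) {
    return y * 0.5f - 3.0f;
}

// Each kernel only needs the registers for its own half of the work.
__global__ void stage1(const float* in, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = stage1_math(in[i]);
}

__global__ void stage2(const float* tmp, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = stage2_math(tmp[i]);
}

// Copies h_in to the device, runs both stages, copies the result back.
// Returns cudaGetLastError() so the caller can check for failures.
cudaError_t run_split(const float* h_in, float* h_out, int n) {
    float *d_in = nullptr, *d_tmp = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_tmp, bytes);   // intermediate buffer in global memory
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    // Launches on the same (default) stream run in order, so stage2 sees stage1's output.
    stage1<<<blocks, threads>>>(d_in, d_tmp, n);
    stage2<<<blocks, threads>>>(d_tmp, d_out, n);

    // cudaMemcpy waits for the preceding kernels to finish.
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);
    return cudaGetLastError();
}
```

The price of the split is an extra round trip through global memory for `tmp`, so it pays off mainly when the single big kernel was failing to compile or spilling heavily.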

asadafag · November 15, 2007, 6:52am

There is a much lower kernel size limit imposed by ptxas, somewhere around 32767. However, I don't know whether this has been fixed in version 1.1.

AndreiB · November 15, 2007, 7:23am

32767 instructions?

wumpus · November 15, 2007, 11:32am

Yes, you can run out of 'virtual registers' very quickly, because it uses a new 'virtual register' for each assignment. This is a PTX limitation, not a CUDA one.

The real limit is said to be 2 MB of shader instructions, which is 262144 64-bit instructions (and sometimes two instructions can be stored in one 64-bit word).
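The arithmetic behind the 262144 figure, assuming "2 MB" means 2 × 2^20 bytes and a full-width instruction takes 8 bytes:

$$
\frac{2 \times 2^{20}\ \text{bytes}}{8\ \text{bytes per instruction}} = \frac{2^{21}}{2^{3}} = 2^{18} = 262144\ \text{instructions}
$$

If every instruction could be packed two to a 64-bit word, the same 2 MB would hold at most 2^19 = 524288 instructions, so under these assumptions the ceiling lies between those two numbers depending on the mix of instructions.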

demaxism · November 15, 2007, 11:48am

Thanks AndreiB, it really was such fat code that I also guessed the GPU could not deal with it.

But I ran a test: the code has about 200 lines of variable declarations and 800 lines of calculation. I cut 400 of the 800 calculation lines, and then compilation passed, only reporting "compiler may run out of memory or run very slowly for large Olimit values" (it took 5 seconds). So I wonder: does the pure calculation also consume extra registers?

AndreiB · November 15, 2007, 12:03pm

Not the GPU, but the compiler. I currently have a similar problem, and it seems I'll be translating my code to PTX by hand. Might that solve the problem?

Yes, I've seen examples of this. You could try putting if( blockIdx.x < 0 ) { __syncthreads(); } somewhere, as this sometimes reduces the number of registers used by the compiler :)
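A sketch of where such a guard could go, plus a way to check whether it changed anything. The kernel `long_kernel` and helper `registers_used` are hypothetical. Because `blockIdx.x` is unsigned, the condition is always false and uniform across the block, so the barrier never runs and results are unaffected; whether it changes register allocation depends on the compiler, and a newer nvcc may simply fold the comparison away.

```cuda
#include <cuda_runtime.h>

// Hypothetical long kernel with the always-false guard inserted mid-way.
__global__ void long_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float a = (i < n) ? in[i] : 0.0f;
    float b = a * 1.5f + 0.25f;               // ... first half of a long calculation ...
    if (blockIdx.x < 0) { __syncthreads(); }  // never taken: blockIdx.x is unsigned
    float c = b * b - a;                      // ... second half ...
    if (i < n) out[i] = c;
}

// Returns the register count the compiler assigned to long_kernel,
// or -1 if the query fails. Compare builds with and without the guard.
int registers_used() {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, long_kernel) != cudaSuccess) return -1;
    return attr.numRegs;
}
```

The same per-kernel register count is also printed at compile time by `nvcc --ptxas-options=-v`, which is often the quicker way to compare two variants.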

MisterAnderson42 · November 15, 2007, 1:25pm

The compiler aggressively optimizes out dead code. If an entire long kernel (that doesn't use shared memory) results in only a single global memory write, commenting out that write will cause the compiler to optimize away the entire kernel and leave you with a blank one. After you commented out 400 lines of code, nvcc probably optimized away many of the variables used.
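A small illustration of that effect, under the assumption that the loop stands in for a long calculation. The two hypothetical kernels are identical except that `without_store` has its only global write commented out, which makes `acc` dead so the compiler is free to drop the whole loop. `compare_registers` (also hypothetical) reads the register count of each via `cudaFuncGetAttributes`.

```cuda
#include <cuda_runtime.h>

// Long calculation whose result is written to global memory.
// Assumes the launch grid exactly covers `out` (no bounds check).
__global__ void with_store(float* out, int iters) {
    float acc = threadIdx.x;
    for (int k = 0; k < iters; ++k)
        acc = acc * 0.999f + 1.0f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;  // the only global write
}

// Same calculation with the write commented out: nothing depends on acc,
// so the loop is dead code and may be removed entirely.
__global__ void without_store(float* out, int iters) {
    float acc = threadIdx.x;
    for (int k = 0; k < iters; ++k)
        acc = acc * 0.999f + 1.0f;
    // out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Stores numRegs for each kernel in *live / *dead, or -1 if a query fails.
void compare_registers(int* live, int* dead) {
    cudaFuncAttributes attr;
    *live = (cudaFuncGetAttributes(&attr, with_store) == cudaSuccess) ? attr.numRegs : -1;
    *dead = (cudaFuncGetAttributes(&attr, without_store) == cudaSuccess) ? attr.numRegs : -1;
}
```

A lower count for `without_store` says nothing about how many registers the full calculation needs, which is why cutting code is an unreliable way to measure register pressure.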
