When the -maxrregcount option is used, the kernel fails to run

rainysky · March 18, 2009, 3:16am

Hi everyone,

When I compile my kernel with this command:

nvcc.exe -Xptxas=-v -cubin kernel.cu -o test.cubin
kernel.cu
tmpxft_000005a0_00000000-3_kernel.cudafe1.gpu
tmpxft_000005a0_00000000-8_kernel.cudafe2.gpu
ptxas info : Compiling entry function ‘kernel’
ptxas info : Used 28 registers, 28+24 bytes smem, 456 bytes cmem[0], 64 bytes cmem[1]

It uses 28 registers, and the kernel runs successfully and produces the correct result.

Then I compile the same source code with the “-maxrregcount=16” option:

nvcc.exe -Xptxas=-v -cubin -maxrregcount=16 kernel.cu -o test.cubin
kernel.cu
tmpxft_00000844_00000000-3_kernel.cudafe1.gpu
tmpxft_00000844_00000000-8_kernel.cudafe2.gpu
ptxas info : Compiling entry function ‘kernel’
ptxas info : Used 16 registers, 20+0 bytes lmem, 28+24 bytes smem, 456 bytes cmem[0], 64 bytes cmem[1]

Now only 16 registers are used and 20 bytes of local memory appear, but when the kernel is run it outputs an incorrect result.

Can anyone help me figure out what the problem is?

Fugl · March 18, 2009, 1:17pm

I think you’ve run into a lower limit on how many registers your particular algorithm needs. If you look at the other numbers, you’d expect local memory usage to increase, since you are forcing registers to spill into it, but instead they get lower, which could indicate some kind of internal error.

rainysky · March 18, 2009, 2:36pm

I agree that reducing the register count from 28 to 16 is not easy, but nvcc simply completes the compilation without reporting any error, and the runtime reports no error either.

I just tried the nvcc from the 2.1 toolkit and got the same problem.

CUDA is really powerful, and imperfect…

wumpus · March 18, 2009, 8:18pm

You should make a small test case and report it to NVIDIA (either on this forum or by submitting a bug report); this is certainly not ‘expected behaviour’. Kernels can become slow if you restrict them to 16 registers, but they should not behave incorrectly.
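
A small test case of the kind suggested here usually ends with a host-side check that compares the kernel's output against a trusted reference, for example a CPU implementation or the output of the build compiled without -maxrregcount. The following is only a sketch of such a check in host-side C++. The function name `maxRelativeError` and the zero guard `eps` are illustrative assumptions; the thread does not show how the posters actually compared their results.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest relative difference between a result copied
// back from the GPU and a trusted reference of the same length.
// eps guards against division by zero when a reference value is 0.
double maxRelativeError(const std::vector<float>& result,
                        const std::vector<float>& reference,
                        double eps = 1e-12) {
    assert(result.size() == reference.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        double diff = std::fabs(static_cast<double>(result[i]) - reference[i]);
        double scale = std::max(std::fabs(static_cast<double>(reference[i])), eps);
        worst = std::max(worst, diff / scale);
    }
    return worst;
}
```

Printing this number for both builds (with and without the register limit) in the repro makes it clear whether the difference is rounding noise or a genuinely wrong result.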

friedonionrings · April 3, 2009, 3:03am

I too have had an issue with this. My kernel requires 52 registers. If I force this number down to 32 so that I can reach 50% occupancy, no errors are reported and the kernel appears to run properly, with no additional shared memory used, but the results are drastically different: some of the numerical values are off by 75%. I am using this for scientific research, and with an error of 75% I cannot use CUDA.
I would post my code, but it is too long to do so: my kernel is 730 lines, and the code requires an extensive database to run.
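
To put the occupancy figure in numbers, a rough register-limited bound can be worked out on the host. The sketch below is an estimate under stated assumptions, not NVIDIA's occupancy calculator. It assumes 16384 registers and 1024 resident threads per multiprocessor (the compute capability 1.2/1.3 limits, which are consistent with "32 registers gives 50% occupancy"; the post does not name the GPU), and it ignores register allocation granularity, block size, and shared memory limits. The name `registerLimitedOccupancy` is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-multiprocessor limits (compute capability 1.2/1.3 figures).
// The thread does not name the GPU; adjust these for other hardware.
constexpr int kRegistersPerSM = 16384;
constexpr int kMaxThreadsPerSM = 1024;

// Hypothetical helper: upper bound on occupancy when registers are the only
// limiting resource. Ignores allocation granularity, block size, and shared memory.
double registerLimitedOccupancy(int regsPerThread) {
    assert(regsPerThread > 0);
    // Threads that fit in the register file, capped by the thread limit.
    int threads = std::min(kRegistersPerSM / regsPerThread, kMaxThreadsPerSM);
    return static_cast<double>(threads) / kMaxThreadsPerSM;
}
```

Under these assumptions, 32 registers per thread allows 512 resident threads (50%), while 52 registers allows 315 threads (about 31%) before any rounding down to whole blocks.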

Nanthan · May 4, 2010, 9:39pm

The results are incorrect if I don’t specify maxrregcount. If I do specify maxrregcount, the results seem to be OK. Any idea what causes this behaviour?

Lev · May 4, 2010, 10:46pm

Are you using the latest version of the SDK?

TheOke · February 10, 2011, 10:43am

I have a similar problem.
I am compiling for a Tesla C2050 using arch=sm_20. If I set maxrregcount too low (32) or too high (64), or leave it out, some of my answers are completely wrong.
If I use arch=sm_13, the answers are all correct whether or not I specify maxrregcount.
I’m using the latest CUDA toolkit and SDK (Jan 2011).

Is this a known problem?

njuffa · February 10, 2011, 8:53pm

Assuming your code contains no architecture-specific code paths, the fact that the code compiled for sm_13 runs fine but fails when compiled for sm_20 suggests (but doesn’t demonstrate conclusively) that there might be a compiler issue. It is not clear whether the code compiled for sm_13 is run on an sm_13 platform while the code compiled for sm_20 is run on an sm_20 platform. If so, please note that sm_2x has much tighter checking for out-of-bounds accesses. Also, are all CUDA API calls properly checked for error returns? There is always a possibility that it’s not the kernel code itself that gives rise to the unexpected results.

If proper API error checking is in place, and if the code compiled for sm_13 passes, but compiled for sm_20 fails, when running on the same sm_2x GPU, this would be a strong hint that something may be amiss in the compiler. In this case I would encourage you to file a bug report, with a self-contained repro case attached.
