Problems about #pragma unroll and auto optimization

xjtusnail · March 8, 2009, 2:11pm

The CUDA programming manual says that #pragma unroll can unroll any given loop.
But when I unroll a loop with more than 1200 iterations, the compiler reports an error: nvopencc ERROR: C:\CUDA\bin/…/open64/lib//be.exe returned non-zero status -1073741819.
I also used --ptxas-options=-v to check register usage, and the kernel uses only 4 registers.

When I unroll a loop with fewer than 1200 iterations, it works fine.
Does anyone know of a compiler option that resolves this problem?
Also, how can I disable the CUDA compiler's automatic optimization when going from PTX to cubin?

I am using the CUDA 2.0 driver and toolkit with a GeForce 9600GTX.
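
For context, here is a minimal sketch of the kind of kernel in question. The original code was not posted, so everything in it is an assumption: the kernel name `sumKernel`, the helper `runSum`, the trip count of 1024, and the unroll factor of 8 are hypothetical. It uses a partial unroll (`#pragma unroll 8`) rather than a full unroll, which keeps the amount of generated code small even when the loop has many iterations.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical trip count, chosen only for illustration.
#define TRIP_COUNT 1024

// Each thread sums TRIP_COUNT input values.
__global__ void sumKernel(const float *in, float *out)
{
    float acc = 0.0f;

    // Ask the compiler to replicate the loop body 8 times per iteration
    // (a partial unroll) instead of expanding all TRIP_COUNT iterations.
    // A bare "#pragma unroll" would request a full unroll instead.
#pragma unroll 8
    for (int i = 0; i < TRIP_COUNT; ++i) {
        acc += in[i];
    }

    out[threadIdx.x] = acc;
}

// Hypothetical host helper: fills the input with ones, launches a single
// thread, and returns the sum computed on the device.
float runSum()
{
    float h_in[TRIP_COUNT];
    for (int i = 0; i < TRIP_COUNT; ++i) {
        h_in[i] = 1.0f;
    }

    float *d_in = 0;
    float *d_out = 0;
    float h_out = 0.0f;

    cudaError_t err = cudaMalloc((void **)&d_in, sizeof(h_in));
    assert(err == cudaSuccess);
    err = cudaMalloc((void **)&d_out, sizeof(float));
    assert(err == cudaSuccess);

    err = cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    sumKernel<<<1, 1>>>(d_in, d_out);

    // This copy waits for the kernel to finish before reading the result.
    err = cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `runSum()` from `main` and building with nvcc and --ptxas-options=-v shows the register usage for this kernel. Whether a partial unroll avoids the be.exe crash described above is not confirmed in this thread.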

jack · March 8, 2009, 7:38pm

Are you stuck using CUDA 2.0? If not, perhaps upgrade to the latest release (CUDA 2.1); I know they fix compiler bugs with each release (and rumor has it that CUDA 2.2 is due soon). Maybe one of the NVIDIA employees can check that error code for you.

I don’t think that you can disable the optimization from PTX to the .cubin, because PTX is just an intermediate language; the CUDA tools translate it internally and then optimize it into a .cubin. If they didn’t do this, you would basically have to hand-optimize the PTX for each kernel, which would make writing your kernels in C/C++ pointless. Also, letting the compiler do the optimizations ensures that you are (in most cases) getting the best binary code possible; a small mistake in hand-written PTX could ruin your coalesced memory accesses, for example.

xjtusnail · March 10, 2009, 1:21am

I have tried version 2.1, and it appears to have the same problem.

Maybe the problem is in the compiler's preprocessor.
