Problems with #pragma unroll and automatic optimization

The CUDA programming manual says that #pragma unroll can unroll any given loop.
But when I unroll a loop with more than 1200 iterations, the compiler reports an error: nvopencc ERROR: C:\CUDA\bin/…/open64/lib//be.exe returned non-zero status -1073741819.
At the same time, I used --ptxas-options=-v to check the register usage, and the kernel uses only 4 registers.

When the loop has fewer than 1200 iterations, it works fine.
Does anyone know whether there is a compiler option that resolves this problem?
Also, how can I disable the CUDA compiler's automatic optimization from PTX to cubin?

I am using the CUDA 2.0 driver and toolkit with a GeForce 9600GTX.

Are you stuck using CUDA 2.0? If not, perhaps upgrade to the latest release (CUDA 2.1); I know they fix compiler bugs with each release (and rumor has it that CUDA 2.2 is due soon). Maybe one of the NVIDIA employees can check that error code for you.
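In the meantime, if you don't strictly need the loop fully unrolled, a partial unroll might be worth trying. The programming guide documents an optional count after the pragma (`#pragma unroll 8` asks for an unroll factor of 8, and `#pragma unroll 1` prevents unrolling). Below is a rough sketch only: the kernel name, `TRIP_COUNT`, the factor of 8, and the host helper are all made up for illustration, and I have not confirmed that this avoids the be.exe crash on your setup.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical trip count, picked to match the size reported to crash nvopencc.
#define TRIP_COUNT 1200

// Each thread sums TRIP_COUNT consecutive floats from `in`.
__global__ void partial_unroll_sum(const float *in, float *out)
{
    const int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const float *src = in + tid * TRIP_COUNT;
    float acc = 0.0f;

    // Ask for an unroll factor of 8 instead of a full unroll.
    // 1200 is divisible by 8, so the unrolled body covers every iteration.
    #pragma unroll 8
    for (int i = 0; i < TRIP_COUNT; ++i)
        acc += src[i];

    out[tid] = acc;
}

// Hypothetical host helper: fills the input with 1.0f, launches one block of
// 32 threads, and returns thread 0's sum (or -1.0f if a CUDA call fails).
float run_partial_unroll_sum()
{
    const int threads = 32;
    const int n = threads * TRIP_COUNT;
    std::vector<float> h_in(n, 1.0f);
    float h_out = -1.0f;
    float *d_in = 0;
    float *d_out = 0;

    if (cudaMalloc((void **)&d_in, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    if (cudaMalloc((void **)&d_out, threads * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }

    cudaMemcpy(d_in, &h_in[0], n * sizeof(float), cudaMemcpyHostToDevice);
    partial_unroll_sum<<<1, threads>>>(d_in, d_out);

    // This copy waits for the kernel to finish before reading the result.
    if (cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_out = -1.0f;

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_partial_unroll_sum()` from `main` should return the per-thread sum if the kernel builds and runs. Changing the number after `#pragma unroll` lets you look for the largest factor the compiler still accepts.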

I don’t think that you can disable the optimization from PTX to the .cubin, because PTX is just an intermediate language; the CUDA tools translate it internally and optimize it into a .cubin. If they didn’t do this, you would basically have to hand-optimize the PTX for each kernel, which would make writing your kernels in C/C++ pointless. Also, letting the compiler do the optimizations ensures that you are (in most cases) getting the best binary code possible – making a small mistake in the PTX could ruin your coalesced memory accesses, etc.

I have tried version 2.1, and it looks like it has the same problem.

Maybe the problem is in the compiler's preprocessor.