The CUDA programming manual says that #pragma unroll can unroll any given loop.
But when I unroll a loop with more than 1200 iterations, the compiler reports an error: nvopencc ERROR: C:\CUDA\bin/…/open64/lib//be.exe returned non-zero status -1073741819.
I also checked the register usage with --ptxas-options=-v, and the kernel uses only 4 registers.
When the loop has fewer than 1200 iterations, it works fine.
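For reference, here is a minimal sketch of the kind of loop I mean. The kernel name, the loop body, the host helper, and the trip count are placeholders for illustration, not my real code. In my real code the error appears once the trip count goes above about 1200.

```cuda
#include <cuda_runtime.h>

// Hypothetical trip count; the failure described in this post shows up above roughly 1200.
const int N_ITER = 1000;

// Hypothetical kernel, not the original code: each thread sums N_ITER input values.
__global__ void unrolled_sum(const float *in, float *out)
{
    float acc = 0.0f;

    // No count given: asks the compiler to unroll the loop completely,
    // which is possible because the trip count is a compile-time constant.
    #pragma unroll
    for (int i = 0; i < N_ITER; ++i) {
        acc += in[i];
    }
    out[threadIdx.x] = acc;
}

// Hypothetical host helper: fills the input with ones, launches a single thread,
// and returns the sum computed on the device, or -1.0f on a CUDA error.
float run_unrolled_sum()
{
    float h_in[N_ITER];
    for (int i = 0; i < N_ITER; ++i) h_in[i] = 1.0f;

    float *d_in = 0, *d_out = 0;
    float h_out = -1.0f;
    if (cudaMalloc((void **)&d_in, sizeof(h_in)) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void **)&d_out, sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }

    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    unrolled_sum<<<1, 1>>>(d_in, d_out);
    // The blocking copy back also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1.0f;
}
```

Calling run_unrolled_sum() from main and changing N_ITER is the easiest way to see at which trip count the compile starts to fail on a given toolkit.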
1. Does anyone know of a compiler option that can resolve this problem?
Judging from the PTX code, the compiler always optimizes the code automatically, for example by deleting code whose results are unused or by reordering instructions. So I would also like to know:
2. How can I disable the CUDA compiler's automatic optimization when going from PTX to cubin?
I am using the CUDA 2.0 driver and toolkit with a GeForce 9600GTX.