This is the error that I am getting:
"/tmp/tmpxft_000016d7_00000000-5.i": Warning: Olimit was exceeded on function _Z25estimate_kernel_optimisedPfS_S_S_S_S_S_S_S_S_S_S_PiS_S_S_f; will not perform function-scope optimization.
To still perform function-scope optimization, use -OPT:Olimit=0 (no limit) or -OPT:Olimit=22672
Assertion failure at line 2385 of …/…/be/cg/NVISA/cgtarget.cxx:
Compiler Error in file /tmp/tmpxft_000016d7_00000000-5.i during Register Allocation phase:
ran out of registers in float
*** glibc detected *** /usr/local/cuda/open64/lib//be: free(): invalid pointer: 0x00000000014d1cc0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3d3b272832]
/lib64/libc.so.6(cfree+0x8c)[0x3d3b275f2c]
/usr/local/cuda/open64/lib//be[0x573d30]
/usr/local/cuda/open64/lib//be[0x597b88]
/lib64/libc.so.6(exit+0x109)[0x3d3b234029]
/usr/local/cuda/open64/lib//be[0x6881da]
/usr/local/cuda/open64/lib//be[0x514a3f]
/usr/local/cuda/open64/lib//be[0x514bed]
/usr/local/cuda/open64/lib//be[0x525920]
/usr/local/cuda/open64/lib//be[0x419bcd]
/usr/local/cuda/open64/lib//be[0x419fbd]
/usr/local/cuda/open64/lib//be[0x41b1f3]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3d3b21e074]
/usr/local/cuda/open64/lib//be[0x417659]
======= Memory map: ========
00400000-0080d000 r-xp 00000000 fd:05 393234 /usr/local/cuda/open64/lib/be
00a0c000-00c5e000 rw-p 0040c000 fd:05 393234 /usr/local/cuda/open64/lib/be
00c5e000-0315d000 rw-p 00c5e000 00:00 0 [heap]
3d3a000000-3d3a01b000 r-xp 00000000 fd:05 35029046 /lib64/ld-2.7.so
3d3a21a000-3d3a21b000 r--p 0001a000 fd:05 35029046 /lib64/ld-2.7.so
3d3a21b000-3d3a21c000 rw-p 0001b000 fd:05 35029046 /lib64/ld-2.7.so
3d3b200000-3d3b34d000 r-xp 00000000 fd:05 35029048 /lib64/libc-2.7.so
3d3b34d000-3d3b54d000 ---p 0014d000 fd:05 35029048 /lib64/libc-2.7.so
3d3b54d000-3d3b551000 r--p 0014d000 fd:05 35029048 /lib64/libc-2.7.so
3d3b551000-3d3b552000 rw-p 00151000 fd:05 35029048 /lib64/libc-2.7.so
3d3b552000-3d3b557000 rw-p 3d3b552000 00:00 0
3d3b600000-3d3b682000 r-xp 00000000 fd:05 35029071 /lib64/libm-2.7.so
3d3b682000-3d3b881000 ---p 00082000 fd:05 35029071 /lib64/libm-2.7.so
3d3b881000-3d3b882000 r--p 00081000 fd:05 35029071 /lib64/libm-2.7.so
3d3b882000-3d3b883000 rw-p 00082000 fd:05 35029071 /lib64/libm-2.7.so
2aaaaaaab000-2aaaaaaad000 rw-p 2aaaaaaab000 00:00 0
2aaaaaaad000-2aaaaab7a000 r-xp 00000000 fd:05 1803203 /usr/local/matlab/sys/os/glnxa64/libstdc++.so.6.0.8
2aaaaab7a000-2aaaaac7a000 ---p 000cd000 fd:05 1803203 /usr/local/matlab/sys/os/glnxa64/libstdc++.so.6.0.8
2aaaaac7a000-2aaaaac9b000 rw-p 000cd000 fd:05 1803203 /usr/local/matlab/sys/os/glnxa64/libstdc++.so.6.0.8
2aaaaac9b000-2aaaaacae000 rw-p 2aaaaac9b000 00:00 0
This happened when I manually unrolled a loop. It looks like the register allocation failure then triggers a second error in the toolchain itself (the glibc abort). What will change in the generated code when I add -OPT:Olimit=0 to my command line?
Hmm, I originally had
for (int k = tid; k < 2048; k += 512)
{
    sdata[k] += input1[k] * input2[k] + input1[k+256] * input2[k+256];
    mdata[k] = fminf(mdata[k], fminf(input1[k], input1[k+256]));
}
I turned it into one statement and got the error.
Then I changed the code to:
sdata[k] += input1[k] * input2[k] + input1[k+256] * input2[k+256];
sdata[k] += input1[k+512] * input2[k+512] + input1[k+768] * input2[k+768];
sdata[k] += input1[k+1024] * input2[k+1024] + input1[k+1280] * input2[k+1280];
sdata[k] += input1[k+1536] * input2[k+1536] + input1[k+1792] * input2[k+1792];
(and the equivalent for the fminf lines)
And I still get the error. Now I am really puzzled…
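For reference, here is a minimal, self-contained sketch of the pattern discussed above, written so that the compiler can do the unrolling instead of doing it by hand. It is only an illustration under stated assumptions, not the real estimate_kernel_optimised: the kernel name dot_min_sketch, the output buffers sums and mins, and the block size of 256 threads are all hypothetical. It also assumes, as the unrolled version suggests, that each thread accumulates into a single slot, so the running sum and minimum are kept in local variables and written once at the end. Whether this avoids the register allocation failure on this toolchain is untested.

```cuda
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 2048;       // loop bound from the snippet above
constexpr int THREADS = 256;  // assumed block size: k is paired with k+256
constexpr int STRIDE = 512;   // loop stride from the snippet above

// Hypothetical stand-in kernel, not the real estimate_kernel_optimised.
__global__ void dot_min_sketch(const float* input1, const float* input2,
                               float* sums, float* mins)
{
    const int tid = threadIdx.x;
    float s = 0.0f;     // running sum, kept in a local variable
    float m = FLT_MAX;  // running minimum, kept in a local variable

    // The trip count is a compile-time constant (4), so the compiler
    // is free to unroll this loop on its own.
    #pragma unroll
    for (int i = 0; i < N / STRIDE; ++i) {
        const int k = tid + i * STRIDE;
        s += input1[k] * input2[k] + input1[k + 256] * input2[k + 256];
        m = fminf(m, fminf(input1[k], input1[k + 256]));
    }
    sums[tid] = s;
    mins[tid] = m;
}

// Runs the kernel once and compares the result against a plain CPU loop.
bool run_check()
{
    // Small integer-valued inputs so the float sums are exact on CPU and GPU.
    std::vector<float> h1(N), h2(N);
    for (int i = 0; i < N; ++i) {
        h1[i] = static_cast<float>(i % 7 + 1);
        h2[i] = static_cast<float>(i % 5);
    }

    const size_t inBytes = N * sizeof(float);
    const size_t outBytes = THREADS * sizeof(float);
    float *d1 = nullptr, *d2 = nullptr, *dS = nullptr, *dM = nullptr;
    bool ok = cudaMalloc(&d1, inBytes) == cudaSuccess
           && cudaMalloc(&d2, inBytes) == cudaSuccess
           && cudaMalloc(&dS, outBytes) == cudaSuccess
           && cudaMalloc(&dM, outBytes) == cudaSuccess;

    std::vector<float> gS(THREADS), gM(THREADS);
    if (ok) {
        cudaMemcpy(d1, h1.data(), inBytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d2, h2.data(), inBytes, cudaMemcpyHostToDevice);
        dot_min_sketch<<<1, THREADS>>>(d1, d2, dS, dM);
        ok = cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(gS.data(), dS, outBytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(gM.data(), dM, outBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d1); cudaFree(d2); cudaFree(dS); cudaFree(dM);
    if (!ok) return false;

    // CPU reference using the same indexing as the original loop.
    for (int tid = 0; tid < THREADS; ++tid) {
        float s = 0.0f, m = FLT_MAX;
        for (int k = tid; k < N; k += STRIDE) {
            s += h1[k] * h2[k] + h1[k + 256] * h2[k + 256];
            m = std::min(m, std::min(h1[k], h1[k + 256]));
        }
        if (s != gS[tid] || m != gM[tid]) return false;
    }
    return true;
}

int main()
{
    const bool ok = run_check();
    assert(ok);
    return ok ? 0 : 1;
}
```

The loop is indexed by i rather than by k so that its trip count does not depend on tid; with the original form (k starting at tid) the compiler cannot know the iteration count at compile time.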