Hi folks. I’m having a problem with a store instruction. I’m running Rodinia’s kmeans with a devcode directory so that I can try different PTX compilations. The code is below:
ld.param.u64 %r2, [_Z11kmeansPointPfiiiPiS_S_S0__param_4];
cvta.to.global.u64 %r3, %r2;
st.local.s64 [ocelot_ls_stack + 4], %r3;
.......
ld.local.s64 %r151, [ocelot_ls_stack + 4];
add.s64 %r132, %r151, %r131;
//add.s64 %r132, %r3, %r131;
When the PTX in the devcode directory contains the “st.local” instruction, I get the error:
Cuda kernel kmeansPointexecution error: unspecified launch failure
Only the st instruction causes it: if I comment it out, leave the ld, and uncomment the last add, there is no error, although that means replacing the loaded value with the value in %r3. It is not the array declaration, since other variables are being spilled to it and loaded from it.
ptxas compiles it, but using the resulting binary file inside the devcode directory gives the same error.
Specs: NVCC 4.1, Arch Linux, 64-bit
Is it an nvcc / CUDA bug?
Thx in advance
Found the problem: a 64-bit st (here st.local.s64) must be 8-byte aligned, which the + 4 offset breaks.
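For anyone generating spill code by hand or with a tool, here is a minimal sketch in C of the offset check that would have caught this. The helper names align_up and is_aligned are hypothetical (they are not part of Ocelot or CUDA), and the sketch assumes the base of ocelot_ls_stack is itself at least 8-byte aligned, so only the offset matters.

```c
#include <assert.h>
#include <stddef.h>

/* Round a spill-slot offset up to the next multiple of align.
 * align must be a nonzero power of two (e.g. 8 for a 64-bit access). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Nonzero when offset is a multiple of the access size in bytes. */
static int is_aligned(size_t offset, size_t access_size)
{
    return offset % access_size == 0;
}
```

With these, is_aligned(4, 8) is 0, which flags the [ocelot_ls_stack + 4] slot, and align_up(4, 8) gives 8 as the next usable offset for a 64-bit store.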