NVCC / CUDA bug? PTX: runtime error when using st to store a value generated with cvta

Hi folks. I’m having a problem with a store instruction. I’m running Rodinia’s kmeans using a devcode directory to run different PTX compilations. The code is below:

ld.param.u64 %r2, [_Z11kmeansPointPfiiiPiS_S_S0__param_4];

cvta.to.global.u64 %r3, %r2;

st.local.s64 [ocelot_ls_stack + 4], %r3;

.......

ld.local.s64 %r151, [ocelot_ls_stack + 4];

add.s64 %r132, %r151, %r131;

//add.s64 %r132, %r3, %r131;

When the PTX in the devcode directory contains the “st.local” instruction, I get the error:

Cuda kernel kmeansPointexecution error: unspecified launch failure

Only the st instruction causes it: if I comment it out, leave the ld, and uncomment the last add, there is no error, although I then have to use the value in %r3 in place of the loaded value. It is not the array declaration, since other variables are being spilled to and loaded from it.

ptxas compiles it, but using the binary file inside the devcode directory gives the same error.

Specs: NVCC 4.1, Arch Linux 64-bit

Is this an nvcc / CUDA bug?

Thx in advance

Diogo Sampaio


Found the problem: st.64 must be 8-byte aligned. The offset in the code above, ocelot_ls_stack + 4, is not a multiple of 8.
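
For anyone hitting the same thing, here is a minimal sketch of how the fragment might look with an aligned spill slot. It is only a fragment, not a complete module. The declaration of ocelot_ls_stack is not shown in the original post, so the size (16) and the .align 8 below are assumptions, and offset 8 is just one possible 8-byte-aligned slot; in the real code it must not collide with the other values spilled to the same array.

```
// Assumed declaration: the real size and alignment are not shown in the post.
.local .align 8 .b8 ocelot_ls_stack[16];

ld.param.u64 %r2, [_Z11kmeansPointPfiiiPiS_S_S0__param_4];
cvta.to.global.u64 %r3, %r2;

// Offset 8 is a multiple of 8, so the 64-bit store is naturally aligned
// (given that the array itself is 8-byte aligned).
st.local.s64 [ocelot_ls_stack + 8], %r3;

// ... (code elided in the original post)

// The matching 64-bit load must use the same aligned offset.
ld.local.s64 %r151, [ocelot_ls_stack + 8];
add.s64 %r132, %r151, %r131;
```

The same rule applies to the load: a 64-bit ld or st needs an address that is a multiple of 8, so both the array's alignment and the offset matter.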