Hi!
I have the following PTX code, which I would like to run on the GPU:
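.version 1.2
.target sm_13
.entry CompKernel
{
    .param .u64 C;              // kernel parameter: 64-bit pointer to array C
    .reg .u64 %o<1>;
    .reg .u64 %p<1>;
LLCompKernel0:
    ld.param.u64 %o0, [C];      // load the pointer value from the parameter
    mov.u64 %p0, %o0;
    st.global.s32 [%p0], 0;     // store the 32-bit value 0 to C[0]
    ret;
} // CompKernel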
In other words, the kernel assigns 0 to the first element of the array C, which is passed in as a parameter.
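For comparison, here is a minimal CUDA C sketch of what the PTX is meant to do. It assumes C is an int array (matching st.global.s32) and a 64-bit host build (matching the .u64 parameter). The helper run_comp_kernel is hypothetical and only for illustration: it uses the runtime API rather than the driver API, fills a small array with 7, launches the kernel, and returns C[0] as read back on the host, or -1 if any CUDA call fails.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed CUDA C equivalent of the PTX kernel: store 0 into C[0].
// extern "C" keeps the unmangled name CompKernel, like the .entry above.
extern "C" __global__ void CompKernel(int *C)
{
    C[0] = 0;
}

// Hypothetical test helper: returns C[0] after the launch, or -1 on error.
int run_comp_kernel()
{
    int h_C[4] = {7, 7, 7, 7};   // non-zero so the store is observable
    int *d_C = nullptr;

    if (cudaMalloc(&d_C, sizeof h_C) != cudaSuccess) return -1;
    if (cudaMemcpy(d_C, h_C, sizeof h_C, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_C);
        return -1;
    }

    CompKernel<<<1, 1>>>(d_C);
    cudaError_t err = cudaGetLastError();              // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // errors during execution
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_C);
        return -1;
    }

    if (cudaMemcpy(h_C, d_C, sizeof h_C, cudaMemcpyDeviceToHost) != cudaSuccess) {
        cudaFree(d_C);
        return -1;
    }
    cudaFree(d_C);
    return h_C[0];
}
```

If this version works on the same card while the hand-written PTX does not, comparing it with the PTX that nvcc generates for it (nvcc -ptx) may help narrow down the difference.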
The code above runs successfully on a Tesla C1060 (compute capability 1.3),
but fails on a Tesla C2050 (compute capability 2.0, Fermi) with the error
CUDA_ERROR_LAUNCH_FAILED
On the C2050, the failure occurs at the instruction
st.global.s32 [%p0], 0;
Can anybody explain to me what is wrong with my code?
Thanks.
Yury