CUDA_ERROR_LAUNCH_FAILED for Tesla C2050

Hi!

I have the following PTX code, which I would like to run on a GPU:

.version 1.2
.target sm_13

        .entry CompKernel
        {
        .param .u64 C;
        .reg .u64 %o<1>;
        .reg .u64 %p<1>;
LLCompKernel0:
        ld.param.u64 %o0, [C];
        mov.u64 %p0, %o0;
        st.global.s32 [%p0], 0;
        ret;
        } // CompKernel

In effect, it assigns 0 to the first element of the array C, which is passed as the input parameter.
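For comparison, the intended effect of the kernel can be written as a plain C function. This is only a host-side sketch of the semantics, not the code that runs on the GPU; the name comp_kernel_reference is hypothetical and the NULL check is an addition that the PTX does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side reference for CompKernel (illustration only).
 * It mirrors the three PTX instructions:
 *   ld.param.u64 %o0, [C];      -> read the pointer argument
 *   mov.u64      %p0, %o0;      -> copy it to the address register
 *   st.global.s32 [%p0], 0;     -> store a 32-bit zero at that address
 */
void comp_kernel_reference(int32_t *C)
{
    assert(C != NULL);  /* host-only sanity check; the PTX has no such guard */
    C[0] = 0;           /* the single 32-bit store of the constant 0 */
}
```

Only the first element is written; the rest of the array is left untouched.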

The above code runs successfully on a Tesla C1060,

but fails on a Tesla C2050 with the message

CUDA_ERROR_LAUNCH_FAILED

In the latter case, the failure occurs at the instruction

st.global.s32 [%p0], 0;

Can anybody explain to me what is wrong with my code?

Thanks.

Yury

Do you use -arch=sm_20 on the C2050?

Yes, I have modified the code slightly to target sm_20:

.version 2.0
.target sm_20

.entry CompKernel (
        .param .u64 C
) {
        .reg .u64 %o<1>;
        .reg .u64 %p<1>;
LLCompKernel0:
        ld.param.u64 %o0, [C];
        mov.u64 %p0, %o0;
        st.global.s32 [%p0], 0;
        ret;
        } // CompKernel

but the result is the same.