Suppose we have something like
unsigned integer i
unsigned integer j
unsigned integer x[8] ← sits in local memory
How can we do
x[i]=j
in inline PTX?
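I don’t know the exact syntax for inline PTX, but you want the ‘st’ instruction with the correct state-space and type modifiers: ‘st.local.u32’. PTX stores take the destination address in square brackets, so I’d guess it’d be something like ‘st.local.u32 [addr], %j’, where addr is the address of x plus 4*i.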
Hope that gets you on the right track. If not, the PTX spec is included in the ‘doc’ folder in the CUDA toolkit; you should be able to find the answer in there.
Your code may need to distinguish between sm_1x and sm_2x. For sm_1x, use st.local as Jack points out. For sm_2x, use a generic store (plain ‘st’ with no state-space qualifier), per the section “Memory Space Conflicts” in the manual “Using Inline PTX Assembly in CUDA”.
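
For the sm_2x case, here is a minimal sketch of what the generic store could look like. It is not taken from the manual. The names store_u32, kernel and run_store are made up for illustration, and the "l" constraint assumes a 64-bit build where pointers are 64 bits wide (a 32-bit build would need "r" instead). The address arithmetic is left to the compiler, so the inline PTX only has to do the store itself.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: performs x[i] = j with a generic-address store.
// Assumes sm_20 or later and a 64-bit build ("l" = 64-bit register).
__device__ __forceinline__ void store_u32(unsigned int *x,
                                          unsigned int i,
                                          unsigned int j)
{
    unsigned int *p = x + i;  // let the compiler compute &x[i]
    // Generic store: no state space after "st", so it works whether
    // p points into local, shared, or global memory.
    asm volatile("st.u32 [%0], %1;" :: "l"(p), "r"(j) : "memory");
}

// Illustrative kernel: x is a per-thread array, as in the question.
__global__ void kernel(unsigned int i, unsigned int j, unsigned int *out)
{
    unsigned int x[8];
    for (int k = 0; k < 8; ++k) x[k] = 0;
    store_u32(x, i, j);
    *out = x[i];  // read back so the host can check the store
}

// Host-side wrapper (hypothetical): runs one thread and returns x[i].
unsigned int run_store(unsigned int i, unsigned int j)
{
    unsigned int *d_out = 0, h_out = 0;
    cudaMalloc(&d_out, sizeof(unsigned int));
    kernel<<<1, 1>>>(i, j, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_store(3, 42) from host code should hand back the value that was stored into x[3]. The "memory" clobber is there so the compiler does not assume x is unchanged across the asm statement.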