Hello,
I’m working on a kernel that uses dynamically allocated shared memory, but I get an exception as soon as I try to copy something into it. The Programming Guide and the SDK examples haven’t helped, and I can’t see what I’m doing wrong…
Here’s the code:
__global__ void
My_kernel ( double * d_dk, int RadialSamples, ... )
{
    // dynamically sized shared memory; the size is not declared here
    extern __shared__ float shared[];
    // load the vector dk from global to shared memory
    float * s_dk = (float*) shared;
    if ( threadIdx.x < RadialSamples )
    {
        s_dk[threadIdx.x] = (float) d_dk[threadIdx.x];
    }
    __syncthreads();
    ...
    // later, I want to load some other vector
    float * s_direction = (float*) &s_dk[RadialSamples];
    ...
}
Even the first bit doesn’t work. Am I misunderstanding the programming guide?
What confuses me is that I never specify anywhere how many elements I want to store in shared memory…
Can anyone help me?
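
For reference: the size of an extern __shared__ array is not declared inside the kernel. It is passed, in bytes per block, as the third argument of the launch configuration, and when that argument is left out the array has zero bytes, so any write to it is out of bounds. Below is a minimal sketch of how this might look. The kernel copy_dk, the host helper run_copy_dk, and the output buffer d_out are hypothetical names made up for illustration; they are not part of the code above, and the sketch assumes a single block with one thread per element.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: copies n doubles from global memory into dynamically
// sized shared memory as floats, then writes them back so the host can check.
__global__ void copy_dk(const double *d_dk, float *d_out, int n)
{
    extern __shared__ float shared[];   // size comes from the launch, not from here
    float *s_dk = shared;

    if (threadIdx.x < n)
        s_dk[threadIdx.x] = (float) d_dk[threadIdx.x];
    __syncthreads();

    if (threadIdx.x < n)
        d_out[threadIdx.x] = s_dk[threadIdx.x];
}

// Hypothetical host helper: returns true if every element made the round trip.
bool run_copy_dk(int n)
{
    assert(n > 0 && n <= 1024);         // assumption: one block, one thread per element

    std::vector<double> h_in(n);
    std::vector<float>  h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = 0.5 * i;   // exactly representable as float

    double *d_dk  = nullptr;
    float  *d_out = nullptr;
    if (cudaMalloc(&d_dk, n * sizeof(double)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_dk); return false; }
    cudaMemcpy(d_dk, h_in.data(), n * sizeof(double), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    size_t sharedBytes = n * sizeof(float);
    copy_dk<<<1, n, sharedBytes>>>(d_dk, d_out, n);

    cudaError_t err = cudaGetLastError();            // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();   // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_dk);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != (float)(0.5 * i)) return false;
    return true;
}
```

Calling run_copy_dk(64) from main should return true on a working device. If a second vector is placed behind the first, as with s_direction above, the byte count passed at launch has to cover both vectors, not only the first one.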