How to allocate shared memory?


I’m working on code in which I want to use dynamically allocated shared memory. However, I get an exception when I try to copy something into it. The Programming Guide and SDK examples don’t help me, and I don’t know what I’m doing wrong…

Here’s the code:

__global__ void
My_kernel ( double * d_dk, int RadialSamples, ... )
{
	extern __shared__ float shared[];

	// load the vector dk from global to shared memory
	float * s_dk = (float*) shared;

	if ( threadIdx.x < RadialSamples )
	{
		s_dk[threadIdx.x] = (float) d_dk[threadIdx.x];
	}

	// later, I want to load some other vector
	float * s_direction = (float*) &s_dk[RadialSamples];
}



Even the first bit doesn’t work. Am I misunderstanding the programming guide?

What confuses me is that I never specify how many elements I want to store in shared memory…

Can anyone help me?

You have to pass the size, in bytes, of the dynamically allocated shared memory as the third parameter of the launch configuration (inside the <<< >>>, not in the kernel's argument list), so the kernel call becomes

My_kernel<<< gridsize, blocksize, sizeofsharedmemory >>> ( d_dk, RadialSamples, ...);
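For reference, here is a minimal sketch of the whole pattern, including the second vector placed right behind the first in the same shared buffer. The names stage_in_shared, run_stage_in_shared, d_direction and d_out are made up for illustration and are not from the original code; the sketch also assumes a single block and that both vectors together fit in the default per-block shared memory limit (48 KB on most devices).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: stages two vectors in dynamically sized shared memory
// and writes their element-wise product back to global memory.
__global__ void stage_in_shared(const double* d_dk, const float* d_direction,
                                float* d_out, int radialSamples)
{
    // The size of this array is not known at compile time; it is taken from
    // the third launch-configuration parameter (a byte count).
    extern __shared__ float shared[];

    float* s_dk        = shared;                  // first radialSamples floats
    float* s_direction = &shared[radialSamples];  // next radialSamples floats

    // Strided loop so the copy works even if radialSamples > blockDim.x.
    for (int i = threadIdx.x; i < radialSamples; i += blockDim.x) {
        s_dk[i]        = static_cast<float>(d_dk[i]);
        s_direction[i] = d_direction[i];
    }
    __syncthreads();  // make all shared-memory writes visible to the block

    for (int i = threadIdx.x; i < radialSamples; i += blockDim.x)
        d_out[i] = s_dk[i] * s_direction[i];
}

// Hypothetical host helper: returns true if every CUDA call succeeded.
// On success, out[i] == float(dk[i]) * direction[i].
bool run_stage_in_shared(const std::vector<double>& dk,
                         const std::vector<float>& direction,
                         std::vector<float>& out)
{
    const int n = static_cast<int>(dk.size());
    if (n == 0 || direction.size() != dk.size()) return false;
    out.assign(n, 0.0f);

    double* d_dk  = nullptr;
    float*  d_dir = nullptr;
    float*  d_out = nullptr;

    bool ok =
        cudaMalloc(&d_dk,  n * sizeof(double)) == cudaSuccess &&
        cudaMalloc(&d_dir, n * sizeof(float))  == cudaSuccess &&
        cudaMalloc(&d_out, n * sizeof(float))  == cudaSuccess &&
        cudaMemcpy(d_dk, dk.data(), n * sizeof(double),
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_dir, direction.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Two float arrays of n elements each, expressed in BYTES.
        const size_t sharedBytes = 2 * static_cast<size_t>(n) * sizeof(float);
        stage_in_shared<<<1, 128, sharedBytes>>>(d_dk, d_dir, d_out, n);

        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), d_out, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts a null pointer, so this is safe on partial failure.
    cudaFree(d_dk);
    cudaFree(d_dir);
    cudaFree(d_out);
    return ok;
}
```

Calling run_stage_in_shared(dk, direction, out) from main() with two host vectors of equal length is enough to try it. If the third launch parameter is left out, it defaults to 0 bytes, so every access to shared[] is out of bounds, which would explain the exception in the original code.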

That explains a lot! Thanks!!!

PS: Why don’t they write this in the Programming Guide in the shared-memory section???!!!