cudaConfigureCall Setting up shared memory

Hello everyone,

A little question here.

While writing some CUDA code, I found I needed a shared memory size that varies at runtime. I believed that wasn’t possible, but then the function cudaConfigureCall caught my eye.

I don’t quite know how to work with that function, specifically how to set up the shared memory size.

So my question is: can I use cudaConfigureCall to launch a kernel with a shared memory size that depends on a variable, instead of a constant shared memory size?

Just use the third parameter of the kernel launch configuration, which gives the dynamic shared memory size in bytes:

kernel<<<grid, threads, num_extern_shared_bytes>>>(args)

Thanks, MisterAnderson, but how does that work in the called function?

For example, how would I change this function so that the array “M” is sized by that parameter:

__global__ static void kernel(args){
    __shared__ short M[MAX_SIZE];
    ...
}

Do this:

__global__ static void kernel(args){
    extern __shared__ short shmem[];
    shmem[0] = ...;
    ...
}

If you need to know the size of the shared memory inside the kernel, you can pass it in as an argument; in many cases, though, the size you need follows naturally from the number of threads per block, so you don’t need to pass it in at all.
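To tie the pieces together, here is a minimal sketch of a kernel that uses dynamically sized extern shared memory, together with a launch that sets the size at runtime. The kernel name `reverseShorts` and the element count `n` are illustrative, not from this thread:

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: reverses an array of n shorts using shared memory
// whose size is supplied at launch time (the third <<<>>> parameter),
// not declared with a compile-time constant.
__global__ static void reverseShorts(short *data, int n)
{
    extern __shared__ short shmem[]; // size set by the launch configuration

    int i = threadIdx.x;             // one thread per element in this sketch
    shmem[i] = data[i];
    __syncthreads();                 // all elements staged before reading back
    data[i] = shmem[n - 1 - i];
}

int main()
{
    int n = 256;                     // runtime-variable element count
    short *d_data;
    cudaMalloc(&d_data, n * sizeof(short));

    // Third launch parameter: dynamic shared memory size in *bytes*.
    size_t shmemBytes = n * sizeof(short);
    reverseShorts<<<1, n, shmemBytes>>>(d_data, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

Note that the byte count can be computed from any runtime value, which is exactly what a fixed `__shared__ short M[MAX_SIZE]` declaration cannot do.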