Dynamically allocating memory inside a __device__/__global__ CUDA kernel

I have a kernel that compares every row of a matrix with the first row. The matrix is created dynamically based on user settings. So at first I tried to allocate the shared variable dynamically, but my GPU has Compute Capability 1.1, so I can't do that.

Is there another way to do something like this?

As far as I know, dynamically allocated shared memory has been a feature of CUDA since version 1.0, so a Compute Capability 1.1 device should not prevent it.


Could you show me some sample code? I tried dynamically allocating shared memory but failed.

Dynamic shared memory allocation in CUDA is performed by defining the kernel function as:

__global__ void kernel_function(...)
{
    extern __shared__ int a[];
}

You should then pass the number of bytes of shared memory to be allocated as the third argument of the kernel launch configuration, i.e. inside the <<<...>>> brackets, as shown in the sketch below.
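
For instance, here is a minimal, self-contained sketch of the task described in the question (comparing every row of a matrix against the first row), staging the first row in dynamically sized shared memory. All names (compare_rows, differs) and the sizes are illustrative, not from the original code:

#include <cstdio>
#include <cuda_runtime.h>

// One block per row; each block compares its row against row 0.
__global__ void compare_rows(const int *matrix, int *differs, int cols)
{
    extern __shared__ int first_row[];          // size is set at launch time

    // Cooperatively stage row 0 in shared memory.
    for (int c = threadIdx.x; c < cols; c += blockDim.x)
        first_row[c] = matrix[c];
    __syncthreads();

    // Element-wise comparison of this block's row against row 0.
    // Several threads may set the same flag; they all write 1, so the
    // race is benign.
    int row = blockIdx.x;
    for (int c = threadIdx.x; c < cols; c += blockDim.x)
        if (matrix[row * cols + c] != first_row[c])
            differs[row] = 1;
}

int main()
{
    const int rows = 4, cols = 256;             // runtime values in the real app
    int *d_matrix, *d_differs;
    cudaMalloc(&d_matrix, rows * cols * sizeof(int));
    cudaMalloc(&d_differs, rows * sizeof(int));
    cudaMemset(d_matrix, 0, rows * cols * sizeof(int));
    cudaMemset(d_differs, 0, rows * sizeof(int));

    // Third launch argument: bytes of dynamic shared memory per block.
    compare_rows<<<rows, 128, cols * sizeof(int)>>>(d_matrix, d_differs, cols);
    cudaDeviceSynchronize();

    int h_differs[rows];
    cudaMemcpy(h_differs, d_differs, rows * sizeof(int), cudaMemcpyDeviceToHost);
    for (int r = 0; r < rows; ++r)
        printf("row %d %s row 0\n", r, h_differs[r] ? "differs from" : "matches");

    cudaFree(d_matrix);
    cudaFree(d_differs);
    return 0;
}

Since the matrix here is zero-initialized, every row matches row 0; the point is only that cols * sizeof(int) is computed at runtime and passed in the launch configuration.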


Dynamic shared memory allocation should not be performed like this:

__shared__ int a[a_size];

if a_size is not known at compile time. Something like

__shared__ int a[100];

is, instead, fine, since the size is a compile-time constant.
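
One related point worth noting: CUDA exposes only a single extern __shared__ region per kernel, so if more than one dynamically sized shared array is needed, the usual pattern is to carve that one region up manually. A minimal sketch, with hypothetical names (two_arrays, n_ints, n_floats) chosen purely for illustration:

__global__ void two_arrays(int n_ints, int n_floats)
{
    extern __shared__ unsigned char smem[];   // the single dynamic allocation

    // Partition the one raw region into two logical arrays.
    int   *ints   = reinterpret_cast<int *>(smem);
    float *floats = reinterpret_cast<float *>(ints + n_ints);

    // ... use ints[0 .. n_ints) and floats[0 .. n_floats) ...
}

// Launched with the combined byte count as the third configuration argument:
// two_arrays<<<blocks, threads, n_ints * sizeof(int) + n_floats * sizeof(float)>>>(n_ints, n_floats);

With mixed element types, the larger-alignment array should come first in the partition to keep every pointer properly aligned (here int and float both need 4-byte alignment, so the order does not matter).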