Allocate dynamic register and shared memory


I am doing a simple register and shared memory allocation, but for N=512 it is not working. The device is a V100, and I think it should have enough memory.

double A[N]; 
__shared__ double s[N];

N = 512 and sizeof(double) = 8 bytes → 512 × 8 = 4096 bytes = 4 KB.

V100 → 256 KB register file per SM
V100 → up to 96 KB shared memory per SM

The grid size is 1 and the number of threads per block is 512 for that kernel.

Should I allocate shared memory statically? Is it faster during execution?

I don’t have any trouble with the code you have shown on a V100. (I’m assuming N is a compile-time constant.) Perhaps you might want to explain more clearly what you mean by “it is not working”. I usually suggest that people provide a complete example when they are asking questions.
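
For reference, a complete example could look roughly like the sketch below. The kernel name test_kernel, the helper run_test, and the arithmetic inside the kernel are hypothetical placeholders, not the asker's code; the only things taken from the question are the two declarations, N = 512 as a compile-time constant, and a launch of 1 block with 512 threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 512;  // compile-time constant, as assumed above

// Hypothetical kernel: each thread fills a private array and one shared slot.
__global__ void test_kernel(double *out)
{
    double A[N];             // per-thread array in the logical local space
    __shared__ double s[N];  // statically allocated shared memory (4 KB)

    int t = threadIdx.x;
    for (int i = 0; i < N; ++i)
        A[i] = static_cast<double>(i + t);

    s[t] = A[(t * 7) % N];   // index depends on the thread, so A is really used
    __syncthreads();

    out[t] = s[(t + 1) % N]; // read a neighbour's value from shared memory
}

// Launches 1 block of N threads, reports any CUDA error, and checks the result.
bool run_test()
{
    double *d_out = nullptr;
    if (cudaMalloc(&d_out, N * sizeof(double)) != cudaSuccess) return false;

    test_kernel<<<1, N>>>(d_out);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return false;
    }

    double h_out[N];
    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // Thread u stored (u * 7) % N + u in s[u]; thread t read s[(t + 1) % N].
    for (int t = 0; t < N; ++t) {
        int u = (t + 1) % N;
        if (h_out[t] != static_cast<double>((u * 7) % N + u)) return false;
    }
    return true;
}

int main()
{
    printf("%s\n", run_test() ? "kernel ran correctly" : "kernel failed");
    return 0;
}
```

Checking the return value of cudaGetLastError() and cudaDeviceSynchronize() after the launch is what turns "it is not working" into a specific error message.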

For what it's worth, this doesn’t allocate “registers”:

double A[N];

it allocates in the logical local memory space. The compiler determines register usage.
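
One way to see what the compiler actually decided is to query the kernel's attributes at runtime. The sketch below uses cudaFuncGetAttributes; the kernel local_array_kernel and the helper report_usage are made-up names for illustration, and the numbers printed depend on the compiler version, target architecture, and optimization level, so no particular values should be expected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 512;

// Hypothetical kernel with a per-thread array, as in the question.
__global__ void local_array_kernel(double *out)
{
    double A[N];
    for (int i = 0; i < N; ++i)
        A[i] = i * 0.5;
    out[threadIdx.x] = A[(threadIdx.x * 3) % N];
}

// Fills *attr with the resource usage the compiler chose for the kernel.
bool report_usage(cudaFuncAttributes *attr)
{
    cudaError_t err = cudaFuncGetAttributes(attr, local_array_kernel);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    printf("registers per thread:           %d\n", attr->numRegs);
    printf("local memory per thread:        %zu bytes\n", attr->localSizeBytes);
    printf("static shared memory per block: %zu bytes\n", attr->sharedSizeBytes);
    return true;
}

int main()
{
    cudaFuncAttributes attr;
    return report_usage(&attr) ? 0 : 1;
}
```

Compiling with nvcc -Xptxas -v prints similar per-kernel information (registers, stack frame, shared memory) at build time.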


__shared__ double s[N];

is statically allocated shared memory. Dynamically allocated shared memory uses the extern keyword, among several other differences from the static allocation syntax.
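
As a rough illustration of the dynamic form, the sketch below declares the array as extern with no size and passes the size in bytes as the third launch configuration parameter. The names dyn_kernel and run_dynamic, and the work done in the kernel, are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel using dynamically allocated shared memory.
__global__ void dyn_kernel(double *out, int n)
{
    extern __shared__ double s[];  // size is supplied at launch, not here

    int t = threadIdx.x;
    if (t < n) s[t] = 2.0 * t;
    __syncthreads();
    if (t < n) out[t] = s[n - 1 - t];  // read the mirrored element
}

// Launches 1 block of n threads with n doubles of dynamic shared memory.
bool run_dynamic(int n)
{
    double *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(double)) != cudaSuccess) return false;

    size_t shmem_bytes = n * sizeof(double);      // 4 KB for n = 512
    dyn_kernel<<<1, n, shmem_bytes>>>(d_out, n);  // third parameter: bytes

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return false;
    }

    std::vector<double> h_out(n);
    err = cudaMemcpy(h_out.data(), d_out, n * sizeof(double),
                     cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int t = 0; t < n; ++t)
        if (h_out[t] != 2.0 * (n - 1 - t)) return false;
    return true;
}

int main()
{
    printf("%s\n", run_dynamic(512) ? "ok" : "failed");
    return 0;
}
```

With the dynamic form, N no longer has to be a compile-time constant. Note that on V100, requesting more than 48 KB of shared memory per block requires the dynamic form plus an opt-in call to cudaFuncSetAttribute with cudaFuncAttributeMaxDynamicSharedMemorySize; the 4 KB used here is well below that.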