Hi,
I am doing a simple register and shared memory allocation, but it does not work for N=512. The device is a V100, and I think it should have enough memory.
double A[N];
__shared__ double s[N];
N=512 and sizeof(double)=8 bytes → 512*8 = 4096 bytes = 4 KB.
V100 → 256 KB register file per SM
V100 → up to 96 KB shared memory per SM
The grid size is 1 and the number of threads per block is 512 for that kernel.
Should I allocate shared memory statically? Is it faster during execution?
I don’t have any trouble with the code you have shown on a V100. (I’m assuming N is a compile-time constant.) Perhaps you could explain more clearly what you mean by “it is not working”. I usually suggest that people provide a complete example when they ask questions.
For what it’s worth, this doesn’t allocate “registers”:
double A[N];
it allocates in the logical local space. The compiler determines register usage.
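To make the local-space point concrete, here is a minimal sketch that asks the runtime what the compiler actually decided for a kernel. The kernel name localArrayKernel and its body are made up for illustration and are not the original poster's code. cudaFuncGetAttributes and the numRegs, localSizeBytes, and sharedSizeBytes fields are part of the CUDA runtime API. Note that 512 doubles is 4096 bytes per thread, well above what the per-thread register limit on a V100 (255 32-bit registers) can hold, so an array like this would be expected to end up in local memory. The printed numbers depend on compiler version and flags, so none are shown.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 512;  // assumed to be a compile-time constant, as in the question

// Hypothetical kernel: every thread fills a private array and then reads it
// with a runtime-computed index, which normally prevents the compiler from
// keeping the whole array in registers.
__global__ void localArrayKernel(double *out)
{
    double A[N];
    for (int i = 0; i < N; ++i)
        A[i] = i + threadIdx.x;
    out[threadIdx.x] = A[(threadIdx.x * 7) % N];
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, localArrayKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // These are the per-kernel resource figures chosen by the compiler.
    printf("registers per thread:           %d\n", attr.numRegs);
    printf("local memory per thread:        %zu bytes\n", attr.localSizeBytes);
    printf("static shared memory per block: %zu bytes\n", attr.sharedSizeBytes);
    return 0;
}
```

Compiling with nvcc -Xptxas -v should report similar resource figures at build time.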
this:
__shared__ double s[N];
is statically allocated shared memory. Dynamically allocated shared memory uses the extern keyword, among several other differences from the statically allocated syntax.
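As a rough illustration of the two forms, here is a minimal complete sketch, assuming N is a compile-time constant and a launch of 1 block with N threads as described in the question. The kernel names staticShared and dynamicShared, the check helper, and the kernel bodies are invented for the example. It also shows the kind of error checking that helps pin down what "not working" means.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

constexpr int N = 512;  // assumed compile-time constant

// Hypothetical helper: abort with a message if a runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        printf("%s: %s\n", what, cudaGetErrorString(err));
        exit(1);
    }
}

// Static allocation: the size is fixed when the kernel is compiled.
__global__ void staticShared(double *out)
{
    __shared__ double s[N];
    s[threadIdx.x] = 2.0 * threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = s[N - 1 - threadIdx.x];
}

// Dynamic allocation: declared extern with no size; the byte count is
// passed as the third launch-configuration argument.
__global__ void dynamicShared(double *out, int n)
{
    extern __shared__ double s[];
    s[threadIdx.x] = 2.0 * threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = s[n - 1 - threadIdx.x];
}

int main()
{
    double h_out[N];
    double *d_out = nullptr;
    check(cudaMalloc(&d_out, N * sizeof(double)), "cudaMalloc");

    staticShared<<<1, N>>>(d_out);
    check(cudaGetLastError(), "staticShared launch");
    check(cudaDeviceSynchronize(), "staticShared execution");

    dynamicShared<<<1, N, N * sizeof(double)>>>(d_out, N);
    check(cudaGetLastError(), "dynamicShared launch");
    check(cudaDeviceSynchronize(), "dynamicShared execution");

    check(cudaMemcpy(h_out, d_out, N * sizeof(double), cudaMemcpyDeviceToHost),
          "cudaMemcpy");
    printf("h_out[0] = %f\n", h_out[0]);

    check(cudaFree(d_out), "cudaFree");
    return 0;
}
```

Both kernels use 4 KB of shared memory per block here, so either form fits comfortably within the V100 limit; the difference is only in when the size is specified.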