Hi all,
I know that static shared memory allocation is limited to 48 KB per block, but the A100 has 164 KB of shared memory per SM. I tried to use dynamic allocation to get more than 48 KB of shared memory. Compilation is fine, but the launch fails with "CUDA error: invalid argument" when I allocate more than 48 KB using dynamic shared memory allocation. What is the proper way to do this allocation?
Thank you
The relevant CUDA code looks like this:
#include <cstdio>

__global__ void sharedMemTest()
{
    __shared__ int _ss[1024];     // static shared memory: 4 KB
    extern __shared__ int _s[];   // dynamic shared memory, sized at launch
    if (threadIdx.x == 0)
        printf("blockIdx.x is %d s is at %p, ss is at %p\n", blockIdx.x, (void*)(_s + 10), (void*)_ss);
}

int main()
{
    dim3 block(32);
    dim3 grid(32);
    // 4 KB static + (44 KB + 1 byte) dynamic is one byte over 48 KB in total
    sharedMemTest<<<grid, block, 44 * 1024 + 1>>>();
    cudaError_t error = cudaGetLastError();
    printf("CUDA error: %s\n", cudaGetErrorString(error));
    cudaDeviceSynchronize();
}
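For anyone who finds this thread later: as far as I understand the CUDA programming guide, going above 48 KB of shared memory per block requires dynamic shared memory plus an explicit opt-in with cudaFuncSetAttribute and cudaFuncAttributeMaxDynamicSharedMemorySize before the launch. Here is a minimal sketch under that assumption. The helper name launchWithOptIn is hypothetical (it is not part of CUDA), and the kernel is the same one as in the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sharedMemTest()
{
    __shared__ int _ss[1024];     // static shared memory: 4 KB
    extern __shared__ int _s[];   // dynamic shared memory, sized at launch
    if (threadIdx.x == 0)
        printf("blockIdx.x is %d s is at %p, ss is at %p\n",
               blockIdx.x, (void*)(_s + 10), (void*)_ss);
}

// Hypothetical helper: opt in to a larger dynamic shared memory size,
// launch the kernel, and return the first error encountered.
cudaError_t launchWithOptIn(size_t dynamicBytes)
{
    // Raise this kernel's dynamic shared memory cap above the 48 KB default.
    // Static + dynamic must still fit within the device's opt-in limit.
    cudaError_t err = cudaFuncSetAttribute(
        sharedMemTest,
        cudaFuncAttributeMaxDynamicSharedMemorySize,
        static_cast<int>(dynamicBytes));
    if (err != cudaSuccess)
        return err;

    dim3 block(32);
    dim3 grid(32);
    sharedMemTest<<<grid, block, dynamicBytes>>>();

    err = cudaGetLastError();     // reports launch configuration errors
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();   // reports errors raised during execution
}
```

Calling launchWithOptIn(44 * 1024 + 1) from main in place of the original launch should no longer return "invalid argument" on an A100, where the opt-in limit is 163 KB per block. The 4 KB static array counts against that limit as well. The actual limit for a given GPU can be read with cudaDeviceGetAttribute and cudaDevAttrMaxSharedMemoryPerBlockOptin.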
Problem solved. Thank you very much.