Does using more than 48 KB of shared memory require dynamic shared memory?

Hi everybody!
I have a question about shared memory. My GPU is an NVIDIA GeForce RTX 2080 with compute capability 7.5. According to the CUDA C Programming Guide, the maximum amount of shared memory per thread block is 64 KB, with a note that anything above 48 KB requires dynamic shared memory. However, I can't use between 48 KB and 64 KB of shared memory, whether I allocate it dynamically or statically, and I don't know why.
Thank you for your answers!

Below is my code for testing dynamic shared memory, but it's not working!
Note: 90*90*sizeof(double) is close to 64 KB. When I change it to 78*78*sizeof(double), which is close to 48 KB, it starts working.
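For reference, the two sizes mentioned above can be checked against the limits with a few lines of host code. This is only a sketch: the helper name matrixBytes and the two constants are illustrative assumptions, and the 64 KB figure is the per-block maximum quoted above for compute capability 7.5.

```cuda
#include <cstddef>
#include <cstdio>

// Hypothetical helper: bytes needed for an n x n matrix of doubles.
constexpr std::size_t matrixBytes(int n) {
    return static_cast<std::size_t>(n) * n * sizeof(double);
}

// Assumed limits for this discussion (compute capability 7.5).
constexpr std::size_t kDefaultLimit = 48 * 1024;  // usable without any opt-in
constexpr std::size_t kOptInLimit   = 64 * 1024;  // per-block maximum

int main() {
    const int sizes[] = {78, 90};
    for (int n : sizes) {
        const std::size_t bytes = matrixBytes(n);
        std::printf("n=%d: %zu bytes, within 48 KB: %s, within 64 KB: %s\n",
                    n, bytes,
                    bytes <= kDefaultLimit ? "yes" : "no",
                    bytes <= kOptInLimit ? "yes" : "no");
    }
    return 0;
}
```

The 78 x 78 case stays under the 48 KB default, while the 90 x 90 case falls between 48 KB and 64 KB, which matches the behavior described.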

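// Matrix_copy_glob2shr and Matrix_copy_shr2glob are device helpers defined elsewhere (not shown).
static __global__
void Gauss_Jordan_Inverse(double* mat_tmp, int n) {

	int idx = threadIdx.x;   // row handled by this thread
	int idy = blockIdx.x;    // matrix handled by this block

	double * mat_tmp1 = mat_tmp + idy * n * n;   // base of this block's matrix (not used below)
	extern __shared__ double mat[];              // dynamic shared memory, sized at launch
	Matrix_copy_glob2shr(mat_tmp + idx*n , mat + idx * n, n);

	for (int i = 0; i < n; i++) {
		mat[idx * n + i] += 1;
	}

	Matrix_copy_shr2glob(mat+idx*n, mat_tmp+idx*n, n);
}

static void fun(double* out ,int size){
    //do something...
    // d_out: device buffer, presumably set up in the elided code above
    Gauss_Jordan_Inverse<<<1, 1, 90*90*sizeof(double)>>>(d_out, size);
    //do something...
}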

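You’re missing the call below: a kernel has to opt in explicitly before it can be launched with more than 48 KB of dynamic shared memory.
See here: Shared memory size per Thread Block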

// Host code
int maxbytes = 65536; // 64 KB
cudaFuncSetAttribute(Gauss_Jordan_Inverse, cudaFuncAttributeMaxDynamicSharedMemorySize, maxbytes);
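To show where that call sits in a complete program, here is a minimal sketch. The kernel addOneInShared is a hypothetical stand-in rather than the original Gauss_Jordan_Inverse, and the CUDA_CHECK macro and matrixSharedBytes helper are likewise assumptions made for illustration. It queries the opt-in limit with cudaDeviceGetAttribute instead of hard-coding 65536, raises the kernel's limit, and checks the launch for errors.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: bytes of shared memory for an n x n double matrix.
static size_t matrixSharedBytes(int n) {
    return static_cast<size_t>(n) * n * sizeof(double);
}

// Hypothetical stand-in kernel: stages the matrix in dynamic shared memory,
// adds 1 to every element, and writes the result back to global memory.
__global__ void addOneInShared(double* data, int n) {
    extern __shared__ double tile[];
    const int count = n * n;
    for (int i = threadIdx.x; i < count; i += blockDim.x)
        tile[i] = data[i] + 1.0;
    __syncthreads();
    for (int i = threadIdx.x; i < count; i += blockDim.x)
        data[i] = tile[i];
}

int main() {
    const int n = 90;
    const size_t bytes = matrixSharedBytes(n);  // above the 48 KB default

    // Ask the device how much shared memory a block may opt in to.
    int device = 0;
    CUDA_CHECK(cudaGetDevice(&device));
    int maxOptIn = 0;
    CUDA_CHECK(cudaDeviceGetAttribute(
        &maxOptIn, cudaDevAttrMaxSharedMemoryPerBlockOptin, device));
    if (bytes > static_cast<size_t>(maxOptIn)) {
        std::fprintf(stderr, "need %zu bytes, device allows %d\n", bytes, maxOptIn);
        return EXIT_FAILURE;
    }

    // Opt in: without this, a launch requesting more than 48 KB fails.
    CUDA_CHECK(cudaFuncSetAttribute(
        addOneInShared, cudaFuncAttributeMaxDynamicSharedMemorySize,
        static_cast<int>(bytes)));

    double* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, bytes));
    CUDA_CHECK(cudaMemset(d_data, 0, bytes));

    addOneInShared<<<1, 256, bytes>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised during execution

    double first = 0.0;
    CUDA_CHECK(cudaMemcpy(&first, d_data, sizeof(double), cudaMemcpyDeviceToHost));
    std::printf("data[0] = %f\n", first);

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

Checking cudaGetLastError() right after the launch is what turns a silent "not working" into an explicit error message when the opt-in call is missing.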

I’d kindly advise using the forum’s search function before posting.


Thank you very much for your answer and advice. It works now.


It’s very good