Hi everybody!
I have a question about shared memory. My GPU is an NVIDIA GeForce RTX 2080 with compute capability 7.5. According to the CUDA C Programming Guide, the maximum amount of shared memory per thread block is 64 KB, with a note that using more than 48 KB requires dynamic shared memory. However, I cannot use shared memory between 48 KB and 64 KB with either dynamic or static shared memory, and I don't know why.
Thank you for your answer!
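For reference, these limits can also be queried at runtime. A minimal sketch (my own addition, assuming device 0 is the RTX 2080):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int defaultSmem = 0, optinSmem = 0;
    // Default per-block limit (48 KB on compute capability 7.5).
    cudaDeviceGetAttribute(&defaultSmem, cudaDevAttrMaxSharedMemoryPerBlock, 0);
    // Maximum available after an explicit opt-in (64 KB on compute capability 7.5).
    cudaDeviceGetAttribute(&optinSmem, cudaDevAttrMaxSharedMemoryPerBlockOptin, 0);
    printf("Default per-block shared memory: %d bytes\n", defaultSmem);
    printf("Opt-in  per-block shared memory: %d bytes\n", optinSmem);
    return 0;
}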
Below is my code for testing dynamic shared memory, but it does not work.
Note: 90*90*sizeof(double) is close to 64 KB; when I change it to 78*78*sizeof(double),
which is just under 48 KB, it starts working.
static __global__
void Gauss_Jordan_Inverse(double* mat_tmp, int n) {
    int idx = threadIdx.x;
    int idy = blockIdx.x;
    // Each block works on its own n x n matrix in global memory.
    double* mat_tmp1 = mat_tmp + idy * n * n;
    // Dynamically allocated shared memory, sized at launch time.
    extern __shared__ double mat[];
    // Copy one row from global to shared memory (helper defined elsewhere).
    Matrix_copy_glob2shr(mat_tmp1 + idx * n, mat + idx * n, n);
    for (int i = 0; i < n; i++) {
        mat[idx * n + i] += 1;
    }
    __syncthreads();
    // Copy the row back from shared to global memory.
    Matrix_copy_shr2glob(mat + idx * n, mat_tmp1 + idx * n, n);
}

static void fun(double* out, int size) {
    // do something...
    Gauss_Jordan_Inverse<<<1, 1, 90 * 90 * sizeof(double)>>>(d_out, size);
    // do something...
}
You're missing the opt-in. On devices of compute capability 7.x, a kernel that needs more than 48 KB of dynamic shared memory must explicitly opt in with cudaFuncSetAttribute before it is launched.
See here: Shared memory size per Thread Block
// Host code
int maxbytes = 65536; // 64 KB
cudaFuncSetAttribute(Gauss_Jordan_Inverse, cudaFuncAttributeMaxDynamicSharedMemorySize, maxbytes);
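A slightly fuller sketch of the host side, with basic error checking added; the launch configuration, d_out, and size are taken from your snippet, everything else is my own addition:

// Host code, e.g. inside your fun()
size_t smemBytes = 90 * 90 * sizeof(double);   // ~63.3 KB, above the 48 KB default

// Opt in to the larger dynamic shared memory carve-out for this kernel.
cudaError_t err = cudaFuncSetAttribute(
    Gauss_Jordan_Inverse,
    cudaFuncAttributeMaxDynamicSharedMemorySize,
    (int)smemBytes);
if (err != cudaSuccess) {
    printf("cudaFuncSetAttribute failed: %s\n", cudaGetErrorString(err));
}

Gauss_Jordan_Inverse<<<1, 1, smemBytes>>>(d_out, size);
err = cudaGetLastError();   // catches the "invalid argument" launch error from an oversized request
if (err != cudaSuccess) {
    printf("kernel launch failed: %s\n", cudaGetErrorString(err));
}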
I'd kindly advise using the forum's search function before posting.
Thank you very much for your answer and advice. It works now.
It’s very good