The CUDA C Programming Guide states that devices of compute capability 7.0 support up to 96 KB of shared memory per thread block:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications
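One way to see both limits on a given device is to ask the runtime, as in the sketch below. `cudaDeviceProp::sharedMemPerBlock` is the default per-block limit. `sharedMemPerBlockOptin` (CUDA 9.0 or later assumed) is the ceiling a kernel can reach after opting in. The helpers `needs_optin` and `fits_with_optin` are hypothetical names used only for illustration.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if `bytes` of dynamic shared memory exceeds the
// default per-block limit, i.e. the kernel would have to opt in first.
bool needs_optin(std::size_t bytes, const cudaDeviceProp& p)
{
    return bytes > p.sharedMemPerBlock;
}

// Hypothetical helper: true if `bytes` is within the opt-in ceiling.
bool fits_with_optin(std::size_t bytes, const cudaDeviceProp& p)
{
    return bytes <= p.sharedMemPerBlockOptin;
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 assumed
    if (err != cudaSuccess) {
        std::printf("error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const std::size_t request = 49 * 1024;  // the failing size from the test
    std::printf("%s (cc %d.%d)\n", prop.name, prop.major, prop.minor);
    std::printf("sharedMemPerBlock      = %zu bytes\n", prop.sharedMemPerBlock);
    std::printf("sharedMemPerBlockOptin = %zu bytes\n", prop.sharedMemPerBlockOptin);
    std::printf("%zu bytes: needs opt-in = %d, allowed after opt-in = %d\n",
                request, needs_optin(request, prop), fits_with_optin(request, prop));
    return 0;
}
```

If the guide's 96 KB figure applies, `sharedMemPerBlockOptin` should report 98304 bytes while `sharedMemPerBlock` stays at 49152 (48 KB). That would match the boundary seen in the test.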
However, my simple test program fails to launch a kernel with 49 KB of dynamic shared memory. Anything up to 48 KB works.
The test was run on a Titan V, and the cuobjdump output looks correct (arch = sm_70).
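What is the problem? Here is the test program:

    #include <stdio.h>

    // Empty kernel; the launch only requests dynamic shared memory.
    __global__ void
    test()
    {
    }

    int
    main(void)
    {
        dim3 dimBlock(1, 1);
        dim3 dimGrid(1, 1);
        cudaError_t err;

        // 49 * 1024 = 50176 bytes; 48 * 1024 = 49152 bytes launches fine.
        test<<<dimGrid, dimBlock, 49 * 1024>>>();
        err = cudaGetLastError();  // reports launch-configuration errors
        if (err != cudaSuccess) {
            printf("error: %s\n", cudaGetErrorString(err));
        }
        return 0;
    }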
The following is the cuobjdump output for my test executable:

    Fatbin ptx code:
    ================
    arch = sm_70
    code version = [6,4]
    producer = <unknown>
    host = linux
    compile_size = 64bit
    compressed
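For reference, the Programming Guide's compute capability 7.x notes say that kernels using more than 48 KB of shared memory per block must use dynamic shared memory (not statically sized arrays). They also require an explicit opt-in through `cudaFuncSetAttribute()` with `cudaFuncAttributeMaxDynamicSharedMemorySize`. Below is a minimal sketch of that path, assuming CUDA 9.0 or later and a cc 7.x device. `launch_with_smem` is a hypothetical helper, and the kernel now declares an `extern __shared__` array so the allocation is actually used.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel that declares the dynamic shared memory it is launched with.
__global__ void test()
{
    extern __shared__ char smem[];  // size comes from the 3rd <<<>>> argument
    smem[threadIdx.x] = 0;
}

// Hypothetical helper: opt the kernel in to `bytes` of dynamic shared memory,
// launch it with one thread, and return the first error encountered.
cudaError_t launch_with_smem(int bytes)
{
    // Needed before any launch that asks for more than 48 KB per block;
    // `bytes` may not exceed the device's sharedMemPerBlockOptin.
    cudaError_t err = cudaFuncSetAttribute(
        test, cudaFuncAttributeMaxDynamicSharedMemorySize, bytes);
    if (err != cudaSuccess)
        return err;

    test<<<dim3(1, 1), dim3(1, 1), bytes>>>();
    err = cudaGetLastError();        // launch-configuration errors
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();  // errors raised while the kernel runs
}

int main(void)
{
    cudaError_t err = launch_with_smem(49 * 1024);
    if (err != cudaSuccess) {
        std::printf("error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("launch with 49 KB of dynamic shared memory succeeded\n");
    return 0;
}
```

The attribute sets an upper bound for that one kernel, so calling it with a value at or below 48 KB is also allowed. A value above `sharedMemPerBlockOptin` should make `cudaFuncSetAttribute` itself return an error before any launch is attempted.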