Only 16 KB of shared memory on GTX 480

Hi there,

Fermi has up to 48 KB of shared memory.
However, I can only allocate a bit less than 16 KB on my GTX 480 card.
If I allocate more than that, I get the following error message:
ptxas error : Entry function ‘Z7kernel1P7double2S0’ uses too much shared data (0x4010 bytes + 0x10 bytes system, 0x4000 max)

nvcc version: 3.1
CUDA driver version: 256.40
compile command:
nvcc -arch sm_13 test_double.cu -o test_double -L/usr/local/cuda/lib64 -lm -lcuda

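A toy kernel:
#define M 2048
__global__ void kernel1(double2 *ro)
{
    int tx = threadIdx.x;
    int bx = blockIdx.x;
    int d = blockDim.x;
    int i;
    // Statically allocated shared memory: M * sizeof(double2) bytes per block
    __shared__ double2 a[M];
    // i < M/d keeps i*d+tx inside a[] (assumes blockDim.x divides M)
    for (i = 0; i < M/d; i++)
    {
        a[i*d+tx].x = 1;
        a[i*d+tx].y = 2;
        ro[bx*M+i*d+tx].x = a[i*d+tx].x;
        ro[bx*M+i*d+tx].y = a[i*d+tx].y;
    }
}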

Does anyone see what the problem is?

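Compile with -arch sm_20. The 0x4000 (16 KB) maximum in the error is the shared memory limit for compute capability 1.x, which is what -arch sm_13 targets.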

Thanks a lot!

It seems I have to specify -arch sm_20 when compiling.

I don’t understand why nvcc can’t detect the compute capability automatically.
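
On the last point: nvcc compiles offline for whatever target -arch names (sm_13 in the command above) and does not look at the installed card. The compute capability can be queried at run time, though. Below is a minimal sketch, assuming a working CUDA runtime; formatArchFlag is a hypothetical helper made up for this example, not part of CUDA.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): builds the -arch value for a
// given compute capability, e.g. (2, 0) -> "sm_20".
static void formatArchFlag(char *buf, size_t len, int major, int minor)
{
    snprintf(buf, len, "sm_%d%d", major, minor);
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }

        char arch[16];
        formatArchFlag(arch, sizeof arch, prop.major, prop.minor);

        // sharedMemPerBlock is the per-block limit reported by the runtime
        // for this device.
        printf("Device %d: %s\n", dev, prop.name);
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  shared memory per block: %lu bytes\n",
               (unsigned long)prop.sharedMemPerBlock);
        printf("  matching flag: -arch %s\n", arch);
    }
    return 0;
}
```

Running this once on the target machine shows which -arch value matches the card; that value then goes into the nvcc command line by hand or through the build script.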
