"Uses too much shared data" error when declaring a shared memory array

Hi, I have a problem declaring an array in shared memory. The GTX 480 has 48 KB of shared memory per block, but when I declare an array of 32K elements, each 1 byte in size, compilation fails with the ptxas error shown below the declaration.

typedef struct {
char mem : 1;
} bit_t;

__shared__ bit_t data[32768];

ptxas error : Entry function ‘_Z6kernelPcP5bit_tii’ uses too much shared data (0x8018 bytes + 0x10 bytes system, 0x4000 max)

According to the message, the maximum is 0x4000 bytes, i.e. only 16 KB. How do I use the rest of the shared memory?
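The numbers in the ptxas message are hexadecimal: 0x8018 is 32,792 bytes of user data (the 32,768-byte array plus 0x18 = 24 bytes, which would be consistent with the kernel's two pointers and two ints if they are passed through shared memory, as compute capability 1.x does on a 64-bit build), 0x10 is 16 bytes of system data, and 0x4000 is 16,384 bytes. A 16 KB ceiling is the compute capability 1.x limit, so the message suggests the kernel was compiled for a 1.x target rather than for the GTX 480 itself. The following is a minimal sketch for checking what the installed card actually reports; the helper `fitsInShared` is a hypothetical name introduced for illustration, and device index 0 is an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: does a request (user + system bytes, as printed by
// ptxas) fit under a given per-block shared memory limit?
static bool fitsInShared(size_t userBytes, size_t systemBytes, size_t limitBytes) {
    return userBytes + systemBytes <= limitBytes;
}

int main() {
    const int device = 0;  // assumption: the GTX 480 is device 0
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("shared memory per block: %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);

    // Numbers taken from the ptxas message above.
    printf("fits under 0x4000 (16 KB): %s\n",
           fitsInShared(0x8018, 0x10, 0x4000) ? "yes" : "no");
    printf("fits under device limit:   %s\n",
           fitsInShared(0x8018, 0x10, prop.sharedMemPerBlock) ? "yes" : "no");
    return 0;
}
```

If the device reports 48 KB while ptxas still complains about a 16 KB maximum, the limit is coming from the compilation target, not from the hardware.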

However, I also tried declaring several smaller arrays, like

__shared__ bit_t data1[8092];
__shared__ bit_t data2[8092];
__shared__ bit_t data3[8092];
__shared__ bit_t data4[8092];

Then no error is reported, even when I declare more than 48 KB in total. I'm confused.

Thanks

I wonder if you need to pass -arch sm_20 to the compiler to get the first case to work…

Thanks for the reply. Do you mean I can just pass the architecture version by typing the following?

make arch=sm_20

No, -arch=sm_20 is a flag to pass to nvcc.
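As a rough illustration of the first case, here is a self-contained sketch of a kernel with the 32 KB static shared array, together with the nvcc command line in a comment. The file name `shared32k.cu`, the kernel body, and the launch configuration are assumptions made for the example, not the original poster's code. Recent toolkits have dropped sm_20, so on a newer setup the `-arch` value would need to match a supported architecture.

```cuda
// Build (file name is an assumption):
//   nvcc -arch=sm_20 -o shared32k shared32k.cu
#include <cstdio>
#include <cuda_runtime.h>

typedef struct {
    char mem : 1;  // 1-bit field, but the struct still occupies one byte
} bit_t;

__global__ void kernel(char *out, int n) {
    __shared__ bit_t data[32768];  // 32 KB of static shared memory

    // Fill the shared array cooperatively: odd indices get a set bit.
    for (int i = threadIdx.x; i < 32768; i += blockDim.x)
        data[i].mem = (i & 1) ? 1 : 0;
    __syncthreads();

    // Copy the first n flags out as 0 or 1.
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] = (data[i].mem != 0) ? 1 : 0;
}

int main() {
    const int n = 16;
    char hOut[n];
    char *dOut = NULL;

    if (cudaMalloc((void **)&dOut, n) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    kernel<<<1, 256>>>(dOut, n);
    cudaError_t err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(hOut, dOut, n, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dOut);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; ++i)
        printf("%d", hOut[i]);
    printf("\n");
    return 0;
}
```

Without an `-arch` flag, older toolkits default to a compute capability 1.x target, which would explain the 16 KB maximum in the error message.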

Then how do I set it when building with the Makefiles from the SDK that NVIDIA provides? In common.mk?

I noticed the lines in common.mk listed below, which mention sm_10 and sm_20, but I don't know how to check which one I am using or how to change it. The card is a GTX 480.

# Compiler-specific flags (by default, we always use sm_10 and sm_20), unless we use the SMVERSION template

GENCODE_SM10 := -gencode=arch=compute_10,code="sm_10,compute_10"

GENCODE_SM20 := -gencode=arch=compute_20,code="sm_20,compute_20"
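One way to check which target is actually in use at run time is to have a kernel report the `__CUDA_ARCH__` macro, which nvcc defines during device compilation (for example 200 for sm_20, 100 for sm_10). The sketch below assumes device 0 and uses hypothetical names (`reportArch`, `queryCompiledArch`). If both gencode lines above are active, the same source is compiled once per target, so the sm_10 pass would presumably be the one that hits the 16 KB limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Writes the architecture this device code was compiled for.
__global__ void reportArch(int *arch) {
#ifdef __CUDA_ARCH__
    *arch = __CUDA_ARCH__;  // e.g. 200 when built for sm_20 / compute_20
#else
    *arch = 0;              // not reached in device code; keeps the host pass happy
#endif
}

// Hypothetical helper: returns the compiled architecture, or -1 on any CUDA error.
static int queryCompiledArch() {
    int *dArch = NULL;
    int hArch = -1;
    if (cudaMalloc((void **)&dArch, sizeof(int)) != cudaSuccess)
        return -1;
    reportArch<<<1, 1>>>(dArch);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(&hArch, dArch, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dArch);
    return (err == cudaSuccess) ? hArch : -1;
}

int main() {
    int arch = queryCompiledArch();
    if (arch < 0) {
        fprintf(stderr, "could not query the compiled architecture\n");
        return 1;
    }
    printf("device code in use was compiled for __CUDA_ARCH__ = %d\n", arch);
    return 0;
}
```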

No idea. I never use the SDK Makefiles as I consider them incomprehensible. :)

You are probably already using the most suitable one, but as seibert says, it's better to write your own makefiles. A few lines, and you know for sure which compute capability (CC) you are targeting :) You might also have to call cudaFuncSetCacheConfig to set shared memory to 48 KB and L1 to 16 KB.
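For reference, a minimal sketch of the cudaFuncSetCacheConfig call mentioned above, assuming a Fermi-class card and a build with `-arch=sm_20`. The kernel `reverseKernel`, the constant `BUF_INTS`, and the launch configuration are made up for the example. The call expresses a preference; the runtime may still pick a different split if the kernel needs one.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BUF_INTS 10240  // 10240 ints = 40 KB, more than a 16 KB shared split allows

// Reverses an array of BUF_INTS ints through a 40 KB shared buffer.
__global__ void reverseKernel(int *data) {
    __shared__ int buf[BUF_INTS];
    for (int i = threadIdx.x; i < BUF_INTS; i += blockDim.x)
        buf[i] = data[i];
    __syncthreads();
    for (int i = threadIdx.x; i < BUF_INTS; i += blockDim.x)
        data[i] = buf[BUF_INTS - 1 - i];
}

int main() {
    // Ask for the 48 KB shared / 16 KB L1 split for this kernel.
    cudaError_t err = cudaFuncSetCacheConfig(reverseKernel, cudaFuncCachePreferShared);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncSetCacheConfig: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const size_t bytes = BUF_INTS * sizeof(int);
    int *hData = (int *)malloc(bytes);
    int *dData = NULL;
    if (hData == NULL)
        return 1;
    for (int i = 0; i < BUF_INTS; ++i)
        hData[i] = i;

    err = cudaMalloc((void **)&dData, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dData, hData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverseKernel<<<1, 256>>>(dData);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hData, dData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dData);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        free(hData);
        return 1;
    }
    printf("first element after reversal: %d\n", hData[0]);
    free(hData);
    return 0;
}
```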
