Hi, I have a problem declaring an array in shared memory. The GTX480 has 48 KB of shared memory per block, but when I declare an array of 32k elements of 1 byte each, compilation fails with the error shown after the code below.
typedef struct {
char mem : 1;
} bit_t;
__shared__ bit_t data[32768];
ptxas error : Entry function ‘_Z6kernelPcP5bit_tii’ uses too much shared data (0x8018 bytes + 0x10 bytes system, 0x4000 max)
According to the report, 0x4000 means the maximum is only 16 KB. How do I use the rest of the shared memory?
But I’ve tried to declare multiple small arrays, like
Then how do I set it in the SDK that Nvidia provides? In common.mk?
I notice that common.mk contains the lines below, which mention sm_20 and sm_10, but I don't know how to check which one I am using or how to change it. The card is a GTX480.
Compiler-specific flags (by default, we always use sm_10 and sm_20), unless we use the SMVERSION template
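To see what the card itself supports, a small device query can help. This is only a sketch: it reports the hardware's compute capability and shared memory limit, not the architecture the code was compiled for (that is decided by nvcc's -arch option, for example -arch=sm_20 on toolkits of that era). Device index 0 is an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the GTX480 is device 0
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // major.minor is the compute capability of the hardware.
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    // Upper bound on shared memory a single block can use on this device.
    printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    return 0;
}
```

If this prints a compute capability of 2.x while ptxas still reports a 0x4000 limit, the kernel is most likely being compiled for an sm_1x target.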
You will already be using the best-fitting one, but as seibert says, it's better to use your own makefiles. A few lines and you know for sure which CC (compute capability) you are targeting :) You might also have to call cudaFuncSetCacheConfig to set shared memory to 48 KB and L1 to 16 KB.
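A minimal sketch of what that could look like, assuming the file is built by hand for a compute capability 2.0 target (for example nvcc -arch=sm_20 example.cu, where example.cu is a hypothetical file name). The kernel body and the count output are made up for illustration; only the 32768-element shared array and the cudaFuncSetCacheConfig call come from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 32768  // number of one-byte elements, as in the question

typedef struct {
    char mem : 1;  // a 1-bit field still occupies a whole byte of storage
} bit_t;

// Hypothetical kernel: fills the shared array and counts the nonzero entries.
__global__ void kernel(int *count) {
    __shared__ bit_t data[N];  // 32 KB of static shared memory

    // Each thread fills a strided slice of the array.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        data[i].mem = (i % 2 == 0);
    __syncthreads();

    // Thread 0 counts the set entries so the array is actually used.
    if (threadIdx.x == 0) {
        int c = 0;
        for (int i = 0; i < N; ++i)
            if (data[i].mem != 0) ++c;
        *count = c;
    }
}

int main() {
    int *d_count = NULL;
    int h_count = 0;

    if (cudaMalloc(&d_count, sizeof(int)) != cudaSuccess) return 1;

    // Ask for the 48 KB shared / 16 KB L1 split for this kernel.
    cudaFuncSetCacheConfig(kernel, cudaFuncCachePreferShared);

    kernel<<<1, 256>>>(d_count);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("nonzero entries: %d\n", h_count);
    cudaFree(d_count);
    return 0;
}
```

The cache configuration is a preference set per kernel, so it needs to be called before the launch of each kernel that relies on the larger shared memory split.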