As far as I know, the GTX 480 defaults to 48 KB of shared memory and 16 KB of L1 cache. But when my app, which targets the GTX 480, uses more than 16 KB of shared memory, I get an error like this:
uses too much shared data (0xa09c bytes + 0x10 bytes system, 0x4000 max)
The reported maximum is 0x4000, which is only 16 KB. I then tried the CUDA runtime function cudaFuncSetCacheConfig to force 48 KB of shared memory for my kernel function, but it made no difference.
Has anyone else encountered the same problem?
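
For reference, here is a minimal sketch of the kind of cudaFuncSetCacheConfig call described above. The kernel name bigSharedKernel, the 40 KB buffer size, and the launch configuration are all hypothetical and chosen only for illustration; they are not taken from my actual app.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical size: 10240 floats * 4 bytes = 40 KB of static shared memory,
// i.e. more than 16 KB but less than 48 KB.
const int kFloats = 10 * 1024;

// Hypothetical kernel that fills a large shared buffer and reads one value back.
__global__ void bigSharedKernel(float *out)
{
    __shared__ float buf[kFloats];
    for (int i = threadIdx.x; i < kFloats; i += blockDim.x)
        buf[i] = static_cast<float>(i);
    __syncthreads();
    if (threadIdx.x == 0)
        out[blockIdx.x] = buf[kFloats - 1];
}

int main()
{
    float *d_out = NULL;
    cudaError_t err = cudaMalloc(&d_out, sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask the runtime to prefer the larger shared memory split for this kernel.
    err = cudaFuncSetCacheConfig(bigSharedKernel, cudaFuncCachePreferShared);
    if (err != cudaSuccess) {
        printf("cudaFuncSetCacheConfig failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    bigSharedKernel<<<1, 256>>>(d_out);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float h_out = 0.0f;
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("last element: %f\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

One thing that may be worth checking: cudaFuncSetCacheConfig only takes effect at run time, while a 0x4000 limit is what compute capability 1.x targets allow, so the message may depend on which architecture the code is compiled for. Assuming a toolkit that supports it, building with something like `nvcc -arch=sm_20 example.cu` (example.cu being a placeholder file name) would be one way to test that.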