Hello everyone, I have had a problem since last year. I wrote a CUDA application that works fine when the problem size is small but fails when the size gets bigger. The console says: “too much shared memory, more than 16k”. However, my device is a GTX 480 with 48 KB of shared memory. I tried compiling with sm_20, but that didn’t help. Has anyone else had the same problem? Any help would be appreciated. Thanks.
Make sure the kernel is configured for 48 KB of shared memory and 16 KB of L1 cache. You can do this by passing the cudaFuncCachePreferShared enum value to cudaFuncSetCacheConfig(…) for that kernel. See section G.4.1 of the programming guide.
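In case it helps, here is a minimal sketch of what that call could look like. The kernel name scaleKernel, the 32 KB tile size, and the launch dimensions are all made up for illustration and are not from your code. It assumes a reasonably recent runtime where cudaFuncSetCacheConfig accepts the kernel symbol directly, and it needs to be built for an architecture matching the GPU (sm_20 for the GTX 480).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical tile size: 8192 floats * 4 bytes = 32 KB of static shared
// memory, which is above 16 KB but below the 48 KB mentioned in the thread.
#define TILE_FLOATS 8192
const size_t kTileBytes = TILE_FLOATS * sizeof(float);

// Abort with a readable message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err));                         \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: stages each element in shared memory, then doubles it.
__global__ void scaleKernel(float *data, int n)
{
    __shared__ float tile[TILE_FLOATS];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        tile[threadIdx.x] = data[i];
    }
    __syncthreads();
    if (i < n) {
        data[i] = 2.0f * tile[threadIdx.x];
    }
}

int main()
{
    // Report what the runtime thinks the per-block shared memory limit is.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("Shared memory per block: %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);

    // Ask for the larger shared memory split for this kernel.
    CHECK(cudaFuncSetCacheConfig(scaleKernel, cudaFuncCachePreferShared));

    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = (float)i;
    }

    float *dev = NULL;
    CHECK(cudaMalloc((void **)&dev, n * sizeof(float)));
    CHECK(cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice));

    scaleKernel<<<n / 256, 256>>>(dev, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost));
    printf("host[1] after kernel: %f\n", host[1]);

    CHECK(cudaFree(dev));
    return 0;
}
```

One more thing to check: 16 KB is the per-block shared memory limit on compute capability 1.x devices, so a message mentioning 16k may mean the file containing the kernel is still being built for a 1.x target. It is worth confirming that the sm_20 flag actually reaches the compile step for that file.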