NVIDIA Developer Forums

issue using cudaFuncSetCacheConfig setting cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShare


Joe_Fatmama November 16, 2010, 3:40pm 1

Hello,

Using the command cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferShared) in conjunction with allocating about 38k worth of shared memory produces the following compile error:

ptxas error : Entry function ‘_Z12MyKernelPlPfS0_S0_iiilli’ uses too much shared data (0x88f8 bytes + 0x10 bytes system, 0x4000 max)

According to this error log, the max shared memory allowed is 16k (0x4000 bytes), rather than the 48k requested with the cudaFuncSetCacheConfig call. The default setting also allocates 48k to shared, so it is a bit confusing.
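To make the numbers in the ptxas message easier to compare, here is a minimal host-only sketch that decodes the hex values quoted above. The helper names (total_shared_bytes, fits, print_report) are hypothetical and exist only for this illustration; the three constants are copied from the error text. It checks arithmetic only and does not call the CUDA runtime.

```cpp
#include <cassert>
#include <cstdio>

// Values copied verbatim from the ptxas error message in the post.
constexpr unsigned kKernelSharedBytes = 0x88f8;  // shared data used by the kernel
constexpr unsigned kSystemSharedBytes = 0x10;    // "system" bytes reported by ptxas
constexpr unsigned kReportedMaxBytes  = 0x4000;  // limit ptxas compared against

// The 48k figure the post expects from cudaFuncCachePreferShared.
constexpr unsigned kExpectedMaxBytes = 48 * 1024;

// Hypothetical helper: total shared usage as ptxas counts it.
constexpr unsigned total_shared_bytes(unsigned kernel, unsigned system) {
    return kernel + system;
}

// Hypothetical helper: true if the total fits under the given limit.
constexpr bool fits(unsigned total, unsigned limit) {
    return total <= limit;
}

// Prints the decoded figures in decimal for a side-by-side comparison.
void print_report() {
    const unsigned total =
        total_shared_bytes(kKernelSharedBytes, kSystemSharedBytes);
    std::printf("kernel + system : %u bytes\n", total);
    std::printf("reported max    : %u bytes (fits: %s)\n",
                kReportedMaxBytes, fits(total, kReportedMaxBytes) ? "yes" : "no");
    std::printf("expected max    : %u bytes (fits: %s)\n",
                kExpectedMaxBytes, fits(total, kExpectedMaxBytes) ? "yes" : "no");
}
```

Calling print_report() from main shows a total of 35080 bytes, which is over the 16384-byte limit in the message but under 49152 bytes (48k). Since the message comes from ptxas at compile time, before any runtime call such as cudaFuncSetCacheConfig can execute, one thing worth checking (an assumption, not something the error text confirms) is which target architecture the compiler was invoked for.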

Is the problem related to the amount of L1 memory that’s required by the kernel? I understand that the system will allocate the necessary memory to L1 regardless of the specified preference. I suppose one way to verify this is to simplify the kernel logic and try the cache setting anew. Is there any other way to determine what’s causing this problem?

Thanks in advance, Joe.

