What's the take on cudaFuncSetCacheConfig() these days?

dscerutti · August 28, 2022, 8:44pm

I’ve decided that L1 management is about three quarters of my job. I have some kernels that are pushing the limits of the 128 kB L1 supply, and for some of the data I have a choice between staging it in __shared__ arrays and letting it live in L1. In most cases, if I’m only reading the data, I try to condense it into a handful of arrays that will get pre-fetched into L1. The data may start out somewhat scattered in global memory, so it is compacted and ordered into a block-specific set of arrays. Those arrays are also in global memory, but each set is exclusive to one thread block, so __syncthreads(); is an effective guard against race conditions.

But I recall that in the early days there was cudaFuncSetCacheConfig(), which changed how the same physical on-chip storage was split: mostly shared or mostly L1. My question is: given that this API exists, is there always a region of the L1 that is roped off as __shared__ memory, even if it’s only 1/4 of the usual 128 kB? Put another way, if my kernel uses only 13 kB of __shared__ memory, will there be 115 kB of L1 available, or at most 96 kB in an automatically inferred “prefer L1” configuration? When my kernels are brushing up against the 128 kB limit, I can probably shift what goes into shared and what goes into pre-fetched global arrays, but I’m curious whether I need to worry about this at all.

Robert_Crovella · August 28, 2022, 9:59pm

This may be of interest (for cc8.x devices):
