From this guide, it seems that the cacheConfig value reported by Nsight Systems should correspond to one of the ENUM_CUDA_FUNC_CACHE_CONFIG values. When I query the nsys report with sqlite, I see that ENUM_CUDA_FUNC_CACHE_CONFIG has the following values:
However, when I use sqlite to get the cacheConfig, some kernels show a value of 4. Could you please explain why this is happening, or whether I am missing something?
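For reference, this is roughly how such a check can be done. It is only a sketch: I am assuming the kernel rows live in a table called CUPTI_ACTIVITY_KIND_KERNEL and that the enum table has id and label columns, so please adjust the names to the schema in your own export. The LEFT JOIN keeps cacheConfig values that have no matching enum row, so an unexpected value shows up with an empty label instead of being dropped.

```python
import sqlite3


def cache_config_histogram(conn):
    """Count kernel launches per cacheConfig value.

    Table and column names are assumptions about the nsys SQLite export
    (CUPTI_ACTIVITY_KIND_KERNEL.cacheConfig, ENUM_CUDA_FUNC_CACHE_CONFIG.id/label).
    Values without an enum entry come back with label None.
    """
    query = """
        SELECT k.cacheConfig, e.label, COUNT(*) AS launches
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        LEFT JOIN ENUM_CUDA_FUNC_CACHE_CONFIG AS e
               ON e.id = k.cacheConfig
        GROUP BY k.cacheConfig, e.label
        ORDER BY k.cacheConfig
    """
    return conn.execute(query).fetchall()
```

Usage would be something like `cache_config_histogram(sqlite3.connect("report.sqlite"))`, where report.sqlite is a placeholder for the file exported from the nsys report.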
Yes, I have access to these kernels, and I can set the shared memory preference with cudaFuncSetCacheConfig or with cudaFuncSetAttribute and cudaFuncAttributePreferredSharedMemoryCarveout.
Ideally, I would also like to know whether nsys can help with kernels that I don’t have access to, which is why I am checking with some custom setups first.
I am not sure how to use the CUDA API to get the func cache config that was actually used for the kernel. Do you have any recommendations?
Thank you! For the Nsight Systems value, do you know if this is the ‘preferred’ or the ‘actual’ cache config?
Also, I see that Nsight Systems has a sharedMemoryExecuted metric, which the guide describes as "Shared memory size set by the driver". Does this mean it is the amount that the driver carved out for shared memory on each SM, so that the remainder is L1?
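To make the comparison concrete, here is a sketch of how sharedMemoryExecuted can be listed next to cacheConfig for each kernel. The staticSharedMemory, dynamicSharedMemory and shortName columns and the StringIds table are my assumptions about the export schema, not something I have confirmed for every nsys version.

```python
import sqlite3


def shared_memory_by_kernel(conn):
    """List distinct (kernel name, cacheConfig, shared memory sizes) combinations.

    Assumes the nsys SQLite export stores kernel names as ids into a
    StringIds(id, value) table and exposes staticSharedMemory,
    dynamicSharedMemory and sharedMemoryExecuted on each kernel row.
    """
    query = """
        SELECT DISTINCT s.value,
                        k.cacheConfig,
                        k.staticSharedMemory,
                        k.dynamicSharedMemory,
                        k.sharedMemoryExecuted
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        LEFT JOIN StringIds AS s ON s.id = k.shortName
        ORDER BY 1, 2, 3, 4, 5
    """
    return conn.execute(query).fetchall()
```

Seeing how sharedMemoryExecuted changes when the same kernel is launched under different carveout settings should help show whether the column follows the requested preference or what the driver actually applied.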