Question about the cacheConfig value in Nsight Systems

Hello!

From this guide, it seems that the cacheConfig value reported by Nsight Systems should correspond to one of the ENUM_CUDA_FUNC_CACHE_CONFIG values. When I query the SQLite export of an nsys report, the ENUM_CUDA_FUNC_CACHE_CONFIG table contains the following values:

[(0, 'CU_FUNC_CACHE_PREFER_NONE', 'None'), (1, 'CU_FUNC_CACHE_PREFER_SHARED', 'Shared'), (2, 'CU_FUNC_CACHE_PREFER_L1', 'L1'), (3, 'CU_FUNC_CACHE_PREFER_EQUAL', 'Equal')]

However, when I query cacheConfig for the kernels in the same export, some kernels show a value of 4. Could you please explain why this is happening, or whether I am missing something?
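A minimal sketch for listing every cacheConfig value in the export that has no matching row in the enum table, using Python's built-in sqlite3 module. The kernel table name CUPTI_ACTIVITY_KIND_KERNEL and the column names cacheConfig and id are assumptions about the export schema (check them with .schema in the sqlite3 shell); the function name is hypothetical.

```python
import sqlite3


def find_unknown_cache_configs(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (cacheConfig, launch_count) pairs whose value has no row in
    ENUM_CUDA_FUNC_CACHE_CONFIG.

    Assumed schema: CUPTI_ACTIVITY_KIND_KERNEL.cacheConfig and
    ENUM_CUDA_FUNC_CACHE_CONFIG.id; verify against your own export.
    """
    query = """
        SELECT k.cacheConfig, COUNT(*) AS launch_count
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        LEFT JOIN ENUM_CUDA_FUNC_CACHE_CONFIG AS e
               ON k.cacheConfig = e.id
        WHERE e.id IS NULL          -- keep only rows with no enum match
        GROUP BY k.cacheConfig
        ORDER BY k.cacheConfig
    """
    return [(row[0], row[1]) for row in conn.execute(query)]
```

The LEFT JOIN plus WHERE e.id IS NULL keeps only kernel rows that have no enum entry, so the result is empty when every reported value is defined. Usage would look like find_unknown_cache_configs(sqlite3.connect("report.sqlite")).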

Thank you!

@jkreibich can you help?

The enum values come directly from the cuda.h header, and in the latest version I have access to, only the values 0 to 3 are defined. I’m not aware of any other values being used as a flag value or special value by Nsight Systems or the exporter.
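For post-processing, the defined values can be mirrored in Python so that anything outside 0-3 is flagged instead of being silently mislabeled. This is a sketch assuming the CUfunc_cache values from cuda.h listed above; the class name, function name, and the UNDEFINED(n) label are hypothetical choices.

```python
from enum import IntEnum


class CUfuncCache(IntEnum):
    """Mirror of CUfunc_cache from cuda.h; only 0-3 are defined there."""
    CU_FUNC_CACHE_PREFER_NONE = 0
    CU_FUNC_CACHE_PREFER_SHARED = 1
    CU_FUNC_CACHE_PREFER_L1 = 2
    CU_FUNC_CACHE_PREFER_EQUAL = 3


def describe_cache_config(value: int) -> str:
    """Return the enum name, or a marked placeholder for values that cuda.h
    does not define (for example the 4 seen in the export)."""
    try:
        return CUfuncCache(value).name
    except ValueError:  # IntEnum raises ValueError for unknown values
        return f"UNDEFINED({value})"
```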

This is a configuration value, and can be set by the CUDA application for kernel launches. Do you know if the application you’re profiling attempts to set the cache configuration, and if so, what value it is set to?

It would also help to know the version of CUDA being used, as well as the version of Nsight Systems.

-j

Thanks a lot for your answer!

Yes, my understanding is also that these values are taken from cuda.h (according to some older documentation), so I am not sure where this 4 comes from.

I originally observed this when running a torch.mm example (which ends up launching a CUTLASS kernel):

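import torch
# Two 2048x2048 float32 matrices on the GPU
mat1 = torch.randn(2048, 2048).cuda()
mat2 = torch.randn(2048, 2048).cuda()
# Each call launches a GEMM kernel (a CUTLASS kernel in this case)
for _ in range(10):
    torch.mm(mat1, mat2)
torch.cuda.synchronize()  # wait for the queued kernels before exiting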

I then experimented with very simple custom kernels, changing the cache configuration either with cudaFuncSetCacheConfig or with the cudaFuncAttributePreferredSharedMemoryCarveout attribute. When I set the cache configuration to Equal (or the carveout to 50), I saw the value 4, but since it is not one of the defined values, I am not sure whether this is intentional.
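To check which kernels report the undefined value, a sketch that groups launches by kernel name and cacheConfig. It assumes (unverified) that the kernel table's shortName column is an id into a StringIds(id, value) table; the function name is hypothetical.

```python
import sqlite3


def cache_config_by_kernel(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    """Return (kernel_name, cacheConfig, launch_count) rows, one per distinct
    kernel/cacheConfig pair, sorted by name and then by value.

    Assumed schema: CUPTI_ACTIVITY_KIND_KERNEL.shortName references
    StringIds.id; verify against your own export.
    """
    query = """
        SELECT s.value AS kernel_name,
               k.cacheConfig,
               COUNT(*) AS launches
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON k.shortName = s.id
        GROUP BY s.value, k.cacheConfig
        ORDER BY s.value, k.cacheConfig
    """
    return [tuple(row) for row in conn.execute(query)]
```

If the behavior described above holds, the custom kernels configured with Equal or a carveout of 50 would show up with cacheConfig 4, and the remaining kernels with values in 0-3.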

The versions I am using are:

  • Nsight Systems: 2024.4.2
  • CUDA: 12.6
  • Driver: 560

I have seen the same behavior on both an RTX 3090 and an H100 (with CUDA 12.9 and Nsys 2025.1.3).

Hello, are there any updates on this?

In cupti_activity.h, the comment for the cacheConfig field states:

/**
 * For devices with compute capability 7.5+ cacheConfig values are not updated
 * in case field isSharedMemoryCarveoutRequested is set
 */
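The rule in that comment can be written as a small predicate. This is a sketch: the function name is hypothetical, and it takes the carveout flag as an input because it is not clear whether the nsys SQLite export exposes isSharedMemoryCarveoutRequested.

```python
def cache_config_is_reliable(cc_major: int, cc_minor: int,
                             carveout_requested: bool) -> bool:
    """Encode the cupti_activity.h note: on compute capability 7.5 and newer,
    cacheConfig is not updated when isSharedMemoryCarveoutRequested is set,
    so the reported value should not be trusted in that case."""
    # Tuple comparison orders (major, minor) lexicographically.
    return not ((cc_major, cc_minor) >= (7, 5) and carveout_requested)
```

Both GPUs mentioned earlier fall in the affected range (the RTX 3090 is compute capability 8.6 and the H100 is 9.0), so for kernels that requested a carveout, the reported cacheConfig may not reflect the requested setting.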

I have not found any location in the driver or CUPTI that sets the value outside the enum range (0-3). However, there is a lot of conversion code, so at this time I cannot rule out that it is being set in one of the layers.
