Question about shared memory configuration in NVIDIA Visual Profiler


I’m using V100s to run some of my applications and wanted to profile them. However, when I dump the profiler output to a log file and import it into NVIDIA Visual Profiler, the shared memory configuration reads:

Shared Memory Requested: 96 KiB
Shared Memory Executed: 96 KiB
Shared Memory Bank Size: 4B

As I understand it, the V100 has a configurable split between L1 cache and shared memory, and I know this application does not use shared memory, so I’d like all of it to go to L1. But I’ve already tried cudaFuncSetAttribute() and cudaFuncSetCacheConfig() with no luck; the profiler still reports the same numbers above for the shared memory configuration. Am I using the wrong functions to configure the device, or am I misunderstanding what NVIDIA Visual Profiler is showing me?

What version of NVIDIA Visual Profiler are you using?

cudaFuncSetAttribute(kernel, cudaFuncAttributePreferredSharedMemoryCarveout, cudaSharedmemCarveoutMaxL1) should result in the maximum L1 configuration. Note that the carveout is only a hint: if the kernel references shared memory, the driver will set the configuration to the minimum needed to execute the kernel.
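A minimal sketch of that call (the kernel name is hypothetical; the attribute and carveout enum are the real CUDA runtime identifiers). It needs a Volta-or-newer GPU to run:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that uses no shared memory at all.
__global__ void myKernel(float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = 2.0f * i;
}

int main() {
    // Hint to the driver that this kernel prefers the maximum-L1 split.
    // Because myKernel references no shared memory, the driver is free
    // to honor the hint fully.
    cudaError_t err = cudaFuncSetAttribute(
        myKernel,
        cudaFuncAttributePreferredSharedMemoryCarveout,
        cudaSharedmemCarveoutMaxL1);
    printf("carveout hint: %s\n", cudaGetErrorString(err));
    return 0;
}
```

The attribute is per-kernel, so it must be set for each kernel you want to run with the L1-heavy configuration, before that kernel is launched.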

cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, 96 * 1024) will force the maximum shared memory carveout. Without that opt-in, the maximum dynamic shared memory size per launch via <<<{gridDim}, {blockDim}, {dynamicSharedMemory}>>> is 48 * 1024 bytes on GV100.
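Conversely, here is a sketch of the opt-in for a kernel that does want more than the default 48 KiB of dynamic shared memory (kernel name and launch sizes are illustrative; the attribute is the real one). Again, this requires a GPU to execute:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel using dynamic shared memory.
__global__ void bigSharedKernel(float *out) {
    extern __shared__ float buf[];   // sized by the launch configuration
    buf[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = buf[threadIdx.x];
}

int main() {
    // Launching with more than 48 KiB of dynamic shared memory fails
    // on GV100 unless the kernel is opted in first.
    cudaFuncSetAttribute(bigSharedKernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         96 * 1024);

    float *out;
    cudaMalloc(&out, 256 * sizeof(float));

    // Third launch parameter: 96 KiB of dynamic shared memory.
    bigSharedKernel<<<1, 256, 96 * 1024>>>(out);
    cudaDeviceSynchronize();

    cudaFree(out);
    return 0;
}
```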