I’m using V100s to run some of my applications and wanted to profile them. However, when I dump the profiler output to a log file and import it into NVIDIA Visual Profiler, the shared memory configuration reads:
Shared Memory Requested: 96 KiB
Shared Memory Executed: 96 KiB
Shared Memory Bank Size: 4B
As I understand it, the V100 supports a configurable split between L1 and shared memory, and since I know this application doesn’t use shared memory, I’d like all of it to go to L1. I’ve already tried cudaFuncSetAttribute() and cudaFuncSetCacheConfig() with no luck: the profiler still reports the same shared memory configuration shown above. Am I using the wrong functions to configure the device? Or am I misunderstanding what NVIDIA Visual Profiler is showing me?
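For reference, here is roughly what I’m doing (myKernel stands in for my actual kernel, which I can’t post here):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder for my real kernel; it does not use shared memory.
__global__ void myKernel() {}

int main() {
    // Request a 0% shared-memory carveout, i.e. give the whole
    // L1/shared partition to L1. The third argument is a percentage
    // of the maximum shared memory capacity.
    cudaError_t err = cudaFuncSetAttribute(
        myKernel,
        cudaFuncAttributePreferredSharedMemoryCarveout,
        0);
    printf("cudaFuncSetAttribute: %s\n", cudaGetErrorString(err));

    // Also tried the older cache-configuration hint.
    err = cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);
    printf("cudaFuncSetCacheConfig: %s\n", cudaGetErrorString(err));

    myKernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Both calls return cudaSuccess, yet the profiler numbers above don’t change.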