How the shared memory work with cudaFuncSetAttribute?

Hey, I’m programming with CUDA.

I noticed that I can change the size of shared memory on specific kernels (1. Introduction — CUDA C++ Programming Guide), I wonder

  • If a kernel defines a local array using 16KB shared memory, but before launch it, the shared memory size is set to 8KB. What will happen? Will the left 8KB data be allocated to global memory, shared memory (means the cudaFuncSetAttribute will not work), or registers?

  • On GA100 GPU, shared memory + L1 Cache is 192KB per SM. If I set shared memory to X KB, will the remaining storage used as Cache, or the Cache is actually a fixed size?

Thanks!

From the Programming Guide:

" Shared memory is partitioned out of unified data cache, and can be configured to various sizes (See Shared Memory.) The remaining data cache serves as an L1 cache and is also used by the texture unit that implements the various addressing and data filtering modes mentioned in Texture and Surface Memory."

If you set the shared memory size to less than is needed by a kernel, the kernel probably will fail to start. (Or setting the shared memory size of that kernel will fail.)