Hey, I’m programming with CUDA.
I noticed that I can change the shared memory size for specific kernels (1. Introduction — CUDA C++ Programming Guide), and I have two questions:
-
If a kernel defines a local array that uses 16KB of shared memory, but the shared memory size is set to 8KB before launching it, what will happen? Will the remaining 8KB be allocated in global memory, in shared memory (meaning `cudaFuncSetAttribute` has no effect), or in registers?
-
On a GA100 GPU, shared memory + L1 cache together are 192KB per SM. If I set shared memory to X KB, will the remaining storage be used as L1 cache, or is the cache a fixed size?
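For context, here is a minimal sketch of the two attributes I'm referring to: `cudaFuncAttributeMaxDynamicSharedMemorySize` raises the per-kernel opt-in limit for *dynamic* shared memory, and `cudaFuncAttributePreferredSharedMemoryCarveout` hints at the L1/shared split. The kernel name and sizes are just placeholders for illustration:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void myKernel() {
    // Dynamic shared memory: size comes from the launch configuration,
    // not from a compile-time array declaration.
    extern __shared__ float buf[];
    buf[threadIdx.x] = (float)threadIdx.x;
}

int main() {
    // Opt in to more than the default 48 KB of dynamic shared memory.
    // (On A100, kernels can opt in to well over 48 KB per block.)
    cudaFuncSetAttribute(myKernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         96 * 1024);

    // Hint the preferred shared-memory carveout as a percentage of the
    // combined capacity; the driver may round to a supported split.
    cudaFuncSetAttribute(myKernel,
                         cudaFuncAttributePreferredSharedMemoryCarveout,
                         50);

    // Request 64 KB of dynamic shared memory at launch time.
    myKernel<<<1, 256, 64 * 1024>>>();
    cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```

My understanding is that these attributes apply only to dynamically allocated (`extern __shared__`) memory, which is part of why I'm unsure what happens to a statically declared 16KB array, as in question 1.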
Thanks!