Implicit or explicit use of shared memory: which is better?

I’m looking to optimize some computer vision code I’m working on. Shared memory seems like a logical fit because of the several neighborhood operations I’m doing. Could someone explain the merits of explicitly giving thread blocks a certain amount of shared memory at kernel launch (with the array declared ‘extern’) versus not using the ‘extern’ keyword and declaring fixed-size shared arrays within the kernel?

Additionally, what is the scope of shared variables, whether they’re declared externally or from within the kernel? This matters in deciding whether I have to split my algorithms into several kernels or can use one well-synchronized, but much larger, kernel.


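Shared variables are visible to every thread in a block, and their lifetime is the lifetime of that block, whichever way they are declared. Nothing in shared memory is visible to other blocks or survives to the next kernel launch, so anything you want to keep has to be written to global memory.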
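To make the block-level scope concrete, here is a rough sketch of a 1D neighborhood operation (a 3-point box filter). The kernel name box3, the BLOCK size, and the zero padding at the array ends are assumptions made up for this illustration, not anything from your code. Each block fills its own tile, synchronizes, and writes its results to global memory before it exits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 128  // hypothetical block size; must match the launch below

// 3-point box filter in 1D. Each block gets its own private copy of tile[],
// which only exists while that block is running.
__global__ void box3(const float* in, float* out, int n)
{
    __shared__ float tile[BLOCK + 2];            // BLOCK elements + 1 halo cell per side
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + 1;                        // index inside the tile

    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0)                        // left halo (zero padded at the edge)
        tile[0] = (g > 0 && g - 1 < n) ? in[g - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)           // right halo (zero padded at the edge)
        tile[l + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;

    __syncthreads();                             // every thread in the block reaches this

    if (g < n)                                   // results must go to global memory to persist
        out[g] = (tile[l - 1] + tile[l] + tile[l + 1]) / 3.0f;
}

int main()
{
    const int n = 1000;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    int blocks = (n + BLOCK - 1) / BLOCK;
    box3<<<blocks, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %f, out[%d] = %f\n", out[0], n / 2, out[n / 2]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Since __syncthreads() only synchronizes threads within one block, a step that needs results from other blocks is usually where people split into a second kernel and pass the data through global memory.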

The benefit of giving the amount at launch is that you can change it at runtime depending on how much you need.
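A minimal sketch of the two styles side by side, in case it helps. The kernel names and the array-reversal task are just made up for illustration. The static version fixes the size at compile time; the dynamic version uses extern and takes its size in bytes from the third launch parameter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Static: the size (64 ints) is fixed at compile time.
__global__ void reverseStatic(int* d)
{
    __shared__ int s[64];
    int t = threadIdx.x;
    s[t] = d[t];
    __syncthreads();                  // wait until the whole block has loaded
    d[t] = s[blockDim.x - 1 - t];
}

// Dynamic: the size comes from the third <<<...>>> launch parameter (in bytes).
__global__ void reverseDynamic(int* d)
{
    extern __shared__ int s[];
    int t = threadIdx.x;
    s[t] = d[t];
    __syncthreads();
    d[t] = s[blockDim.x - 1 - t];
}

int main()
{
    const int n = 64;                 // one block of n threads; must not exceed 64 for the static kernel
    int* d;
    cudaMallocManaged(&d, n * sizeof(int));
    for (int i = 0; i < n; ++i) d[i] = i;

    reverseStatic<<<1, n>>>(d);                    // no shared-memory size needed
    reverseDynamic<<<1, n, n * sizeof(int)>>>(d);  // size chosen at runtime
    cudaDeviceSynchronize();

    // Reversed twice, so the array is back in its original order.
    printf("d[0] = %d, d[%d] = %d\n", d[0], n - 1, d[n - 1]);  // prints "d[0] = 0, d[63] = 63"
    cudaFree(d);
    return 0;
}
```

Inside the kernel the two behave the same; the difference is only where the size is decided. With the dynamic form the same kernel can be launched with different block sizes or tile sizes without recompiling.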