A little question here.
While writing some CUDA code, I needed a shared memory size that varies at runtime. I believed that wasn't possible, but then I stumbled on the function cudaConfigureCall.
I don’t quite know how to work with that function, specifically how to set up the shared memory size.
So my question is: can I use cudaConfigureCall to launch a kernel with a shared memory size that depends on a variable, instead of a constant?
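For context, here is a minimal sketch of how a runtime-determined shared memory size is usually passed, assuming the standard `<<<>>>` launch syntax (which is what cudaConfigureCall implements under the hood); the kernel and variable names are just illustrative:

```cuda
#include <cuda_runtime.h>

// Kernel using dynamically sized shared memory: `buf` gets its size
// from the third launch-configuration argument, not from a constant.
__global__ void scaleKernel(float *data, int n)
{
    extern __shared__ float buf[];   // size fixed at launch time, per block
    int i = threadIdx.x;
    if (i < n) {
        buf[i] = data[i] * 2.0f;     // stage through shared memory
        __syncthreads();
        data[i] = buf[i];
    }
}

int main()
{
    int n = 256;                               // runtime-determined size
    size_t smemBytes = n * sizeof(float);      // bytes of dynamic shared memory
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    // The third <<<>>> argument is the dynamic shared-memory size in
    // bytes; it can be any runtime value (up to the device limit).
    scaleKernel<<<1, n, smemBytes>>>(d_data, n);

    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

If I understand correctly, only one `extern __shared__` array is declared per kernel, and the whole dynamically sized region is carved up manually when several arrays are needed.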