I am trying to use more than 64KB of shared memory. I can do so by calling cudaFuncSetAttribute() to set cudaFuncAttributeMaxDynamicSharedMemorySize. This works well when I launch my kernel from the host. However, when I launch the kernel using dynamic parallelism, cudaFuncSetAttribute() doesn’t seem to take effect. Has anybody here experienced a similar problem?
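For reference, here is a minimal sketch of the host-side pattern described above, the case that works. The kernel `bigSharedKernel`, the helper `needsOptIn`, and the 64KB size are hypothetical, chosen only for illustration; it assumes a device whose opt-in shared memory limit is at least that large, and it does not cover the dynamic parallelism launch.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical kernel: fills a dynamically sized shared buffer, then sums it.
__global__ void bigSharedKernel(float *out, int n)
{
    extern __shared__ float buf[];
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        buf[i] = 1.0f;
    __syncthreads();
    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum += buf[i];
        *out = sum;
    }
}

// Hypothetical host helper: requests above the default 48KB limit need the opt-in.
static bool needsOptIn(size_t bytes) { return bytes > 48 * 1024; }

int main()
{
    const size_t bytes = 64 * 1024;  // assumed request size for this sketch
    const int n = static_cast<int>(bytes / sizeof(float));

    // Check what the current device allows with the opt-in.
    int maxOptIn = 0;
    CHECK(cudaDeviceGetAttribute(&maxOptIn,
                                 cudaDevAttrMaxSharedMemoryPerBlockOptin, 0));
    if (static_cast<size_t>(maxOptIn) < bytes) {
        printf("Device opt-in limit is %d bytes; request is too large.\n", maxOptIn);
        return 0;
    }

    // Raise the per-kernel dynamic shared memory limit before launching.
    if (needsOptIn(bytes))
        CHECK(cudaFuncSetAttribute(bigSharedKernel,
                                   cudaFuncAttributeMaxDynamicSharedMemorySize,
                                   static_cast<int>(bytes)));

    float *out = nullptr;
    CHECK(cudaMalloc(&out, sizeof(float)));
    bigSharedKernel<<<1, 256, bytes>>>(out, n);  // host-side launch
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    float result = 0.0f;
    CHECK(cudaMemcpy(&result, out, sizeof(float), cudaMemcpyDeviceToHost));
    printf("sum = %.0f over %d floats\n", result, n);
    CHECK(cudaFree(out));
    return 0;
}
```

The attribute is set per kernel function, so it has to be applied to each kernel that needs the larger allocation before that kernel is launched.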
A similar request was made here. The OP there filed a bug with NVIDIA (3503453), and that bug is in the late stages of being finalized. The development work is done, and based on my read of the bug, the fix should become available in CUDA 12.1, if it is not already available in 12.0. I haven’t tested 12.0, but based on what I see in the bug, it doesn’t appear to be in 12.0. There is no guarantee that it will be in 12.1; that is just my best guess based on what I see so far.
Thanks Robert, good to know that it is actually not working at the moment.