I’m currently moving from the Runtime API to the Driver API. Before a kernel is launched with cuLaunchGrid, its shared memory size has to be set with cuFuncSetSharedSize. When I compile the function with nvcc, the ptxas stage reports how many bytes of shared memory it allocates (due to register overflow). However, this value differs between targets such as SM_10 and SM_13, so I can’t hardcode it.
Is there a way to determine how much shared memory a function requires when it is compiled by the JIT compiler through cuModuleLoad(Data)?
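For context, this is roughly the launch sequence I have right now. It is only a sketch under a few assumptions: cuInit has already been called, a context is current, and the kernel takes no parameters. The helper name `launch_with_fixed_smem`, the constant `SMEM_BYTES_GUESS`, and the block and grid dimensions are made up for illustration.

```c
#include <cuda.h>

/* Hypothetical placeholder: the byte count nvcc reported for ONE target.
 * This is exactly the value I would like to stop hardcoding. */
#define SMEM_BYTES_GUESS 64u

/* Assumes cuInit() was called and a context is current on this thread.
 * Assumes the kernel takes no parameters, so no cuParamSet* calls are made. */
static CUresult launch_with_fixed_smem(const void *ptx_image,
                                       const char *kernel_name)
{
    CUmodule module;
    CUfunction func;
    CUresult rc;

    /* JIT-compile the PTX image for whatever device the context uses. */
    rc = cuModuleLoadData(&module, ptx_image);
    if (rc != CUDA_SUCCESS)
        return rc;

    rc = cuModuleGetFunction(&func, module, kernel_name);
    if (rc != CUDA_SUCCESS) {
        cuModuleUnload(module);
        return rc;
    }

    /* Illustrative launch configuration: 64 blocks of 256 threads. */
    rc = cuFuncSetBlockShape(func, 256, 1, 1);
    if (rc == CUDA_SUCCESS)
        rc = cuFuncSetSharedSize(func, SMEM_BYTES_GUESS); /* hardcoded guess */
    if (rc == CUDA_SUCCESS)
        rc = cuLaunchGrid(func, 64, 1);
    if (rc == CUDA_SUCCESS)
        rc = cuCtxSynchronize(); /* wait so launch errors surface here */

    cuModuleUnload(module);
    return rc;
}
```

The problem is the `SMEM_BYTES_GUESS` line: the number is only valid for the target I happened to compile for offline, not for whatever the JIT compiler produces at load time.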