My kernels are often under register pressure, and adding a cuPrintf call pushes them over the register limit, so the kernels won’t launch anymore.
Could NVIDIA provide a cuPrintf that uses local memory instead of registers, so that it does not add extra registers to a kernel? Would
this even be remotely possible, given the current state of the compiler?
Also, what’s holding you back from making cuPrintf fully public and accessible to everyone?