WANTED: a cuPrintf that uses local memory instead of registers. Is that possible?

Hi,

My kernels are often under register pressure, and adding a cuPrintf pushes them over the threshold, so the kernels won’t launch anymore.

Could NVIDIA provide a cuPrintf that uses local memory instead of registers, and therefore adds no extra registers to a kernel? Would this even be remotely possible, given the current state of the compiler?

Also, what’s holding you back from making cuPrintf fully public and available to everyone?

Christian

Can’t you just set the -maxrregcount compiler option? Any registers beyond that limit will be spilled to local memory automatically. It won’t be fast, but it will work…
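A minimal sketch of how to check whether the register cap helps. It assumes a device of compute capability 2.0 or newer and uses the built-in device-side `printf` instead of the SDK’s cuPrintf. The names `debug_kernel` and `check_launch` are hypothetical, and the cap of 32 registers is an arbitrary example value. `cudaFuncGetAttributes` reports the registers per thread the compiler settled on (`numRegs`) and the local memory per thread (`localSizeBytes`), which is where spilled registers end up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a register-heavy kernel that also prints.
// Device-side printf (compute capability 2.0+) replaces cuPrintf here.
__global__ void debug_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i] * 2.0f;
        if (i == 0)
            printf("thread %d: v = %f\n", i, v);
        out[i] = v;
    }
}

// Hypothetical helper: reports the kernel's register/local-memory footprint,
// launches it, and returns 0 on success or 1 on any CUDA error.
int check_launch(int n, int threads_per_block)
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, debug_kernel) != cudaSuccess)
        return 1;
    // Under -maxrregcount=N, numRegs stays <= N; values that no longer fit
    // in registers are spilled and show up in localSizeBytes.
    printf("registers per thread: %d, local memory per thread: %zu bytes\n",
           attr.numRegs, attr.localSizeBytes);

    float *in = NULL, *out = NULL;
    if (cudaMalloc(&in, n * sizeof(float)) != cudaSuccess)
        return 1;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(in);
        return 1;
    }
    cudaMemset(in, 0, n * sizeof(float));

    int blocks = (n + threads_per_block - 1) / threads_per_block;
    debug_kernel<<<blocks, threads_per_block>>>(in, out, n);

    // A block that needs more registers than the SM provides fails here
    // with cudaErrorLaunchOutOfResources.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize(); // also flushes device printf output
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

For example, call `check_launch(1024, 256)` from `main` and build once with plain `nvcc` and once with `nvcc -maxrregcount=32`, then compare the reported numbers. Note that `-maxrregcount` applies to every kernel in the compilation unit; the `__launch_bounds__` qualifier is a per-kernel alternative.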

Isn’t everything in a register before it gets written to global, local or host memory? As far as I know, only shared memory can be used directly as an instruction operand.