This was done by Ian Buck and Stephen Jones at NVIDIA, so it seems to me, at least, that the writing is on the wall for this to appear in some newer version of CUDA.
If you are impatient, you could probably implement it yourself.
It certainly needs to appear in a future CUDA release if NVIDIA is to deliver the full support for C++ new and delete promised in the Fermi white paper. (Neat that they did this on the C1060, though! I would like to see such features on devices of compute capability below 2.0.)