New operator in device functions

Can anyone point me to documentation on what happens behind the scenes when you use the new and delete operators within a kernel function?

I understand from discussion forums that new allocates memory on a per-thread heap rather than doing a cudaMalloc, but I have not seen any formal documentation from NVIDIA on what the new operator does. I would like to understand which CUDA API calls, if any, are made to allocate memory when new is used on the device side.


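The behavior is identical to in-kernel malloc() and free(), which are documented in the CUDA C++ Programming Guide under dynamic global memory allocation. The memory comes from a fixed-size device heap in global memory, not from cudaMalloc, so it cannot be released with cudaFree. The heap size is controlled from the host with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...), which has to be called before launching any kernel that allocates.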
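
In case it helps, here is a minimal sketch of what that looks like in practice. It is not taken from the programming guide: the kernel name allocPerThread, the helper sumOfThreadAllocations, and the 16 MB heap size are all made up for illustration. It also assumes that device-side new returns a null pointer when the heap is exhausted, the same way in-kernel malloc does.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates one int from the device heap
// with new, writes its thread index through it, and releases it with delete.
__global__ void allocPerThread(int* out)
{
    int tid = threadIdx.x;
    int* p = new int;      // served from the device heap, like in-kernel malloc
    if (p == nullptr) {    // assumption: null on heap exhaustion, as with malloc
        out[tid] = -1;
        return;
    }
    *p = tid;
    out[tid] = *p;
    delete p;              // must be freed on the device; cudaFree would be wrong here
}

// Hypothetical host helper: launches one block of `threads` threads and
// returns the sum of the values the threads wrote, or -1 on a CUDA error.
int sumOfThreadAllocations(int threads)
{
    // The heap size has to be set before the first kernel that allocates runs.
    // 16 MB is an arbitrary illustrative value; the return code is ignored
    // because a later call after such a kernel has run is expected to fail.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);

    int* dOut = nullptr;
    if (cudaMalloc(&dOut, threads * sizeof(int)) != cudaSuccess) return -1;

    allocPerThread<<<1, threads>>>(dOut);

    std::vector<int> hOut(threads);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, threads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);        // dOut came from cudaMalloc, so cudaFree is correct
    if (err != cudaSuccess) return -1;

    int sum = 0;
    for (int v : hOut) sum += v;
    return sum;
}

int main()
{
    printf("sum = %d\n", sumOfThreadAllocations(32));
    return 0;
}
```

The host code makes no allocation call on behalf of new. The only host-side API involved is the optional cudaDeviceSetLimit call that sizes the heap.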