New operator in device functions

Can anyone point me to documentation on what happens behind the scenes when you use the new and delete operators within a kernel function?

I understand from discussion forums that new allocates memory from a device-side heap, per thread, rather than doing a cudaMalloc, but I have not seen any formal documentation from NVIDIA on what the new operator is doing. I would like to understand which CUDA API calls, if any, the new operator makes to allocate memory from the device side.

Thanks!

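The behavior is identical to in-kernel malloc() and free(), which are documented in the programming guide. Allocations come from a fixed-size device heap in global memory (8 MB by default, adjustable from the host with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before the first kernel that uses it is launched), not from cudaMalloc.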

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations
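For illustration, here is a minimal sketch of what that looks like in practice. The kernel name per_thread_new, the helper run_device_new_demo, and the 32 MB heap size are made up for this example, not taken from the guide. It assumes device-side new returns a null pointer on failure, as in-kernel malloc() is documented to do.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates its own scratch buffer with
// new[], fills it, sums it, and releases it with delete[].
__global__ void per_thread_new(int *out, int n_elems)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // The allocation is served from the device heap in global memory.
    // Assumption: on exhaustion this yields a null pointer, like in-kernel malloc().
    int *buf = new int[n_elems];
    if (buf == nullptr) {
        out[tid] = -1;  // signal allocation failure to the host
        return;
    }

    int sum = 0;
    for (int i = 0; i < n_elems; ++i) {
        buf[i] = tid + i;
        sum += buf[i];
    }
    out[tid] = sum;

    // Must be freed with delete[] on the device; cudaFree() cannot free it.
    delete[] buf;
}

// Hypothetical host helper: returns 0 if every thread allocated and computed correctly.
int run_device_new_demo(int blocks, int threads, int n_elems)
{
    // Enlarge the device heap (default 8 MB). This has to happen before the
    // first launch of a kernel that uses malloc/new. 32 MB is an arbitrary choice.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, 32u * 1024u * 1024u) != cudaSuccess)
        return 1;

    int n = blocks * threads;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess)
        return 1;

    per_thread_new<<<blocks, threads>>>(d_out, n_elems);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(d_out);
        return 1;
    }

    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return 1;

    // Sum of (tid + i) for i in [0, n_elems) is n_elems*tid + n_elems*(n_elems-1)/2.
    for (int tid = 0; tid < n; ++tid) {
        int expected = n_elems * tid + n_elems * (n_elems - 1) / 2;
        if (h_out[tid] != expected)
            return 2;
    }
    return 0;
}

int main()
{
    int rc = run_device_new_demo(2, 64, 16);
    std::printf("run_device_new_demo returned %d\n", rc);
    return rc;
}
```

The host never calls a CUDA API to perform the individual allocations. Its only involvement is sizing the heap up front. Memory obtained this way stays valid for the lifetime of the CUDA context unless it is freed on the device.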