Can anyone point me to documentation on what happens behind the scenes when you use the new and delete operators within a kernel function?
I understand from discussion forums that new allocates memory on a per-thread heap rather than doing a cudaMalloc, but I have not seen any formal documentation from NVIDIA on what the new operator actually does. I would like to understand which CUDA API calls, if any, are made to allocate memory when new is used on the device side.
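For concreteness, here is a minimal sketch of the kind of usage I am asking about. The names scratch_kernel and run_scratch_demo are made up for illustration, and the heap-size call reflects my assumption (which I would like confirmed) that device-side new draws from the same device heap that cudaLimitMallocHeapSize controls for in-kernel malloc.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates a small scratch buffer with
// device-side new, writes to it, and releases it with delete[].
__global__ void scratch_kernel(int *success_count) {
    int *scratch = new int[8];
    if (scratch == nullptr) {
        return;  // defensive check in case the allocation fails
    }
    for (int i = 0; i < 8; ++i) {
        scratch[i] = static_cast<int>(threadIdx.x) + i;
    }
    if (scratch[7] == static_cast<int>(threadIdx.x) + 7) {
        atomicAdd(success_count, 1);  // count threads whose buffer was usable
    }
    delete[] scratch;
}

// Launches the kernel with 32 threads and returns how many of them
// allocated and used their buffer successfully, or -1 on a CUDA error.
int run_scratch_demo() {
    // Assumption: this limit also governs memory handed out by device-side new.
    // It has to be set before the first kernel that allocates is launched.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8u * 1024 * 1024) != cudaSuccess) {
        return -1;
    }

    int *d_count = nullptr;
    if (cudaMalloc(&d_count, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    cudaMemset(d_count, 0, sizeof(int));

    scratch_kernel<<<1, 32>>>(d_count);

    int h_count = -1;
    cudaError_t err = cudaMemcpy(&h_count, d_count, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_count);
    return err == cudaSuccess ? h_count : -1;
}

int main() {
    int ok = run_scratch_demo();

    size_t heap_bytes = 0;
    cudaDeviceGetLimit(&heap_bytes, cudaLimitMallocHeapSize);

    printf("threads with a working allocation: %d\n", ok);
    printf("device malloc heap limit: %zu bytes\n", heap_bytes);
    return ok < 0 ? 1 : 0;
}
```

The host code never calls cudaMalloc for the scratch buffers, so what I am trying to find out is where that memory comes from and what the runtime does on my behalf.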
Thanks!