What's actually behind cudaMalloc when it is called inside a kernel?

I want to know how different it is when you call cudaMalloc inside a kernel compared to calling malloc. I know how malloc works: it allocates memory on the heap and so on. But how does cudaMalloc work? I heard that it is just a wrapper around malloc; is that true?

cross posted:

[url]c++ - Whats actually happens when you call cudaMalloc inside device? - Stack Overflow[/url]

Maybe you should read the documentation:

[url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-allocation-and-lifetime[/url]

“When invoked from the device runtime these functions map to device-side malloc() and free().”

[url]Programming Guide :: CUDA Toolkit Documentation[/url]
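
For what it's worth, here is a minimal sketch of the device-side heap path that the quoted sentence refers to. It uses plain device-side malloc()/free(); calling cudaMalloc()/cudaFree() from device code (which requires compiling with relocatable device code so the device runtime is linked in) is documented to map onto this same allocator, which is separate from the host-side cudaMalloc path that goes through the driver. The heap-size value below is just an example, not a recommendation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread allocates a small buffer from the device heap.
// A device-side cudaMalloc() call would draw from this same heap,
// per the "Memory Allocation and Lifetime" section of the Programming Guide.
__global__ void devHeapKernel()
{
    int *p = static_cast<int *>(malloc(4 * sizeof(int)));  // device-side heap allocation
    if (p == nullptr) {
        printf("thread %d: device heap allocation failed\n", threadIdx.x);
        return;
    }
    p[0] = threadIdx.x;
    printf("thread %d wrote %d\n", threadIdx.x, p[0]);
    free(p);  // memory from the device heap must be freed with device-side free()
}

int main()
{
    // The device heap is small by default (8 MB); resize it before the first launch if needed.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);
    devHeapKernel<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Note the practical difference: host-side cudaMalloc reserves device memory through the CUDA driver, while allocations made inside a kernel come out of this fixed-size device heap, so they can fail at run time if the heap limit is too small.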