What actually happens behind cudaMalloc when it is called inside a kernel?

How does calling cudaMalloc inside a kernel differ from calling malloc? I know malloc allocates memory on the heap, but how does cudaMalloc work? I heard that it is just a wrapper around malloc. Is that true?

Cross-posted:

http://stackoverflow.com/questions/37383350/whats-actually-happens-when-you-call-cudamalloc-inside-device

Maybe you should read the documentation:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-allocation-and-lifetime

“When invoked from the device runtime these functions map to device-side malloc() and free().”

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations
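
To make the quoted sentence concrete, here is a minimal sketch of the device-side malloc() and free() that the documentation says an in-kernel cudaMalloc maps to. The names scratch_kernel and run_scratch_demo are made up for illustration, and the 16 MB heap size is an arbitrary choice. The sketch calls malloc() directly rather than cudaMalloc(), because as far as I know calling the device runtime API from a kernel additionally requires building with relocatable device code (-rdc=true) and linking cudadevrt.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread allocates a small scratch buffer from the device heap,
// fills it with its own thread index, sums it, and frees it again.
__global__ void scratch_kernel(int *out, int elems_per_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Device-side malloc(): the memory comes from a heap in global memory.
    int *scratch = static_cast<int *>(malloc(elems_per_thread * sizeof(int)));
    if (scratch == nullptr) {   // NULL means the device heap could not serve the request
        out[tid] = -1;
        return;
    }

    int sum = 0;
    for (int i = 0; i < elems_per_thread; ++i) {
        scratch[i] = tid;
        sum += scratch[i];
    }
    out[tid] = sum;

    free(scratch);              // device-side free()
}

// Hypothetical helper: launches the kernel and returns how many threads
// produced the expected sum, or -1 if a CUDA call failed.
int run_scratch_demo(int blocks, int threads, int elems_per_thread)
{
    const int n = blocks * threads;

    int *d_out = nullptr;       // ordinary host-side cudaMalloc for the result array
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    scratch_kernel<<<blocks, threads>>>(d_out, elems_per_thread);

    std::vector<int> h_out(n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int t = 0; t < n; ++t)
        if (h_out[t] == t * elems_per_thread) ++ok;
    return ok;
}

int main()
{
    // Optional: size the device heap before the first kernel that uses malloc().
    // 16 MB is an arbitrary value chosen for this sketch.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16u << 20);

    int ok = run_scratch_demo(2, 32, 8);
    std::printf("threads with expected result: %d\n", ok);
    return ok < 0 ? 1 : 0;
}
```

Note that the host-side cudaMalloc used for d_out is a different thing: it goes through the driver, whereas the in-kernel allocation is carved out of a fixed-size device heap whose size is controlled by cudaLimitMallocHeapSize.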