What actually happens behind cudaMalloc when it is called inside a kernel?

How does calling cudaMalloc inside a kernel differ from calling malloc there? I know that malloc allocates memory on the heap, but how does cudaMalloc work? I heard that it is just a wrapper around malloc. Is that true?

cross posted:


Maybe you should read the documentation:


“When invoked from the device runtime these functions map to device-side malloc() and free().”
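To make that concrete, here is a minimal sketch of the device-side malloc() and free() that the quote refers to. The kernel name perThreadAlloc, the helper runDemo, and the 16 MB heap size are made up for illustration. The sketch calls malloc() directly rather than cudaMalloc(), because plain in-kernel malloc() needs no special build flags, whereas calling cudaMalloc() from device code goes through the device runtime and, as far as I know, requires compiling with relocatable device code (-rdc=true) and linking cudadevrt.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread takes a small buffer from the device heap,
// fills it with its thread id, sums it, and returns it to the heap.
__global__ void perThreadAlloc(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Device-side malloc(): served from the device heap, not by the host allocator.
    int *buf = static_cast<int *>(malloc(n * sizeof(int)));
    if (buf == nullptr) {          // the device heap can run out; always check
        out[tid] = -1;
        return;
    }

    int sum = 0;
    for (int i = 0; i < n; ++i) buf[i] = tid;
    for (int i = 0; i < n; ++i) sum += buf[i];
    out[tid] = sum;

    // Memory obtained in device code must also be released in device code.
    free(buf);
}

// Hypothetical host helper: launches one block and returns the result of thread idx.
int runDemo(int threads, int n, int idx)
{
    // The device heap size must be set before launching any kernel that uses
    // malloc(). 16 MB is an arbitrary value chosen for this sketch.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);

    int *out = nullptr;
    cudaMalloc(&out, threads * sizeof(int));   // ordinary host-side cudaMalloc

    perThreadAlloc<<<1, threads>>>(out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(out);
        return -2;
    }

    std::vector<int> host(threads);
    cudaMemcpy(host.data(), out, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(out);
    return host[idx];
}

int main()
{
    // Each thread should report tid * n unless its allocation failed (-1).
    printf("thread 3 result = %d\n", runDemo(32, 16, 3));
    return 0;
}
```

So the in-kernel version is not a wrapper around the host's malloc. It draws from a separate, fixed-size heap in device memory, and pointers from it cannot be passed to the host-side cudaFree().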