I want to know how calling cudaMalloc inside a kernel differs from calling malloc. I know malloc allocates memory on the heap and so on, but how does cudaMalloc work? I heard that it is just a wrapper around malloc; is that true?
Cross-posted:
[url]c++ - Whats actually happens when you call cudaMalloc inside device? - Stack Overflow[/url]
Maybe you should read the documentation:
[url]http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-allocation-and-lifetime[/url]
“When invoked from the device runtime these functions map to device-side malloc() and free().”
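So from device code the answer is "yes, roughly": cudaMalloc()/cudaFree() called inside a kernel end up as device-side malloc()/free(), which allocate from a per-device heap in GPU global memory, not from the host process heap. (Host-side cudaMalloc() is different again: it allocates device global memory and is not a wrapper around host malloc.) Below is a minimal sketch, assuming a single device, that uses device-side malloc()/free() directly. That is what the quoted passage says device-runtime cudaMalloc maps to, and it avoids the extra build setup (relocatable device code, linking the device runtime library) that calling cudaMalloc itself from a kernel needs. The kernel name `scratch_sum_kernel` and the helper `run_scratch_sums` are hypothetical, chosen for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread grabs a small scratch buffer from the device malloc heap,
// uses it, and releases it before exiting.
__global__ void scratch_sum_kernel(int n, int *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Device-side malloc(): memory comes from a heap in GPU global memory,
    // not from the host process heap.
    int *scratch = static_cast<int *>(malloc(n * sizeof(int)));
    if (scratch == nullptr) {       // NULL means the device heap is exhausted
        out[tid] = -1;
        return;
    }

    for (int i = 0; i < n; ++i) scratch[i] = i + tid;
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += scratch[i];
    out[tid] = sum;

    free(scratch);                  // device allocations are freed from device code
}

// Hypothetical host helper: runs `threads` threads (one block), each summing
// n scratch values, and copies the per-thread sums into `sums`.
cudaError_t run_scratch_sums(int threads, int n, std::vector<int> &sums)
{
    // Best effort: the device heap (8 MB by default) can only be resized before
    // the first launch of a kernel that uses device malloc(), so a failure here
    // is ignored and the recorded error is cleared.
    (void)cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);
    (void)cudaGetLastError();

    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, threads * sizeof(int));  // host-side cudaMalloc
    if (err != cudaSuccess) return err;

    scratch_sum_kernel<<<1, threads>>>(n, d_out);
    err = cudaGetLastError();                                      // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();         // execution errors

    if (err == cudaSuccess) {
        sums.assign(threads, 0);
        err = cudaMemcpy(sums.data(), d_out, threads * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return err;
}
```

With `run_scratch_sums(4, 8, sums)`, thread t should report 28 + 8t, since it sums i + t for i = 0..7. A practical consequence of the device heap being fixed-size is that it must be sized up front with cudaDeviceSetLimit, and a device-side allocation can simply return NULL where a host malloc would have succeeded.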