How does calling cudaMalloc inside a kernel differ from calling malloc? I know malloc allocates memory on the heap, but how does cudaMalloc work? I heard that it is just a wrapper around malloc. Is that true?
Maybe you should read the documentation:
“When invoked from the device runtime these functions map to device-side malloc() and free().”
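To make that concrete, here is a minimal sketch of the device-side malloc() and free() the quote refers to, called directly inside a kernel. The names scratchKernel and runDemo, the 16 MB heap size, and the launch dimensions are made up for illustration. Calling cudaMalloc itself from device code additionally needs the device runtime (relocatable device code, linked against cudadevrt), which this sketch avoids by using plain malloc().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread takes a small scratch buffer from the
// device heap, fills it, sums it, and frees it again.
__global__ void scratchKernel(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Device-side malloc(): allocates from the device heap, not the host heap.
    int *scratch = static_cast<int *>(malloc(n * sizeof(int)));
    if (scratch == nullptr) {   // returns NULL when the device heap is exhausted
        out[tid] = -1;
        return;
    }

    int sum = 0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = tid + i;
        sum += scratch[i];
    }
    out[tid] = sum;

    // Must be released with device-side free(), not with cudaFree() on the host.
    free(scratch);
}

// Hypothetical helper: returns the number of wrong results, or -1 on a CUDA error.
int runDemo()
{
    const int threads = 64;
    const int n = 8;

    // Assumed heap size; set before launching any kernel that uses malloc().
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);

    int *dOut = nullptr;
    if (cudaMalloc(&dOut, threads * sizeof(int)) != cudaSuccess) return -1;

    scratchKernel<<<1, threads>>>(dOut, n);

    int hOut[threads];
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int t = 0; t < threads; ++t) {
        int expected = n * t + n * (n - 1) / 2;  // sum of (t + i) for i in [0, n)
        if (hOut[t] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int result = runDemo();
    printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

So it is not a wrapper around the host malloc. In device code the allocation comes from a separate, fixed-size device heap, and a pointer obtained that way cannot be passed to the host-side cudaFree.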