What happens when cudaMalloc is called

I would like to understand what exactly happens when the cudaMalloc function is called.

float *d_A = NULL;
cudaMalloc(&d_A, 10 * sizeof(float));

Looking from the host's perspective, at line 1, d_A is just a float pointer containing a NULL value. At line 2, memory is allocated in device memory when cudaMalloc is called.

What happens inside line 2, i.e. what happens between calling the cudaMalloc function and the memory being allocated on the device? Where is cudaMalloc executed (host or device)? How is memory management of device memory taken care of?

After line 2, how does the host treat the d_A pointer? Is it just another pointer to the host, or will the host treat d_A as some special pointer?

Thanks!!!
Varun

cudaMalloc is a library function call. It lives in the CUDA runtime library, provided by libcudart.so on Linux (or libcudart_static.a if statically linked).

The library function is essentially all host code, but it is host code that interacts with the GPU driver, and the driver is what manages allocations in device memory.

After line 2, the host (i.e. host code) doesn’t know anything different about the d_A pointer. It is still just an ordinary C (or C++) pointer. It just happens to have a non-NULL numerical value now, after the function call (unless an error occurred).