Host Device Code Interaction

Greetings all,
(I hope this is the right section to post in.)

I was wondering how the interaction between the CPU (host) and the GPU (device) works. Does anyone know the details?

I’ve been looking at the intermediate files nvcc generates (kept by setting the “--keep” flag), and the cudaLaunch() function caught my attention. I’m guessing that the host basically passes the driver a pointer to the kernel code in memory. I’d appreciate it if someone could verify this or point me to some resources.

Lastly, does anyone know how cudaMalloc() and cudaFree() work under the hood?

Thanks for any assistance.
Zack