How is cudaMalloc implemented?

cudaMalloc is called from host code rather than from device code. Could someone explain how the allocation in device memory actually happens? Thanks.
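The sketch below is a minimal illustration of the calling pattern behind the question. It does not show how cudaMalloc works internally. The host calls cudaMalloc, which writes a device address into a host variable. The host cannot dereference that address, but it can pass it to cudaMemcpy and to kernel launches, and it releases it with cudaFree. The kernel `scale` and the helper `scale_on_device` are hypothetical names chosen for this example. The comment about the runtime delegating to the driver is a conceptual description, not a statement of how NVIDIA actually implements it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: runs on the device and updates memory the host allocated.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: copies `host` to the device, scales it there, copies it back.
// Returns the first CUDA error encountered (cudaSuccess on success).
cudaError_t scale_on_device(float *host, int n, float factor) {
    float *dev = nullptr;               // holds a device address; not valid to dereference on the host
    size_t bytes = n * sizeof(float);

    // Host-side call. Conceptually, the runtime asks the driver to reserve
    // `bytes` of device memory and stores the resulting device pointer in `dev`.
    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads>>>(dev, n, factor);  // device pointer passed by value
        err = cudaGetLastError();                    // catches launch errors
    }
    if (err == cudaSuccess) {
        // On the default stream this copy waits for the kernel to finish.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);                      // the allocation is also released from host code
    return err;
}

int main() {
    float h[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    cudaError_t err = scale_on_device(h, 4, 2.0f);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (float v : h) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```

All allocation and deallocation happen on the host side. The device only ever sees the pointer value it receives as a kernel argument.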