On the implementation of CUDA_MALLOC

As we know, CUDA_MALLOC is used to allocate memory in video memory. Could you please discuss its implementation? In C, `malloc` allocates space in system memory, which it accomplishes by calling the corresponding system calls. Those system calls generally vary between operating systems because of their different virtual memory and memory allocation mechanisms. So what interests me is whether CUDA_MALLOC is as complex as `malloc`, involving virtual addressing and an anti-fragmentation algorithm such as a buddy system. Thanks.
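
To make clear what I mean by a buddy system, here is a minimal sketch in C of the bookkeeping such an allocator does. It is only an illustration of the general technique and not a claim about how CUDA_MALLOC works internally. The helper names `buddy_block_size` and `buddy_offset` are hypothetical, and the sketch assumes the minimum block size is a nonzero power of two and that offsets are measured from the start of the pool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a request up to the power-of-two block size
   a buddy allocator would hand out. Assumes min_block is a nonzero power
   of two. Returns 0 if the request is too large to round up. */
static size_t buddy_block_size(size_t request, size_t min_block)
{
    size_t size = min_block;
    while (size < request) {
        if (size > SIZE_MAX / 2)
            return 0;          /* doubling would overflow */
        size <<= 1;
    }
    return size;
}

/* Hypothetical helper: offset of a block's "buddy" inside the pool.
   Assumes offset is a multiple of block_size and block_size is a power
   of two, so flipping that single bit yields the neighbouring block. */
static size_t buddy_offset(size_t offset, size_t block_size)
{
    return offset ^ block_size;
}
```

With a 32-byte minimum block, a 100-byte request would be served from a 128-byte block, and when the block at offset 0 of size 64 is freed, the allocator checks whether the block at offset 64 is also free so the two can be merged back into one 128-byte block. That merging step is what limits fragmentation.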