Difference between cudaMalloc and cudaMemAlloc?

I have read in the Programming Guide that there are two ways to allocate memory: cudaMalloc and cudaMemAlloc. What are the differences between these two functions? Which one is best?

One (cudaMalloc) is a runtime API function; the other (cuMemAlloc, with a cu prefix, not cudaMemAlloc) is a driver API function. Overall they do the same thing.

I get it, but what is the difference between a “runtime” and an “API” function?
Do you know which one is faster?

The driver API is “closer” to the hardware, so it should be faster (but not by much).

The runtime API is built on top of the driver API. It is a “high-level” API for using CUDA and is much easier to learn and use.

It’s up to the developer to choose which one to use; both APIs provide similar functionality.
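To make the comparison concrete, here is a minimal sketch of the same allocation done through each API. The helper names allocWithRuntime and allocWithDriver are made up for illustration, and device 0 is assumed. The driver version retains the device's primary context (the one the runtime uses implicitly) instead of creating a new one; that is just one reasonable choice, not the only way to set up a context. It should build with something like `nvcc example.cu -lcuda`.

```cuda
#include <cstdio>
#include <cuda.h>          // driver API (cu* functions)
#include <cuda_runtime.h>  // runtime API (cuda* functions)

// Hypothetical helper: allocate and free `bytes` with the runtime API.
// The runtime initialises CUDA and sets up a context implicitly on first use.
bool allocWithRuntime(size_t bytes) {
    void* ptr = nullptr;
    if (cudaMalloc(&ptr, bytes) != cudaSuccess) return false;
    return cudaFree(ptr) == cudaSuccess;
}

// Hypothetical helper: the same allocation with the driver API.
// Initialisation, device selection and the context are all explicit.
bool allocWithDriver(size_t bytes) {
    if (cuInit(0) != CUDA_SUCCESS) return false;

    CUdevice dev;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return false;  // assumes device 0

    // Retain the primary context and make it current for this thread.
    CUcontext ctx;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return false;
    bool ok = cuCtxSetCurrent(ctx) == CUDA_SUCCESS;

    CUdeviceptr dptr = 0;
    if (ok) ok = cuMemAlloc(&dptr, bytes) == CUDA_SUCCESS;
    if (ok) ok = cuMemFree(dptr) == CUDA_SUCCESS;

    cuDevicePrimaryCtxRelease(dev);
    return ok;
}

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB
    printf("runtime API (cudaMalloc): %s\n", allocWithRuntime(bytes) ? "ok" : "failed");
    printf("driver API  (cuMemAlloc): %s\n", allocWithDriver(bytes) ? "ok" : "failed");
    return 0;
}
```

The amount of setup code is the main visible difference: one call for the runtime API versus explicit init, device and context handling for the driver API.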

The driver API should be a bit faster (as it lacks some of the checks for CUDA context initialisation), but you should never rely on the speed of memory allocation functions. Just never call them in loops. For the best performance, allocate all the GPU memory you will need once, at the start of your program, and work from there.
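As a rough illustration of the "allocate once, then reuse" advice, here is a minimal sketch of a bump allocator that carves buffers out of a single cudaMalloc'd block. DevicePool and the pool* functions are hypothetical names, not part of CUDA. The 64 MiB pool size and the 256-byte alignment are assumptions for the example; size the pool for your own workload.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: a bump allocator over one device allocation.
struct DevicePool {
    char*  base = nullptr;  // start of the cudaMalloc'd block
    size_t capacity = 0;    // total bytes in the block
    size_t offset = 0;      // bytes handed out so far
};

// The single cudaMalloc call, done once at start-up.
bool poolInit(DevicePool& pool, size_t bytes) {
    void* p = nullptr;
    if (cudaMalloc(&p, bytes) != cudaSuccess) return false;
    pool.base = static_cast<char*>(p);
    pool.capacity = bytes;
    pool.offset = 0;
    return true;
}

// Hand out `bytes` from the pool at a 256-byte aligned offset.
// Returns nullptr if the pool is exhausted. No CUDA call is made here.
void* poolAlloc(DevicePool& pool, size_t bytes) {
    const size_t align = 256;  // assumed alignment for this sketch
    size_t start = (pool.offset + align - 1) / align * align;
    if (start > pool.capacity || bytes > pool.capacity - start) return nullptr;
    pool.offset = start + bytes;
    return pool.base + start;
}

// Make the whole pool available again without touching the driver.
void poolReset(DevicePool& pool) { pool.offset = 0; }

// The single cudaFree call, done once at shutdown.
void poolDestroy(DevicePool& pool) {
    cudaFree(pool.base);
    pool = DevicePool{};
}

int main() {
    DevicePool pool;
    if (!poolInit(pool, 64 << 20)) {  // 64 MiB, allocated once
        printf("pool allocation failed\n");
        return 1;
    }
    const size_t n = 1024;
    for (int iter = 0; iter < 1000; ++iter) {
        poolReset(pool);  // reuse the same memory every iteration
        float* a = static_cast<float*>(poolAlloc(pool, n * sizeof(float)));
        float* b = static_cast<float*>(poolAlloc(pool, n * sizeof(float)));
        if (!a || !b) { printf("pool exhausted\n"); break; }
        cudaMemset(a, 0, n * sizeof(float));  // stand-in for real kernels
        cudaMemset(b, 0, n * sizeof(float));
    }
    cudaDeviceSynchronize();
    poolDestroy(pool);
    return 0;
}
```

Inside the loop only pointer arithmetic happens, so the cost of cudaMalloc and cudaFree is paid once regardless of the iteration count.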