cudaMallocHost and cudaHostAlloc differences and usage

I have noticed that in the API there are two functions, cudaHostAlloc and cudaMallocHost. Can someone explain the difference between these two?

Also, I have noticed some threads discussing usage and reporting quite a bit of overhead with these functions. I understand the cost can be amortized by reusing the buffers.

How large can this overhead actually be? Say I have 3 quite large images (70 MB each) that I want to get to the GPU: would it be best to use cudaMallocHost for 3 buffers and read the images into those buffers directly? Presently, I am reading into a normal malloc() array and then copying to a device array.

Last question: if I were to use a single cudaMallocHost array and copy each image into it in turn, is there a way to move the image to another array once it is on the device? That way I could allocate one cudaMallocHost buffer, transfer all three images to the device, and have them all resident on the device concurrently.

Thanks

Derek

If you’re compiling with nvcc, there is no difference between the two. The only difference arises if you’re using the runtime API from a C program rather than a C++ program (nvcc always compiles as C++); in that case cudaHostAlloc is how you specify flags for your allocation.

Thank you, I was able to figure out how to get it to work. I just wasn’t sure which one to use, but the rest of it worked as described.

From C++, cudaHostAlloc has one nice feature that cudaMallocHost doesn’t, at least in CUDA 3.0; the two are identical in 3.2 (which matters if you are programming a widely used app and need compatibility). In 3.0, cudaHostAlloc will take a ** of any type and use template magic to cast it to a void** for you, whereas cudaMallocHost requires you to do the void** cast yourself (which is annoyingly long if you compile with -Wall and want no warnings).

If you care about supporting really old versions of CUDA, cudaMallocHost is the only method that exists in them.