I have noticed that in the API there are two functions, cudaHostAlloc and cudaMallocHost. Can someone explain the difference between the two?
Also - I have noticed some threads discussing usage, and that these methods carry quite a bit of overhead. I understand the cost can be amortized by reusing the buffers.
How large is that overhead in practice? If I have, say, three large images (70 MB each) that I want to get to the GPU, would it be best to allocate three buffers with cudaMallocHost and read the images into those buffers directly? Presently, I read into a normal malloc() array and then copy to a device array.
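To make the approach I'm considering concrete, here is an untested sketch of what I mean by reading directly into pinned buffers (the read_image() call is a placeholder for my own file-loading code, and the buffer names are mine):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

#define NUM_IMAGES 3
#define IMG_BYTES (70u * 1024u * 1024u)  // ~70 MB per image

int main() {
    unsigned char* h_img[NUM_IMAGES];
    unsigned char* d_img[NUM_IMAGES];

    for (int i = 0; i < NUM_IMAGES; ++i) {
        // Pinned (page-locked) host buffer instead of malloc();
        // avoids the extra staging copy the driver otherwise does.
        cudaMallocHost((void**)&h_img[i], IMG_BYTES);
        cudaMalloc((void**)&d_img[i], IMG_BYTES);

        // read_image(i, h_img[i]);  // hypothetical: decode the file straight into the pinned buffer

        cudaMemcpy(d_img[i], h_img[i], IMG_BYTES, cudaMemcpyHostToDevice);
    }

    for (int i = 0; i < NUM_IMAGES; ++i) {
        cudaFree(d_img[i]);
        cudaFreeHost(h_img[i]);  // pinned memory must be freed with cudaFreeHost, not free()
    }
    return 0;
}
```

Is this the right shape, or is the pinned allocation cost only worth it if the buffers are reused across many transfers?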
Last question - if I instead used a single cudaMallocHost buffer and copied each image into it one at a time, is there a way to move an image to another array once it is on the device? That way I could allocate just one pinned host buffer, transfer all three images to the device through it, and still have all three resident on the device concurrently.
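In other words, something like the following untested sketch, where d_staging and d_img are my own names - is cudaMemcpyDeviceToDevice the right tool here?

```cpp
#include <cuda_runtime.h>

#define NUM_IMAGES 3
#define IMG_BYTES (70u * 1024u * 1024u)  // ~70 MB per image

int main() {
    unsigned char* h_pinned;             // single reusable pinned host buffer
    unsigned char* d_staging;            // single staging buffer on the device
    unsigned char* d_img[NUM_IMAGES];    // final per-image device arrays

    cudaMallocHost((void**)&h_pinned, IMG_BYTES);
    cudaMalloc((void**)&d_staging, IMG_BYTES);
    for (int i = 0; i < NUM_IMAGES; ++i)
        cudaMalloc((void**)&d_img[i], IMG_BYTES);

    for (int i = 0; i < NUM_IMAGES; ++i) {
        // read_image(i, h_pinned);  // hypothetical: load image i into the shared pinned buffer

        // Host -> device into the staging buffer...
        cudaMemcpy(d_staging, h_pinned, IMG_BYTES, cudaMemcpyHostToDevice);
        // ...then device -> device into the image's own array.
        cudaMemcpy(d_img[i], d_staging, IMG_BYTES, cudaMemcpyDeviceToDevice);
    }

    for (int i = 0; i < NUM_IMAGES; ++i)
        cudaFree(d_img[i]);
    cudaFree(d_staging);
    cudaFreeHost(h_pinned);
    return 0;
}
```

Or could I skip the staging buffer entirely and just cudaMemcpy from the pinned host buffer straight into each d_img[i]?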