Simple cudaMallocHost beginner question

Hello All,

I am trying to use cudaMallocHost in my program and I am having a problem with it.

If I understand it properly, memory allocated with cudaMallocHost can be read and modified from both the CPU and the GPU.

In my code I have a structure as follows

I allocated memory as

I read some values into the structure from a file

but I am not able to retrieve data from the structure.

Can anyone help me?


You only need to use cudaMallocHost if you’re allocating pinned (page-locked) memory on the host. This increases the host <-> device transfer speed, but the operating system can no longer page that memory out, so it reduces the physical memory available to every other program on the system. In many cases, you’re probably better off using a normal malloc() call.
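To make the difference concrete, here is a minimal sketch that round-trips a buffer through the GPU using either kind of host allocation. The helper name roundTrip and its parameters are made up for illustration; only the cuda* calls come from the CUDA runtime. The copy calls are identical in both cases, only the allocation and free calls change.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

// Hypothetical helper (not from the original post): copies n floats to the
// device and back. The host buffer is pinned if usePinned is true, otherwise
// it comes from plain malloc(). Returns true if the data survives the trip.
bool roundTrip(size_t n, bool usePinned)
{
    const size_t bytes = n * sizeof(float);
    float *h = nullptr;   // host buffer
    float *d = nullptr;   // device buffer

    if (usePinned) {
        // Page-locked host memory: faster transfers, but cannot be paged out.
        if (cudaMallocHost((void **)&h, bytes) != cudaSuccess) return false;
    } else {
        // Ordinary pageable host memory.
        h = static_cast<float *>(malloc(bytes));
        if (h == nullptr) return false;
    }

    bool ok = (cudaMalloc((void **)&d, bytes) == cudaSuccess);
    if (ok) {
        for (size_t i = 0; i < n; ++i) h[i] = float(i);
        ok = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

        // Overwrite the host copy so the check below proves the data came back.
        for (size_t i = 0; i < n; ++i) h[i] = -1.0f;
        ok = ok && cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

        for (size_t i = 0; ok && i < n; ++i) ok = (h[i] == float(i));
        cudaFree(d);
    }

    // Pinned memory must be released with cudaFreeHost, not free().
    if (usePinned) cudaFreeHost(h); else free(h);
    return ok;
}
```

Calling roundTrip(1024, true) and roundTrip(1024, false) should both succeed; the pinned variant is only worth it when transfer time actually matters.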

I am a bit confused about the difference between using cudaMallocHost and using cudaMemcpyAsync.

There are many pointers in my structures, and I definitely have to do operations on both the CPU and the GPU.

Can I allocate memory on the host using malloc and transfer it using cudaMemcpyAsync?

No, I believe that if you’re using cudaMemcpyAsync and want the copy to actually run asynchronously, you have to use pinned host memory allocated with cudaMallocHost(). If you’re doing a synchronous transfer with cudaMemcpy(), you can just use normal memory allocated with malloc().
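As a rough sketch of how the two fit together: cudaMallocHost provides the pinned buffer, and cudaMemcpyAsync queues copies from and to it on a stream. The kernel addOne and the helper asyncAddOne are hypothetical names used only for this example, and it assumes n > 0.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: adds 1 to every element.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: queues copy-in, kernel and copy-out on one stream,
// then waits for the stream. Returns true if every element equals i + 1.
bool asyncAddOne(int n)
{
    const size_t bytes = n * sizeof(int);
    int *h = nullptr;
    int *d = nullptr;
    cudaStream_t stream;

    // Pinned host buffer, so the async copies can really be asynchronous.
    if (cudaMallocHost((void **)&h, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) {
        cudaFreeHost(h);
        return false;
    }
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        cudaFree(d);
        cudaFreeHost(h);
        return false;
    }

    for (int i = 0; i < n; ++i) h[i] = i;

    // All three operations are queued on the same stream, so they run in order
    // while the CPU is free to do other work until the synchronize call.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
    addOne<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

    // The host buffer must not be read before the stream has finished.
    bool ok = (cudaStreamSynchronize(stream) == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == i + 1);

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}
```

The important part is the cudaStreamSynchronize call: until it returns, the contents of the host buffer are not guaranteed to be the results.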


As you can see, my structure contains pointers. When I assign something to a pointer on the CPU, it points to some location in main memory.

What is an efficient way of transferring those pointers and their data onto the GPU?

You can’t usefully transfer pointers from the host to the device: host pointers point to host memory, device pointers point to device memory, and a host address means nothing when dereferenced on the GPU.

You need to allocate device memory according to the size of your structs, copy the data over from the host, generate a device pointer to the struct data in device memory, and go from there. When your computations are done, you copy the data back to the host and generate a new pointer on the host which points to the data in host memory.
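The steps above can be sketched roughly as follows. Since the original structure isn't shown, the struct Particles, the kernel scale and the helper scaleOnDevice are all hypothetical stand-ins, and the sketch assumes count > 0. The key trick is a staging copy of the struct on the host whose pointer member holds the device address.

```cuda
#include <cuda_runtime.h>

// Hypothetical struct; the original poster's layout is not shown.
struct Particles {
    int    count;
    float *values;   // host address in the host copy, device address in the device copy
};

// Hypothetical kernel: multiplies every value by factor.
__global__ void scale(Particles *p, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p->count) p->values[i] *= factor;
}

// Hypothetical helper: deep-copies hostP to the device, scales the values
// there, and copies the results back into the array hostP.values points to.
bool scaleOnDevice(Particles &hostP, float factor)
{
    const size_t bytes = hostP.count * sizeof(float);
    float     *dValues = nullptr;   // device copy of the pointed-to array
    Particles *dP      = nullptr;   // device copy of the struct itself

    if (cudaMalloc((void **)&dValues, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&dP, sizeof(Particles)) != cudaSuccess) {
        cudaFree(dValues);
        return false;
    }

    // 1. Copy the data the host pointer refers to.
    cudaMemcpy(dValues, hostP.values, bytes, cudaMemcpyHostToDevice);

    // 2. Build a staging struct whose pointer is the device address,
    //    then copy that struct to the device.
    Particles staging = hostP;
    staging.values = dValues;
    cudaMemcpy(dP, &staging, sizeof(Particles), cudaMemcpyHostToDevice);

    scale<<<(hostP.count + 255) / 256, 256>>>(dP, factor);

    // 3. Copy the array back. hostP.values was never overwritten, so it
    //    still points at valid host memory.
    bool ok = cudaMemcpy(hostP.values, dValues, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dP);
    cudaFree(dValues);
    return ok;
}
```

With several pointer members, step 1 and the pointer patch in step 2 are repeated once per member. If the structure is mostly pointers, flattening it into a few plain arrays is usually simpler than deep-copying it.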