Data copy between multiple GPUs


I have a question about copying data between multiple GPUs.

I implemented some program with CUDA.

Now I am trying to make the program use multiple GPUs to improve performance.

I haven’t started the multi-GPU implementation yet, so I don’t know this area well.

I was wondering about how to copy data between devices directly.

I am going to use two GPU devices.

The program I am trying to implement has to share data between the devices during processing.

I want to know how to copy data directly from one device to the other in the middle of the process.

By ‘process’ I mean the host-side process, not a kernel.

Is that possible?

If so, is there any example source code?

Please let me know how to do it.

Thanks in advance.

Great question.

I don’t think there’s a direct way? There should be. The PCIe bus could handle this much better than doing a memcpy into host RAM followed by a second memcpy into the second device. But I’m pretty sure that’s what you’ll have to do.
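The two-hop copy described above can be sketched roughly like this. This is just an illustration, assuming a CUDA version in which a single host thread can switch devices with cudaSetDevice (CUDA 4.0 and later); the buffer names and the 1 MiB size are made up for the example:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do {                                      \
    cudaError_t e = (call);                                   \
    if (e != cudaSuccess) {                                   \
        fprintf(stderr, "CUDA error %s at line %d\n",         \
                cudaGetErrorString(e), __LINE__);             \
        exit(EXIT_FAILURE);                                   \
    }                                                         \
} while (0)

int main(void) {
    const size_t n = 1 << 20;              /* illustrative element count */
    const size_t bytes = n * sizeof(float);

    /* Pinned host staging buffer; pinned memory keeps both copies fast. */
    float *h_staging = NULL;
    CHECK(cudaMallocHost((void **)&h_staging, bytes));

    float *d_src = NULL, *d_dst = NULL;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&d_src, bytes));
    CHECK(cudaMemset(d_src, 0, bytes));    /* pretend device 0 produced data */

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&d_dst, bytes));

    /* Hop 1: device 0 -> host RAM */
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpy(h_staging, d_src, bytes, cudaMemcpyDeviceToHost));

    /* Hop 2: host RAM -> device 1 */
    CHECK(cudaSetDevice(1));
    CHECK(cudaMemcpy(d_dst, h_staging, bytes, cudaMemcpyHostToDevice));

    /* Cleanup */
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(d_src));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(d_dst));
    CHECK(cudaFreeHost(h_staging));
    return 0;
}
```

If the data is large, the same pattern can be overlapped with cudaMemcpyAsync on streams, since the staging buffer is already pinned.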

Follow up question on my part:

Can a piece of memory allocated with cudaMallocHost be accessed from two different host threads (each running a different CUDA context)?