How to use two GPUs efficiently

Hi, I have two CPUs and two GPUs, and each GPU's PCIe slot is attached to a different CPU. When I use cudaMemcpy to copy data from host memory to the device, how can I avoid the GPU reading host memory that belongs to the other CPU's node? In that situation the performance is low.

Thank you.

Pin the host process to the CPU that is “closest” to the GPU you want to use.

On Linux, two common utilities for this are numactl and taskset.

Both should have man pages on your system that you can read. I find taskset a little easier to use for quick tasks.

It will be important to understand the logical-core-to-physical-socket mapping, but since you already know there is a perf difference, you can figure that out experimentally.
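If you'd rather do it from code than with taskset, here is a minimal sketch of the same idea on Linux: pin the calling thread to the near socket before allocating the host buffer and copying. The core IDs are placeholders; check your actual topology first (e.g. with numactl --hardware or nvidia-smi topo -m).

```c
// Minimal sketch (Linux): pin this thread to the socket assumed to be
// nearest GPU 0 before allocating pinned host memory and copying.
// ASSUMPTION: logical cores 0-7 belong to that socket; verify on your system.
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int c = 0; c < 8; ++c)          // hypothetical core IDs for the near socket
        CPU_SET(c, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    cudaSetDevice(0);                    // GPU attached to that socket

    size_t bytes = 64 << 20;
    void *h_buf, *d_buf;
    // Pinned host allocation made by a thread running on the near socket,
    // so the OS's local-allocation policy tends to place it on the near node.
    cudaHostAlloc(&h_buf, bytes, cudaHostAllocDefault);
    cudaMalloc(&d_buf, bytes);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```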

cudaMemcpyDeviceToHost - slow performance using pinned memory - CUDA Programming and Performance - NVIDIA Developer Forums

The general topic to google is “process affinity”; it's not unique or specific to CUDA:

[url]https://www.glennklockwood.com/hpc-howtos/process-affinity.html[/url]

Thank you.
I'm working on Windows 10.
Can I use the NUMA (Non-Uniform Memory Access) API to set the memory location?
Or should I just pin the host process to the CPU that is “closest” to the GPU and leave everything else to the operating system?

What would you suggest?

Thank you.

I’m sure there are many ways to do it. A method analogous to taskset on Linux would be start /affinity on Windows.

[url]https://blogs.msdn.microsoft.com/santhoshonline/2011/11/24/how-to-launch-a-process-with-cpu-affinity-set/[/url]

I don’t do these sorts of things on Windows very much.
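If you prefer to set it from inside the program rather than via start /affinity, the Win32 analogue is SetProcessAffinityMask. A minimal sketch follows; the 0xFF mask is only a placeholder for whichever logical processors sit on the socket nearest your GPU.

```c
// Minimal sketch (Windows): programmatic equivalent of "start /affinity",
// restricting the process to a set of logical processors before any CUDA
// allocations or copies. ASSUMPTION: logical processors 0-7 (mask 0xFF)
// are on the socket nearest GPU 0; substitute your real topology.
#include <windows.h>
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    DWORD_PTR mask = 0xFF;   // hypothetical: CPUs 0-7 on the near socket
    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }

    cudaSetDevice(0);        // GPU attached to that socket
    // ... allocate pinned host memory and issue cudaMemcpy from here ...
    return 0;
}
```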

Thank you.
Actually, I have only one process with two threads in it; each thread controls one GPU. In this situation I set the thread affinity as you described, but how do I guarantee that the memory is allocated closest to that CPU? Does it depend on the operating system, or can I control it programmatically? To make clear what I mean, here is a sketch of my setup: one worker thread per GPU, each pinned with SetThreadAffinityMask before it allocates its host buffers. The masks and the GPU-to-socket mapping are just placeholders for my actual topology.
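```c
// Minimal sketch (Windows): one process, two threads, one GPU per thread.
// Each thread pins itself to the socket assumed to be nearest its GPU
// before allocating pinned host memory. ASSUMPTION: GPU 0 is near CPUs 0-7
// and GPU 1 is near CPUs 8-15; substitute the machine's real topology.
#include <windows.h>
#include <stdio.h>
#include <cuda_runtime.h>

typedef struct {
    int       gpu;   // CUDA device ordinal
    DWORD_PTR mask;  // logical processors on the socket nearest that GPU
} WorkerArgs;

static DWORD WINAPI worker(LPVOID p)
{
    WorkerArgs *args = (WorkerArgs *)p;

    SetThreadAffinityMask(GetCurrentThread(), args->mask);
    cudaSetDevice(args->gpu);

    size_t bytes = 64 << 20;
    void *h_buf, *d_buf;
    cudaHostAlloc(&h_buf, bytes, cudaHostAllocDefault);  // allocated by the pinned thread
    cudaMalloc(&d_buf, bytes);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}

int main(void)
{
    WorkerArgs a0 = { 0, 0x00FF };   // GPU 0, CPUs 0-7 (hypothetical)
    WorkerArgs a1 = { 1, 0xFF00 };   // GPU 1, CPUs 8-15 (hypothetical)

    HANDLE t0 = CreateThread(NULL, 0, worker, &a0, 0, NULL);
    HANDLE t1 = CreateThread(NULL, 0, worker, &a1, 0, NULL);
    WaitForSingleObject(t0, INFINITE);
    WaitForSingleObject(t1, INFINITE);
    CloseHandle(t0);
    CloseHandle(t1);
    return 0;
}
```

With this setup, is setting each thread's affinity before the allocations enough, or do I also need something like the Windows NUMA allocation APIs to be sure where the host buffers end up?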