How to use two GPUs efficiently

Hi, I have two CPUs and two GPUs; each GPU's PCIe slot is attached to one of the two CPUs. When I use cudaMemcpy to copy data from host memory to a device, how do I prevent the GPU from being fed from memory attached to the other CPU? When that happens, the performance is low.

Thank you.

Pin the host process to the CPU that is "closest" to the GPU you want to use.

On Linux, two common utilities for doing this are numactl and taskset.

Both should have man pages on your system that you can read. I find taskset a little easier to use for quick experiments.

It will be important to understand the logical-core-to-physical-socket mapping. But since you already know there is a performance difference, you can figure it out experimentally.

The general topic to search for is "process affinity"; it's not unique or specific to CUDA.

Thank you.
I'm working on Windows 10.
Can I use the NUMA (Non-Uniform Memory Access) API to place the memory explicitly?
Or should I just set the host process affinity to the CPU "closest" to the GPU and leave everything else to the operating system?

May I have your suggestion?

Thank you.

I'm sure there are many ways to do it. A method analogous to taskset on Linux would be start /affinity on Windows.

I don't do these sorts of things on Windows very much.

Thank you.
Actually, I have only one process, and there are two threads in it; each thread controls one GPU. In this situation I set the thread affinity as you described, but how do I guarantee that the memory is allocated closest to that CPU? Does it depend on the operating system, or can I control it by programming?