In a multi-processor, multi-GPU scenario, how can data be exchanged between GPUs?

There are 2 CPUs and 8 GPUs, where CPU1 is bound to gpu1-4 and CPU2 is bound to gpu5-8. How can I copy data from gpu8 to gpu1?

One possible approach:

$ cat t43.cu
#include <cuda_runtime.h>

int main(){

  float *d1, *d2;
  cudaSetDevice(0);   // source device (or another device)
  cudaMalloc(&d1, 128);
  cudaSetDevice(2);   // destination device (or another device)
  cudaMalloc(&d2, 128);
  // an ordinary cudaMemcpy between allocations that live on two different devices
  cudaMemcpy(d2, d1, 128, cudaMemcpyDeviceToDevice);
  cudaDeviceSynchronize();
}
$ nvcc -o t43 t43.cu
$ cuda-memcheck ./t43
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
$ nvprof ./t43
==14495== NVPROF is profiling process 14495, command: ./t43
==14495== Profiling application: ./t43
==14495== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   53.96%  3.2640us         1  3.2640us  3.2640us  3.2640us  [CUDA memcpy DtoH]
                   46.04%  2.7850us         1  2.7850us  2.7850us  2.7850us  [CUDA memcpy HtoD]
      API calls:   98.48%  542.04ms         2  271.02ms  179.75ms  362.29ms  cudaMalloc
                    0.91%  5.0097ms         4  1.2524ms  590.58us  3.2132ms  cuDeviceTotalMem
                    0.45%  2.5001ms       404  6.1880us     355ns  280.91us  cuDeviceGetAttribute
                    0.07%  391.83us         1  391.83us  391.83us  391.83us  cudaDeviceSynchronize
                    0.06%  332.39us         4  83.098us  59.127us  150.61us  cuDeviceGetName
                    0.01%  68.397us         1  68.397us  68.397us  68.397us  cudaMemcpy
                    0.00%  22.333us         2  11.166us  2.7670us  19.566us  cudaSetDevice
                    0.00%  18.089us         4  4.5220us  3.0330us  7.4510us  cuDeviceGetPCIBusId
                    0.00%  6.5020us         8     812ns     460ns  1.5000us  cuDeviceGet
                    0.00%  5.6310us         3  1.8770us     565ns  3.7430us  cuDeviceGetCount
                    0.00%  3.2840us         4     821ns     597ns  1.1370us  cuDeviceGetUuid
$
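Note the [CUDA memcpy DtoH] followed by [CUDA memcpy HtoD] in the GPU activities: with no peer access enabled between the two devices, the runtime stages the single device-to-device copy through host memory. You can query whether a direct peer path exists before relying on one. A minimal sketch, reusing the device ordinals 0 and 2 from the example above:

#include <cstdio>
#include <cuda_runtime.h>

int main(){
  int canAccess = 0;
  // can device 2 directly access memory allocated on device 0?
  cudaDeviceCanAccessPeer(&canAccess, 2, 0);
  printf("P2P access 2 -> 0: %s\n", canAccess ? "available" : "not available");
}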

If the GPUs have a direct NVLink connection between them, you could improve on this with cudaMemcpyPeerAsync (refer to CUDA sample codes such as simpleP2P). If the GPUs are connected by an NVLink fabric but have no direct connection, another option would be NCCL point-to-point communication.
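A minimal sketch of the peer-to-peer path, reusing the hypothetical device ordinals 0 and 2 from the example above (error checking omitted; cudaDeviceEnablePeerAccess returns an error on devices without a direct P2P connection):

#include <cuda_runtime.h>

int main(){
  float *d1, *d2;
  cudaSetDevice(0);
  cudaMalloc(&d1, 128);
  cudaSetDevice(2);
  cudaMalloc(&d2, 128);
  // enable direct access from device 2 to allocations on device 0
  cudaDeviceEnablePeerAccess(0, 0);
  // copy 128 bytes from d1 (device 0) to d2 (device 2) on the default stream
  cudaMemcpyPeerAsync(d2, 2, d1, 0, 128, 0);
  cudaDeviceSynchronize();
}

For the fabric-but-no-direct-link case, NCCL's point-to-point primitives (ncclSend/ncclRecv) cover the same pattern.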

Thank you for your reply, but in my use case gpu1 and gpu8 are bound to different CPUs. Can I still copy data like this? The GPUs are 3090s, with no NVLink.

Why not give it a quick try? Should be doable in under a minute …

I haven’t bought the new machine yet; my current machine has a single CPU and multiple GPUs.