There are 2 CPUs and 8 GPUs: CPU1 is bound to gpu1-4 and CPU2 to gpu5-8. How can I copy data from gpu8 to gpu1?
One possible approach:
$ cat t43.cu
int main(){
  float *d1, *d2;
  cudaSetDevice(0); // source device (or another device)
  cudaMalloc(&d1, 128);
  cudaSetDevice(2); // destination device (or another device)
  cudaMalloc(&d2, 128);
  cudaMemcpy(d2, d1, 128, cudaMemcpyDeviceToDevice); // dst = d2, src = d1
  cudaDeviceSynchronize();
}
$ nvcc -o t43 t43.cu
$ cuda-memcheck ./t43
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
$ nvprof ./t43
==14495== NVPROF is profiling process 14495, command: ./t43
==14495== Profiling application: ./t43
==14495== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   53.96%  3.2640us         1  3.2640us  3.2640us  3.2640us  [CUDA memcpy DtoH]
                   46.04%  2.7850us         1  2.7850us  2.7850us  2.7850us  [CUDA memcpy HtoD]
      API calls:   98.48%  542.04ms         2  271.02ms  179.75ms  362.29ms  cudaMalloc
                    0.91%  5.0097ms         4  1.2524ms  590.58us  3.2132ms  cuDeviceTotalMem
                    0.45%  2.5001ms       404  6.1880us     355ns  280.91us  cuDeviceGetAttribute
                    0.07%  391.83us         1  391.83us  391.83us  391.83us  cudaDeviceSynchronize
                    0.06%  332.39us         4  83.098us  59.127us  150.61us  cuDeviceGetName
                    0.01%  68.397us         1  68.397us  68.397us  68.397us  cudaMemcpy
                    0.00%  22.333us         2  11.166us  2.7670us  19.566us  cudaSetDevice
                    0.00%  18.089us         4  4.5220us  3.0330us  7.4510us  cuDeviceGetPCIBusId
                    0.00%  6.5020us         8     812ns     460ns  1.5000us  cuDeviceGet
                    0.00%  5.6310us         3  1.8770us     565ns  3.7430us  cuDeviceGetCount
                    0.00%  3.2840us         4     821ns     597ns  1.1370us  cuDeviceGetUuid
$
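For reference, the DtoH + HtoD pair in the profile can be written out by hand. The following is a minimal sketch of such a host-staged copy, not a claim about what the runtime does internally (buffer sizes, pinning and pipelining are not visible in the profile). It assumes CUDA ordinal n-1 corresponds to "gpu8" and ordinal 0 to "gpu1", although CUDA's enumeration order need not match slot numbering; `staged_copy` and `CHECK` are hypothetical helper names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line and the CUDA error string.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t e_ = (call);                                      \
    if (e_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(e_));                       \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Copy `bytes` from dSrc (on srcDev) to dDst (on dstDev) through the pinned
// host buffer hBuf: one device-to-host copy, then one host-to-device copy.
void staged_copy(void* dDst, int dstDev, const void* dSrc, int srcDev,
                 void* hBuf, size_t bytes) {
  CHECK(cudaSetDevice(srcDev));
  CHECK(cudaMemcpy(hBuf, dSrc, bytes, cudaMemcpyDeviceToHost));
  CHECK(cudaSetDevice(dstDev));
  CHECK(cudaMemcpy(dDst, hBuf, bytes, cudaMemcpyHostToDevice));
}

int main() {
  int n = 0;
  if (cudaGetDeviceCount(&n) != cudaSuccess || n < 1) {
    std::printf("no CUDA devices\n");
    return 0;
  }
  // Assumption: ordinal n-1 is "gpu8" and ordinal 0 is "gpu1".
  const int srcDev = n - 1, dstDev = 0;
  const size_t bytes = 128;  // same size as in t43.cu

  void *dSrc = nullptr, *dDst = nullptr, *hBuf = nullptr;
  CHECK(cudaSetDevice(srcDev));
  CHECK(cudaMalloc(&dSrc, bytes));
  CHECK(cudaMemset(dSrc, 0, bytes));
  CHECK(cudaSetDevice(dstDev));
  CHECK(cudaMalloc(&dDst, bytes));
  CHECK(cudaMallocHost(&hBuf, bytes));  // pinned (page-locked) staging buffer

  staged_copy(dDst, dstDev, dSrc, srcDev, hBuf, bytes);
  std::printf("staged %zu bytes: device %d -> device %d\n", bytes, srcDev, dstDev);

  CHECK(cudaFreeHost(hBuf));
  CHECK(cudaFree(dDst));
  CHECK(cudaSetDevice(srcDev));
  CHECK(cudaFree(dSrc));
  return 0;
}
```

The pinned buffer is a design choice for the sketch: copies from/to page-locked memory avoid an extra pageable-memory bounce, but any host buffer would give the same result.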
In the profile, the device-to-device cudaMemcpy shows up as a DtoH copy followed by an HtoD copy, i.e. the data was staged through host memory. If the GPUs have a direct NVLink connection between them, you could improve on this with cudaMemcpyPeerAsync (see the CUDA sample codes, e.g. simpleP2P). If the GPUs are connected through an NVLink fabric but have no direct link, another option would be NCCL point-to-point communication.
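A minimal sketch of the peer-copy route, loosely following the pattern used in simpleP2P: ask cudaDeviceCanAccessPeer whether direct access is possible, enable it with cudaDeviceEnablePeerAccess where it is, then issue cudaMemcpyPeerAsync. The copy call should also complete when peer access is unavailable, in which case it goes through host memory. The device mapping (n-1 as "gpu8", 0 as "gpu1"), the 64 MiB size, and the helper names `enable_peer` and `CHECK` are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line and the CUDA error string.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t e_ = (call);                                      \
    if (e_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(e_));                       \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Try to let device `from` access memory on device `to` directly.
// Returns true if peer access is (now) enabled, false if unsupported.
bool enable_peer(int from, int to) {
  int can = 0;
  if (from == to) return false;  // same device: nothing to enable
  if (cudaDeviceCanAccessPeer(&can, from, to) != cudaSuccess || !can)
    return false;
  int prev = 0;
  CHECK(cudaGetDevice(&prev));
  CHECK(cudaSetDevice(from));  // access is enabled from the current device
  cudaError_t e = cudaDeviceEnablePeerAccess(to, 0);
  if (e == cudaErrorPeerAccessAlreadyEnabled) {
    cudaGetLastError();  // clear the recorded error; access is already on
    e = cudaSuccess;
  }
  CHECK(cudaSetDevice(prev));
  return e == cudaSuccess;
}

int main() {
  int n = 0;
  if (cudaGetDeviceCount(&n) != cudaSuccess || n < 2) {
    std::printf("need at least 2 GPUs\n");
    return 0;
  }
  // Assumption: ordinal n-1 is "gpu8" and ordinal 0 is "gpu1".
  const int srcDev = n - 1, dstDev = 0;
  const size_t bytes = 64u << 20;  // 64 MiB, arbitrary

  void *src = nullptr, *dst = nullptr;
  CHECK(cudaSetDevice(srcDev));
  CHECK(cudaMalloc(&src, bytes));
  CHECK(cudaSetDevice(dstDev));
  CHECK(cudaMalloc(&dst, bytes));

  // Enable both directions where the hardware/driver reports support.
  bool p2p = enable_peer(dstDev, srcDev);
  p2p = enable_peer(srcDev, dstDev) && p2p;

  // Works with or without peer access; without it the runtime stages
  // the transfer through host memory.
  cudaStream_t s;
  CHECK(cudaStreamCreate(&s));  // created on dstDev (the current device)
  CHECK(cudaMemcpyPeerAsync(dst, dstDev, src, srcDev, bytes, s));
  CHECK(cudaStreamSynchronize(s));
  std::printf("copied %zu bytes: device %d -> device %d (%s)\n", bytes,
              srcDev, dstDev, p2p ? "peer access enabled" : "no peer access");

  CHECK(cudaStreamDestroy(s));
  CHECK(cudaFree(dst));
  CHECK(cudaSetDevice(srcDev));
  CHECK(cudaFree(src));
  return 0;
}
```

Whether peer access is reported as possible depends on the GPUs, the interconnect and the driver, so the printed mode is worth checking on the actual machine rather than assuming it.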
Thank you for your reply, but in my use case gpu1 and gpu8 are bound to different CPUs. Can I copy between them like this? The GPUs are 3090s, with no NVLink.
Why not give it a quick try? It should be doable in under a minute …
I haven’t bought my new machine yet. The current one has a single CPU and multiple GPUs.
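Once the two-socket machine is available, the quick try suggested above could look like this minimal sketch: it writes a known pattern to a buffer on the last device, copies it GPU-to-GPU to device 0 with cudaMemcpy (cudaMemcpyDefault relies on unified addressing), reads it back and counts mismatches. The ordinal-to-"gpuN" mapping and the helper names `count_mismatches` and `CHECK` are assumptions, not part of the original thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line and the CUDA error string.
#define CHECK(call)                                               \
  do {                                                            \
    cudaError_t e_ = (call);                                      \
    if (e_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(e_));                       \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Number of positions at which a[i] != b[i].
size_t count_mismatches(const float* a, const float* b, size_t n) {
  size_t bad = 0;
  for (size_t i = 0; i < n; ++i) bad += (a[i] != b[i]) ? 1 : 0;
  return bad;
}

int main() {
  int n = 0;
  if (cudaGetDeviceCount(&n) != cudaSuccess || n < 2) {
    std::printf("need at least 2 GPUs\n");
    return 0;
  }
  const int srcDev = n - 1, dstDev = 0;  // assumed: "gpu8" and "gpu1"
  const size_t count = 1 << 20;
  const size_t bytes = count * sizeof(float);

  // Known pattern on the host; every value 0..2^20-1 is exact in a float.
  std::vector<float> in(count), out(count, -1.0f);
  for (size_t i = 0; i < count; ++i) in[i] = static_cast<float>(i);

  float *dSrc = nullptr, *dDst = nullptr;
  CHECK(cudaSetDevice(srcDev));
  CHECK(cudaMalloc(&dSrc, bytes));
  CHECK(cudaMemcpy(dSrc, in.data(), bytes, cudaMemcpyHostToDevice));
  CHECK(cudaSetDevice(dstDev));
  CHECK(cudaMalloc(&dDst, bytes));

  // GPU-to-GPU copy; with unified addressing cudaMemcpyDefault lets the
  // runtime infer the direction from the pointers.
  CHECK(cudaMemcpy(dDst, dSrc, bytes, cudaMemcpyDefault));
  CHECK(cudaDeviceSynchronize());
  CHECK(cudaMemcpy(out.data(), dDst, bytes, cudaMemcpyDeviceToHost));

  const size_t bad = count_mismatches(in.data(), out.data(), count);
  std::printf("device %d -> device %d: %zu mismatches\n", srcDev, dstDev, bad);

  CHECK(cudaFree(dDst));
  CHECK(cudaSetDevice(srcDev));
  CHECK(cudaFree(dSrc));
  return bad == 0 ? 0 : 1;
}
```

A nonzero exit status signals that the data did not arrive intact; timing the copy separately would show how much the cross-socket path costs.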