Memory increase in GPU-aware non-blocking MPI communications

l.bellentani · October 8, 2024, 3:26pm

Hello Timothee,

I have started using export UCX_TLS=^cuda_ipc when running on more than one node, which is acceptable as long as the time spent in intra-node communication is not a bottleneck compared with the time spent in inter-node communication. However, this is probably not the best thing to do; I hope NVIDIA support can provide a better reply :)
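A minimal sketch of that setup for a job script, assuming the job runs under Slurm so that SLURM_NNODES holds the node count (the function name configure_ucx_tls is hypothetical; with another launcher, replace the node-count variable):

```shell
# Sketch: exclude the cuda_ipc transport only for multi-node runs.
# Assumption: Slurm sets SLURM_NNODES; treat it as 1 when unset.
configure_ucx_tls() {
    nnodes="${SLURM_NNODES:-1}"
    if [ "$nnodes" -gt 1 ]; then
        # The leading ^ tells UCX to exclude the listed transport.
        export UCX_TLS='^cuda_ipc'
    fi
}

configure_ucx_tls
```

Call it before launching the MPI application, so that single-node runs keep cuda_ipc for intra-node GPU transfers.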

By the way, you can also try linking against HPC-X; I have seen better performance with it, without needing any of these exports.

Let me know if this helps.

See you,

Laura
