Dear experts,
I am using MPI_Allreduce with a device buffer. Profiling results show that the time spent in this call keeps increasing as the simulation progresses, but it should not do so.
real(8), allocatable, device :: PET_balance_de(:)
allocate(PET_balance_de(2))
! in-place sum of the 2-element device buffer across all ranks
call MPI_Allreduce(MPI_IN_PLACE, PET_balance_de, 2, MPI_DOUBLE_PRECISION, &
                   MPI_SUM, MPI_COMM_WORLD, ierr)
PET_balance_de is a two-element array whose size does not depend on the timestep, so the cost of this call should not grow with simulation time. However, as the profiling results show, it does.
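For reference, here is a minimal sketch of how I could time the call in isolation at each timestep (assuming a CUDA-aware MPI build; the program name, the myrank variable, and the step loop are only illustrative). The cudaDeviceSynchronize before starting the timer drains any pending GPU work so that it is not attributed to the Allreduce itself:

program allreduce_timing
  use mpi
  use cudafor
  implicit none
  real(8), allocatable, device :: PET_balance_de(:)
  real(8) :: t0, t1
  integer :: ierr, istat, myrank, step

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)

  allocate(PET_balance_de(2))
  PET_balance_de = 1.0d0

  do step = 1, 100
     ! ... per-timestep GPU work would go here ...

     ! finish all queued GPU work so it is not charged to the Allreduce
     istat = cudaDeviceSynchronize()

     t0 = MPI_Wtime()
     call MPI_Allreduce(MPI_IN_PLACE, PET_balance_de, 2, MPI_DOUBLE_PRECISION, &
                        MPI_SUM, MPI_COMM_WORLD, ierr)
     t1 = MPI_Wtime()

     if (myrank == 0) print *, 'step', step, ' Allreduce time (s):', t1 - t0
  end do

  deallocate(PET_balance_de)
  call MPI_Finalize(ierr)
end program allreduce_timing

If the per-step time measured this way stays flat while the profiler still shows growth, the extra time would be coming from work queued before the call rather than from the reduction itself.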
This seems strange. Could you please give me some ideas about what might be causing it?
Thanks much in advance!
Chen