Memory leak in CUDA-aware MPI Fortran program


I am working on a CUDA-aware MPI Fortran program. It has three subroutines that are called in a loop about ten thousand times. In each of them I allocate and deallocate device memory, and I transfer data between nodes using MPI_SENDRECV.
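The per-iteration pattern in each subroutine is roughly like this (a minimal CUDA Fortran sketch, not my real code — the subroutine name, array names, buffer size, and neighbour ranks are placeholders):

```fortran
subroutine exchange_step()
  use cudafor
  use mpi
  implicit none
  real(8), device, allocatable :: d_send(:), d_recv(:)  ! device buffers
  integer :: ierr, n, left, right
  integer :: status(MPI_STATUS_SIZE)

  n = 1024                        ! placeholder buffer size
  left = 0; right = 1             ! placeholder neighbour ranks

  allocate(d_send(n), d_recv(n))  ! allocated fresh every iteration

  ! CUDA-aware MPI: the device buffers are passed to MPI directly,
  ! with no explicit device-to-host copy
  call MPI_SENDRECV(d_send, n, MPI_DOUBLE_PRECISION, right, 0, &
                    d_recv, n, MPI_DOUBLE_PRECISION, left,  0, &
                    MPI_COMM_WORLD, status, ierr)

  deallocate(d_send, d_recv)      ! freed at the end of every iteration
end subroutine exchange_step
```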

I am running on two Tesla P100 GPUs under Ubuntu. That is the background.

There is a memory leak in my program. I am sure I deallocate every dynamically allocated array, but when I check with cudaMemGetInfo, I find that some deallocations do not take effect: I deallocate 10 arrays, but only 9 arrays' worth of memory is actually released. If I run the program on a single node, or run only one of the three subroutines, there is no memory leak.
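This is how I measure it (a sketch of the check; variable names are mine, and expected_bytes stands for the total size of the arrays being freed):

```fortran
! Compare free device memory before and after the deallocates
use cudafor
implicit none
integer(cuda_count_kind) :: free_before, free_after, total
integer(cuda_count_kind) :: expected_bytes
integer :: istat

istat = cudaMemGetInfo(free_before, total)

! ... deallocate the 10 device arrays here ...

istat = cudaMemGetInfo(free_after, total)

! If everything were released, free memory should grow by the full
! size of the 10 arrays; I see only 9 arrays' worth come back.
if (free_after - free_before < expected_bytes) then
   print *, 'leaked this iteration (bytes): ', &
            expected_bytes - (free_after - free_before)
end if
```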

I googled this, and someone said it happens because when MPI_SENDRECV transfers GPU memory directly, the Unified Memory machinery keeps a record of that memory.

By the way, I also tried cudaDeviceReset, but it did not help.

I do not know how to solve this bug. Do you have any suggestions?

Thank you!!


Update: if I delete every MPI_SENDRECV that transfers GPU memory directly, the memory leak disappears!

But I get the correct result with MPI_SENDRECV. Why does MPI_SENDRECV lead to a memory leak? Is it due to my hardware?
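For comparison, the variant without direct GPU transfers — staging through host buffers, which does not leak for me — looks roughly like this (again a sketch with placeholder names; the assignments between host and device arrays are the implicit CUDA Fortran copies):

```fortran
! Host-staged exchange: copy device -> host, MPI on host buffers,
! then host -> device. Only the direct-device version leaks for me.
real(8), allocatable :: h_send(:), h_recv(:)           ! host buffers
real(8), device, allocatable :: d_send(:), d_recv(:)   ! device buffers

allocate(h_send(n), h_recv(n))
h_send = d_send                    ! implicit cudaMemcpy, device to host

call MPI_SENDRECV(h_send, n, MPI_DOUBLE_PRECISION, right, 0, &
                  h_recv, n, MPI_DOUBLE_PRECISION, left,  0, &
                  MPI_COMM_WORLD, status, ierr)

d_recv = h_recv                    ! implicit cudaMemcpy, host to device
deallocate(h_send, h_recv)
```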

Thank you!