I have been having problems with code that generates rogue processes and produces the wrong answer when I use Unified Memory. I have included an example below; the first version is similar to my current code.
psi_multigpu_test_code_roughprocess.f (31.7 KB)
When I compile this for the GPU with Unified Memory and run it on 4 MPI ranks, it generates rogue processes on rank zero.
If I move my “!$acc set device” earlier in the code, the rogue processes are not generated. To my knowledge, no code runs on the GPU between MPI_Init and the “!$acc set device” in my first version. Below is my fixed code, which simply moves the “!$acc set device” earlier.
psi_multigpu_test_code_fixed.f (31.7 KB)
When I compile this version for the GPU with Unified Memory and run it on 4 MPI ranks, no rogue processes are generated on rank zero.
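For reference, this is the placement pattern I mean: a minimal sketch (not the attached code; the rank-to-device mapping via acc_set_device_num and mod(rank, ndev) is just an illustrative assumption), where the device is selected immediately after MPI_Init and before any other OpenACC activity:

```fortran
      program set_device_early
      use mpi
      use openacc
      implicit none
      integer :: ierr, rank, ndev, dev

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

!     Bind this rank to its GPU before any other device activity,
!     so no rank can implicitly create a context on device 0.
!     (Round-robin mapping here is an assumption for illustration.)
      ndev = acc_get_num_devices(acc_device_nvidia)
      dev  = mod(rank, ndev)
      call acc_set_device_num(dev, acc_device_nvidia)

!     ... rest of the program: OpenACC data and compute regions ...

      call MPI_Finalize(ierr)
      end program
```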
My question is: why does the first version of the code lead to rogue processes?
Thanks,
- Miko