I am experiencing some strange behaviour in my Fortran-MPI-GPU program and I would like some advice about it.
Schematically, the program splits the computation across many parallel MPI processes (procs) and runs a big loop in which the most time-consuming computation is done on the CPU or on the GPU, depending on the proc.
Each proc checks whether a GPU device is present and, with a small routine, assigns itself the corresponding GPU if one exists. To do that I use cudaSetDevice(), and the GPUs are released at the end of the program.
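The assignment routine looks roughly like the sketch below (the routine name assign_gpu and the round-robin mapping mod(rank, ngpus) are illustrations, not my exact code):

```fortran
subroutine assign_gpu(rank)
  use cudafor
  implicit none
  integer, intent(in) :: rank
  integer :: istat, ngpus

  ! Check whether this proc can see any CUDA device
  istat = cudaGetDeviceCount(ngpus)

  if (istat == cudaSuccess .and. ngpus > 0) then
     ! Illustrative round-robin mapping of MPI ranks to devices;
     ! the real mapping in my code may differ
     istat = cudaSetDevice(mod(rank, ngpus))
  end if
end subroutine assign_gpu
```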
The program has been tested with many GPUs on multi-core and/or multi-node platforms, and it works fine.
Recently I wrote a kernel to use the GPU in the second most time-consuming routine as well.
Due to a mistake, I let every proc call the new kernel (even the procs to which no GPU was assigned). I was astonished that everything worked fine. Since I had set the device explicitly, I expected the other procs to give me an error, but no: every proc executed the kernel on the same GPU without problems and with good results.
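In other words, the second routine ended up doing something like the sketch below: the launch is unconditional, so every proc reaches it whether or not it was ever assigned a GPU (the saxpy-style kernel and the launch configuration are placeholders for my real code):

```fortran
module second_routine_mod
  use cudafor
  implicit none
contains

  attributes(global) subroutine saxpy_kernel(n, a, x, y)
    ! Placeholder kernel standing in for my real one
    integer, value :: n
    real, value :: a
    real :: x(n), y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy_kernel

  subroutine second_routine(n, a, x_d, y_d)
    integer, intent(in) :: n
    real, intent(in) :: a
    real, device :: x_d(n), y_d(n)
    integer :: istat
    ! The mistake: this launch runs on every proc, with no check
    ! that the proc was ever assigned a GPU via cudaSetDevice()
    call saxpy_kernel<<<(n + 255) / 256, 256>>>(n, a, x_d, y_d)
    istat = cudaDeviceSynchronize()
  end subroutine second_routine

end module second_routine_mod
```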
Do you have an explanation?