Hello,
I’m experiencing some strange behaviour in my Fortran-MPI-GPU program and I
would like some advice about it.
Schematically, the program is a big loop where the most time-consuming computation is done
on the CPU or on the GPU, depending on the process (proc).
The program splits the computation across many parallel MPI processes (procs).
Each proc checks whether a GPU device is present, and with a little routine I assign
the corresponding GPU to it if one is found.
To do that I use cudaSetDevice(), and the GPUs are released at the end of the program.
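A minimal sketch of that kind of per-rank assignment (the routine name and the round-robin mapping are my assumptions, not the original code):

```fortran
! Hypothetical sketch of per-rank device assignment (not the original routine).
! Assumes CUDA Fortran (module cudafor) and an already-initialized MPI rank.
subroutine assign_device(rank)
  use cudafor
  implicit none
  integer, intent(in) :: rank
  integer :: ndev, istat

  istat = cudaGetDeviceCount(ndev)
  if (istat == cudaSuccess .and. ndev > 0) then
     ! Round-robin mapping of ranks to devices (an assumption here;
     ! the original program may map procs to GPUs differently).
     istat = cudaSetDevice(mod(rank, ndev))
  end if
  ! Ranks that find no GPU fall through and do their work on the CPU.
end subroutine assign_device
```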
The program has been tested with many GPUs on multi-core and/or multi-node platforms, and it works fine.
Now I have written a kernel to use the GPU in the second most time-consuming routine as well.
Due to a mistake, I let every proc call the new kernel (even those to which no GPU had been assigned).
I was astonished that everything worked fine.
I expected that, because I had set the device, the other procs would give me an error, but no:
every proc ran the kernel on the same GPU without problems and with good results.
If you do not set the device explicitly, it will use device 0, or the first CUDA-capable device present.
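That default is easy to confirm; a minimal CUDA Fortran check (hypothetical, not from the original program):

```fortran
! Hypothetical check: without any cudaSetDevice() call, the current
! device reported by the runtime is device 0.
program default_device
  use cudafor
  implicit none
  integer :: dev, istat
  istat = cudaGetDevice(dev)        ! no cudaSetDevice() was called before this
  print *, 'current device =', dev  ! prints 0 on a machine with a GPU
end program default_device
```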
You said earlier that the device wasn’t being selected. If you don’t set the device, doesn’t your application use the CPU? Wouldn’t that be the reason it is working?
It is because if you don’t set the device, it will run on device 0, or the first CUDA-capable device present. It seems that you have 4 processes on the same machine, so processes 2, 3 and 4 will simply use that device without a problem. The issue that may occur is that on the second iteration, processes 2, 3 and 4 will not have the necessary data allocated on the device, but this will depend on the kind of problem you’re solving.
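One way to guard against that (a sketch only; the names are hypothetical, and it assumes CUDA Fortran device arrays) is to have each process allocate and fill its own device data before the first kernel launch, since device allocations belong to each process’s own CUDA context:

```fortran
! Hypothetical guard: every process allocates its own copy of the device
! data before launching the kernel, because device allocations made by
! one MPI process are not visible to the others.
subroutine launch_on_gpu(h_work, n)
  use cudafor
  implicit none
  integer, intent(in) :: n
  real, intent(in)    :: h_work(n)
  real, device, allocatable, save :: d_work(:)

  if (.not. allocated(d_work)) allocate(d_work(n))
  d_work = h_work   ! host-to-device copy of this proc's data
  ! call my_kernel<<<(n+255)/256, 256>>>(d_work, n)  ! kernel name assumed
end subroutine launch_on_gpu
```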