Hi,
this may be more of an MPI question than a CUDA question, but here goes.
How do you efficiently share a GPU device when more than one CPU process wants to use it?
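What I do is roughly this (a sketch in C with MPI):
int rank, nranks;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nranks);
for (int i = 0; i < nranks; ++i)
{
    if (i == rank)
    {
        /* only this rank accesses the device now */
    }
    MPI_Barrier(MPI_COMM_WORLD);  /* everyone waits before the next turn */
}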
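While this does serialize the GPU access, it is not very efficient: every rank has to pass through one barrier per rank, and each rank sits idle while the others take their turns.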
Does anyone have a better solution?