Multiple MPI ranks sharing one GPU

Hi,

Maybe this is more of an MPI question than a CUDA question, however…

How do you efficiently access the GPU device when more than one CPU process wants to use the device?

What I do is:

Pseudocode :)

for (i = 0; i < MPI_NUM_RANKS; i++)
{
    if (i == MY_RANK)
    {
        -> ACCESS DEVICE
    }

    MPI_Barrier(MPI_COMM);
}

While this successfully serializes the GPU access, it is not very efficient.

Maybe someone has got a better solution?

Use CUDA MPS (Multi-Process Service):

Ah, thank you for the link.

Unfortunately, I am bound to the Fermi architecture…

Is there no way to use Hyper-Q on Fermi?

No, it requires compute capability (cc) 3.5 or newer.
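
If you want to check this requirement at run time, a minimal sketch in C could look like the following. It assumes the major and minor compute capability numbers have already been queried, for example from the major and minor fields of the cudaDeviceProp struct filled in by cudaGetDeviceProperties. The helper name supports_hyper_q is made up for illustration and is not part of any CUDA API.

```c
/* Hypothetical helper (not a CUDA API function): returns 1 if the given
 * compute capability is at least 3.5, the minimum stated above for
 * Hyper-Q, and 0 otherwise. */
int supports_hyper_q(int major, int minor)
{
    /* Anything with major version 4 or higher qualifies outright;
     * within major version 3, the minor version must be 5 or higher. */
    return major > 3 || (major == 3 && minor >= 5);
}
```

For a Fermi card (cc 2.0 or 2.1) supports_hyper_q returns 0, so the code would have to fall back to serializing device access between the ranks, as in the loop from the original question.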