If you have 8 GPUs on one platform and wish to use them all
simultaneously, the usual method is to run an OpenMP parallel section
with 8 threads on the CPU, where each thread attaches to a different GPU and then runs the GPU code on its assigned device. All of the work is synchronized at the end of the OpenMP section.
Running pgaccelinfo
will tell you what the compilers can see (all 8 GPUs, ideally), so you can make sure
the compilers can access them.
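If you would rather check from inside a program, the OpenACC runtime can report the same count; this is a small sketch, not a replacement for pgaccelinfo:

#include <openacc.h>
#include <stdio.h>

int main(void)
{
    /* Number of NVIDIA devices the OpenACC runtime can access. */
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    printf("OpenACC runtime sees %d NVIDIA GPU(s)\n", ngpus);
    return 0;
}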
A multi-process MPI program has to know which GPUs are available,
and which one each rank should use, or it may end up just waiting for processes to finish.
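One common approach, sketched here under the assumption of one MPI rank per GPU on the node, is to map each rank to a device by rank number:

#include <mpi.h>
#include <openacc.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ngpus = acc_get_num_devices(acc_device_nvidia);
    if (ngpus == 0) {
        fprintf(stderr, "rank %d: no GPUs visible\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Each rank claims its own device; if there are more ranks than
       GPUs, the modulo wraps around and devices end up shared. */
    int dev = rank % ngpus;
    acc_set_device_num(dev, acc_device_nvidia);
    printf("rank %d using GPU %d of %d\n", rank, dev, ngpus);

    /* GPU kernels launched by this rank now run on device dev. */

    MPI_Finalize();
    return 0;
}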
The GPUs do not multi-task; each one runs only one job at a time. I am not sure that overloading the platform, with several processes trying to access an individual GPU, will be successful.