Programming multiple GPUs with accelerator directives

Are there any examples available of using OpenMP and accelerator directives to program multiple GPUs?


Hi Sarom,

Offhand, I don’t have one that I can share. I’ve done it before, but only in the context of proprietary customer code. I do have a good example using MPI with the PGI Accelerator Model, which I’m planning to use for my next PGInsider article, and you’re welcome to have it.

Personally, I find MPI much easier to work with when using multiple GPUs. Not that you can’t use OpenMP; it’s just that OpenMP often requires more rewriting of code. Your basic outline would be something like:

1) start an OpenMP parallel region
2) associate each thread with a particular device (the context)
3) manually divide the problem among threads
4) create the acceleration region for this thread’s segment of the problem
5) shut down the device context
6) exit the OpenMP region
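
The outline above can be sketched roughly as follows. This is only an illustration, not the customer code or the SEISMIC_CPML source mentioned in this thread. It assumes OpenACC-style syntax (`#pragma acc parallel loop`, `acc_set_device_num`) rather than the older PGI Accelerator `region` directive, and the function name `scale_multi_gpu` and the simple array-scaling kernel are hypothetical. The guards let it build as plain serial C when OpenMP or OpenACC is not enabled.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>      /* omp_get_thread_num, omp_get_num_threads */
#endif
#ifdef _OPENACC
#include <openacc.h>  /* acc_set_device_num */
#endif

/* Hypothetical helper: multiply x[0..n-1] by a, one OpenMP thread per device. */
void scale_multi_gpu(float *x, int n, float a, int ndev)
{
    assert(x != NULL && n >= 0 && ndev > 0);

    /* 1) start an OpenMP parallel region, requesting one thread per device */
    #pragma omp parallel num_threads(ndev)
    {
        int tid = 0;
        int nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();  /* may be fewer than requested */
#endif

        /* 2) associate this thread with its own device */
#ifdef _OPENACC
        acc_set_device_num(tid, acc_device_not_host);
#endif

        /* 3) manually divide the problem: contiguous chunk [lo, hi) per thread */
        int chunk = (n + nthreads - 1) / nthreads;
        int lo = tid * chunk;
        if (lo > n) lo = n;
        int hi = (lo + chunk < n) ? lo + chunk : n;

        /* 4) accelerator region for this thread's segment only;
              the copy clause moves just that segment to and from the device */
        float *seg = x + lo;
        int len = hi - lo;
        if (len > 0) {
            #pragma acc parallel loop copy(seg[0:len])
            for (int i = 0; i < len; ++i)
                seg[i] *= a;
        }

        /* 5) shutting down the device context is runtime-specific and is
              left out of this sketch */
    }   /* 6) exit the OpenMP region */
}
```

Calling `scale_multi_gpu(v, 7, 2.0f, 2)` would give the first four elements to thread 0 and the remaining three to thread 1. The point of the manual split is that each thread copies only its own segment, so no device ever needs to see another device's memory.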

Granted, this is no different from MPI, but it is how MPI is normally programmed anyway. With OpenMP, most users just let the compiler figure things out, but the compiler doesn’t yet have the ability to automatically manage multiple discrete memories. On an SMP system, there’s only the one memory system.

  • Mat

Hi Mat,

I’m willing to give MPI a try. May I have access to this PGInsider Article?


May I have access to this PGInsider Article?

Well, I haven’t written the article yet, but I’ll send you the code via email. It’s the source from the SEISMIC_CPML presentations I gave at SC11.

  • Mat

Thanks Mat,

I managed to get a multi-GPU code working using OpenMP.

It took a while to figure out that I can’t compile using the ‘time’ suboption in the target accelerator flag.

A wishlist item may be an additional suboption to pin the timing to a specific device number?

Hi Sarom,

I talked with our tools manager about this. Instead of a sub-option, we are looking into profiling of multiple GPUs with OpenMP. Right now, though, the “time” profiling is single-threaded. I added TPR#18400 to track your request.

In the meantime, you can set the environment variable “CUDA_PROFILE” to 1 and “CUDA_PROFILE_LOG” to “cuda_profile.log.%d”. This will create multiple CUDA Profile logs, one for each OpenMP thread.