How can I use OpenMP for multi-GPU computing (with CUDA) on a shared-memory system? Could you also provide a simple example? That would make it much easier to understand. Thanks a lot.
For multi-GPU programming using CUDA Fortran, I generally recommend using MPI rather than OpenMP. Managing the data and GPU contexts in OpenMP can be tricky to get right, since OpenMP assumes all threads share the same memory system. Because each GPU’s memory is discrete, you need to manage this yourself. Domain decomposition across discrete memory systems is a natural part of MPI programming, which makes it much simpler.
While this article is a bit old, the concepts should still be relevant: http://www.pgroup.com/lit/articles/insider/v3n3a2.htm. The only thing missing is the use of a GPUDirect-aware MPI, which was just starting to be discussed when I wrote the article. I do have an OpenMP version of this code that I can send you if you can’t use MPI for some reason.
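For reference, the one-thread-per-GPU pattern described above can be sketched roughly as follows. This is not the OpenMP version of the article's code, just a minimal illustration under some assumptions: the names `scale_kernel`, `process_chunk`, and `omp_multi_gpu` are made up for this sketch, the work is a trivial array scaling, and the array is split into one contiguous chunk per thread. Each OpenMP thread binds itself to a device with `cudaSetDevice` and then manages its own device copy of its chunk, which is the manual data management that MPI would otherwise give you naturally.

```fortran
module gpu_work
  use cudafor
  implicit none
contains

  ! Trivial kernel (hypothetical example work): scale each element.
  attributes(global) subroutine scale_kernel(a, n, factor)
    integer, value :: n
    real, value :: factor
    real, device :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = a(i) * factor
  end subroutine scale_kernel

  ! Called by each OpenMP thread after it has selected its GPU.
  ! a_d is a local variable, so every thread gets its own device
  ! array, allocated on whichever device that thread selected.
  subroutine process_chunk(a, nloc)
    integer, intent(in) :: nloc
    real, intent(inout) :: a(nloc)
    real, device, allocatable :: a_d(:)

    allocate(a_d(nloc))
    a_d = a                                   ! host -> device copy
    call scale_kernel<<<(nloc + 255) / 256, 256>>>(a_d, nloc, 2.0)
    a = a_d                                   ! device -> host copy
    deallocate(a_d)
  end subroutine process_chunk

end module gpu_work

program omp_multi_gpu
  use cudafor
  use omp_lib
  use gpu_work
  implicit none
  integer, parameter :: n = 1024 * 1024
  real, allocatable :: a(:)
  integer :: ndev, istat, tid, nthreads, chunk, lo, hi

  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess .or. ndev < 1) stop 'no CUDA devices found'

  allocate(a(n))
  a = 1.0

  ! Request one host thread per GPU.
  call omp_set_num_threads(ndev)

  !$omp parallel private(tid, nthreads, chunk, lo, hi, istat)
  tid      = omp_get_thread_num()
  nthreads = omp_get_num_threads()

  ! Bind this thread to its own device (device numbers are 0-based).
  istat = cudaSetDevice(tid)

  ! Simple 1-D domain decomposition: one contiguous chunk per thread.
  chunk = (n + nthreads - 1) / nthreads
  lo = tid * chunk + 1
  hi = min(n, lo + chunk - 1)
  if (hi >= lo) call process_chunk(a(lo:hi), hi - lo + 1)
  !$omp end parallel

  ! Every element started at 1.0 and was scaled by 2.0 on some GPU.
  print *, 'min, max = ', minval(a), maxval(a)
end program omp_multi_gpu
```

With the NVIDIA HPC compilers this should build with something like `nvfortran -cuda -mp` (or `pgfortran -Mcuda -mp` with the older PGI compilers). The device array lives inside a subroutine called from the parallel region so that it is automatically private to each thread. A real code with halo exchanges between chunks would need extra host-side copies or peer-to-peer transfers, and that is where the OpenMP approach gets tricky.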
Thanks for your help. I do realize that MPI's parallel performance is better than OpenMP's for complex cases. However, I would really prefer to use OpenMP. Could you please send me the OpenMP version of this example? It would be very helpful. Thanks again.