How to transfer CUDA code into MPI+CUDA code

I want to convert CUDA code into MPI+CUDA code so that it can run on a GPU cluster. How can I write a program that does this automatically? Is it possible for a single transferring program to convert all kinds of CUDA code into MPI+CUDA code correctly? Thanks a lot!

I am not sure what you mean by “transferring program”. The approach I take is to think of MPI as the next level up in a hierarchically arranged decomposition of execution resources: thread block, grid, MPI process. And just as with CUDA blocks and grids, one has to decide how to map the data decomposition to this decomposition of execution resources.
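To make the hierarchy concrete, here is a minimal host-side sketch of a 1D decomposition, written in plain C++ so it runs without MPI or a GPU. The names Slice, rank_slice, grid_size and global_index are hypothetical helpers made up for this illustration. In a real program, rank and nranks would come from MPI_Comm_rank and MPI_Comm_size, and the block and thread indices from blockIdx.x, blockDim.x and threadIdx.x inside the kernel.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of global indices owned by one MPI rank.
// (Hypothetical helper type for this sketch.)
struct Slice {
    std::size_t begin;
    std::size_t end;
    std::size_t size() const { return end - begin; }
};

// MPI level: block-distribute n elements over nranks processes.
// The first (n % nranks) ranks get one extra element, so slice sizes
// differ by at most one.
inline Slice rank_slice(std::size_t n, int rank, int nranks) {
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t p = static_cast<std::size_t>(nranks);
    const std::size_t base = n / p;
    const std::size_t extra = n % p;
    const std::size_t begin = r * base + (r < extra ? r : extra);
    const std::size_t len = base + (r < extra ? 1 : 0);
    return Slice{begin, begin + len};
}

// Grid level: number of thread blocks needed to cover a rank's slice
// (the usual ceiling division used when launching a kernel).
inline std::size_t grid_size(std::size_t local_n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (local_n + threads_per_block - 1) / threads_per_block;
}

// Thread level: global index handled by one thread, i.e. the rank's offset
// plus the thread's position inside that rank's grid.
inline std::size_t global_index(const Slice& s, std::size_t block_idx,
                                std::size_t block_dim, std::size_t thread_idx) {
    return s.begin + block_idx * block_dim + thread_idx;
}
```

For example, with 10 elements on 3 ranks this split gives rank 0 the indices [0, 4), rank 1 [4, 7) and rank 2 [7, 10). Each rank would then launch its kernel over only its own slice, with the usual bounds check against the slice size.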

Thanks for your reply. I didn’t make myself clear. For example, my “transferring program” would have to map the data decomposition differently for array addition and for matrix multiplication. Your method is the core idea behind my problem and would achieve my goal, but is there an all-purpose program that maps the data decomposition to the decomposition of execution resources automatically and correctly? (That all-purpose program is what I mean by “transferring program”.) Could you suggest some steps for writing such a program? Thanks a lot :-)

I am not aware of a program that automatically applies the decomposition at the MPI-process level for the programmer. You apply the decomposition manually in CUDA as well (unless you use pre-packaged libraries), so just follow the same process you have been using there and extend it to the MPI level.
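As a rough illustration of why the mapping has to be chosen per algorithm, here is a sketch in plain C++ that computes the work of each rank serially, with no MPI or GPU involved. The helpers first_owned, add_on_rank and matmul_on_rank are hypothetical names for this example, and the row-block split shown for the matrix product is just one possible choice, not the only or the best one. The loop bodies stand in for what a CUDA kernel would do on each rank.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// First index owned by `rank` under an even block split of n items.
// Rank r owns [first_owned(n, r, p), first_owned(n, r + 1, p)).
// (Hypothetical helper; rank == nranks is allowed to get the end bound.)
inline std::size_t first_owned(std::size_t n, int rank, int nranks) {
    assert(nranks > 0 && rank >= 0 && rank <= nranks);
    return n * static_cast<std::size_t>(rank) / static_cast<std::size_t>(nranks);
}

// Array addition: a rank touches only its own slices of a, b and c,
// so it needs no data from any other rank.
inline void add_on_rank(const std::vector<double>& a, const std::vector<double>& b,
                        std::vector<double>& c, int rank, int nranks) {
    const std::size_t lo = first_owned(a.size(), rank, nranks);
    const std::size_t hi = first_owned(a.size(), rank + 1, nranks);
    for (std::size_t i = lo; i < hi; ++i) {
        c[i] = a[i] + b[i];
    }
}

// Matrix product C = A * B for n x n row-major matrices, split by rows:
// a rank owns a band of rows of A and C, but it reads ALL of B.
// In a distributed version B would therefore have to be available on
// every rank (for instance via a broadcast), unlike in the addition case.
inline void matmul_on_rank(const std::vector<double>& A, const std::vector<double>& B,
                           std::vector<double>& C, std::size_t n, int rank, int nranks) {
    const std::size_t lo = first_owned(n, rank, nranks);
    const std::size_t hi = first_owned(n, rank + 1, nranks);
    for (std::size_t row = lo; row < hi; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[row * n + k] * B[k * n + col];
            }
            C[row * n + col] = sum;
        }
    }
}
```

Calling each function once per rank (0 to nranks - 1) fills in the complete result. The point is that the two functions need different input data per rank, and that difference is exactly what an automatic tool would have to work out from the kernel's access pattern.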