Example of MPI + CUDA on a node with two CPUs and two GPUs


Can someone help me write a simple MPI program that runs one process per CPU (assuming a single node with two CPUs and two GPUs), where each process launches a CUDA kernel (K)? Any pointers to example code or literature would be a real help.

Thanks in advance for your help.


Perhaps it might be easier if you post the code you already have written and explain what you can’t get to work or do not understand.
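
In the meantime, here is a minimal sketch of the pattern being asked about, in case it helps as a starting point. It assumes a single node, so the MPI rank can be mapped to a GPU with `rank % device_count`. The kernel body, the buffer size `n`, and the `CUDA_CHECK` macro are made up for illustration and are not from any particular library or from the original poster's code.

```cuda
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical helper: abort the whole MPI job if a CUDA call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            MPI_Abort(MPI_COMM_WORLD, 1);                             \
        }                                                             \
    } while (0)

// Placeholder kernel K: each thread writes a value derived from the
// calling process's rank and its own global thread index.
__global__ void K(int *out, int n, int rank)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = rank * 1000 + i;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Assumption: all ranks run on one node, so the world rank can be
    // used directly to choose a GPU (rank 0 -> GPU 0, rank 1 -> GPU 1).
    int ndev = 0;
    CUDA_CHECK(cudaGetDeviceCount(&ndev));
    if (ndev == 0) {
        fprintf(stderr, "rank %d: no CUDA device found\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int device = rank % ndev;
    CUDA_CHECK(cudaSetDevice(device));

    const int n = 8;  // arbitrary small buffer for the demonstration
    int host[n];
    int *dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, n * sizeof(int)));

    // One block of n threads is enough for this tiny example.
    K<<<1, n>>>(dev, n, rank);
    CUDA_CHECK(cudaGetLastError());

    // cudaMemcpy waits for the kernel to finish before copying back.
    CUDA_CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));

    printf("rank %d of %d used GPU %d, last element = %d\n",
           rank, size, device, host[n - 1]);

    MPI_Finalize();
    return 0;
}
```

Build and launch commands depend on the MPI installation. Something along the lines of `nvcc -ccbin mpicxx mpi_cuda.cu -o mpi_cuda` followed by `mpirun -np 2 ./mpi_cuda` is a common setup. Pinning one process to each CPU socket is handled by the launcher rather than the code; with Open MPI, for instance, an option such as `--map-by socket` may do it.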