Example of MPI + CUDA on a node with two CPUs and two GPUs

Hi,

Can someone help me write a simple MPI program that runs one process per CPU (assuming a single node with two CPUs and two GPUs), where each process launches a CUDA kernel (K)? Any pointers to example code or literature would be a real help.

Thanks in advance for your help.

K

It might be easier if you post the code you have already written and explain what you can’t get to work or don’t understand.
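
For reference, the usual pattern for the setup described in the question is short enough to sketch. This is only a minimal sketch under assumptions, not a tested solution for your machine: it assumes a single node (so the MPI rank can double as the local rank), maps each rank to a GPU with rank % device count, and uses a made-up kernel body for K, since the question does not say what K computes. The file name, the `check` helper, and the build line in the comment are also assumptions; how ranks get pinned to CPU sockets is left to your MPI launcher's binding options.

```cuda
// Assumed build line (depends on your MPI install): nvcc -ccbin mpicxx mpi_cuda.cu -o mpi_cuda
// Assumed run line: mpirun -np 2 ./mpi_cuda
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel "K": adds the caller's rank to each element.
__global__ void K(float *data, int n, int rank)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += (float)rank;
}

// Hypothetical helper: abort all ranks if a CUDA call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Assumption: single node, so the global rank is also the local rank.
    int ndev = 0;
    check(cudaGetDeviceCount(&ndev), "cudaGetDeviceCount");
    if (ndev == 0) {
        fprintf(stderr, "rank %d: no CUDA devices found\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int dev = rank % ndev;  // rank 0 -> GPU 0, rank 1 -> GPU 1 on a two-GPU node
    check(cudaSetDevice(dev), "cudaSetDevice");

    // Small host buffer, filled with 0, 1, 2, ...
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // Copy to this rank's GPU, run K, copy back.
    float *d = nullptr;
    check(cudaMalloc((void **)&d, n * sizeof(float)), "cudaMalloc");
    check(cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice), "copy to device");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    K<<<blocks, threads>>>(d, n, rank);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost), "copy to host");

    printf("rank %d of %d used GPU %d, h[1] = %f\n", rank, size, dev, h[1]);

    check(cudaFree(d), "cudaFree");
    MPI_Finalize();
    return 0;
}
```

Each process picks its GPU once with cudaSetDevice before any other CUDA work, so all later allocations and launches in that process go to that device. On more than one node you would need a per-node local rank instead of the global rank (for example from a communicator split by shared memory), which this sketch deliberately leaves out.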