Hi,
Can someone help me write a simple MPI program that runs one process per CPU, with each process calling a CUDA kernel (K)? Assume a single node with two CPUs and two GPUs. Any pointers to example code or literature would be a real help.
Thanks in advance for your help.
K