Hi,
I want to run a calculation on multiple nodes, where each node has (say) 4 GPUs. I want to connect the nodes with MPI, and within each node, its 4 GPUs should communicate with NVSHMEM. For this I intend to start 4 threads of the same process, one per GPU. Is there a way to initialize NVSHMEM with this setup?
(Running 4 × (number of nodes) MPI processes and defining sub-communicators within MPI, one for the 4 processes within each node, would be a suboptimal solution.)
Thanks