I want to build a PyTorch operator using NVSHMEM.
Is there any way to do that? When we build a standalone NVSHMEM application written in pure C++ and CUDA C, we have to launch it with nvshmrun -n 2. How can we achieve the same thing in PyTorch?