Hi all,
I want to build a PyTorch operator that uses NVSHMEM.
Is there a way to do that? When we build a standalone NVSHMEM application written in pure C++ and CUDA C, we have to launch it with nvshmrun -n 2.
How can we achieve the same thing when the NVSHMEM code runs inside PyTorch?
Thanks!