Using NVSHMEM to Build a PyTorch Operator

Hi all,

I want to build a PyTorch operator using NVSHMEM. Is there a way to do that? When we build a standalone NVSHMEM application written in pure C++ and CUDA C, we launch it with `nvshmrun -n 2`. How could we achieve the same thing in PyTorch?
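For context, a minimal standalone NVSHMEM program of the kind described above typically looks like the sketch below. This is a hedged illustration using the standard NVSHMEM host API (`nvshmem_init`, `nvshmem_my_pe`, `nvshmem_n_pes`, `nvshmem_finalize`); it assumes NVSHMEM and CUDA are installed, so it cannot run without that hardware and library stack.

```c
#include <stdio.h>
#include <nvshmem.h>

int main(void) {
    /* Initialize the NVSHMEM runtime; the nvshmrun launcher
       sets up the PEs (processing elements) before main() runs. */
    nvshmem_init();

    int mype = nvshmem_my_pe();  /* this PE's rank */
    int npes = nvshmem_n_pes();  /* total number of PEs */
    printf("Hello from PE %d of %d\n", mype, npes);

    nvshmem_finalize();
    return 0;
}
```

Compiled with nvcc and linked against libnvshmem, this is launched as `nvshmrun -n 2 ./hello`, which is exactly the launch model the question is asking how to reconcile with PyTorch's own process-spawning.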


Hi Daniel,

Would you mind posting your question over on the GPU Libraries forum? GPU-Accelerated Libraries - NVIDIA Developer Forums

This forum is for the NVIDIA HPC Compilers, so I'm not sure you'll get any responses here.


Sure, thanks for your suggestion!