Using NVSHMEM in a Python Library

I currently provide a CUDA library to a client, following a model similar to CuPy. Is it possible to run this code over NVSHMEM, and if so, what would be the best way to do it? Ideally I would like to use Cython to wrap the NVSHMEM calls in a C++ shared library, but I am wondering whether anyone has already tried this.

I haven't tried it yet, but this use case was one of the major motivations for shmem4py (GitHub: mpi4py/shmem4py; JOSS paper: https://joss.theoj.org/papers/10.21105/joss.05444; "shmem4py: High-Performance One-Sided Communication for Python Applications", Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis). Someone just needs to write the analogous code to support NVSHMEM instead of OpenSHMEM.
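
For context, here is a minimal sketch of what the existing OpenSHMEM side looks like from Python, modeled on the ring example in the shmem4py README. It is not an NVSHMEM binding. It only illustrates the kind of interface an analogous NVSHMEM wrapper would presumably need to expose. The function name `ring_get` and the choice to pass the module in as a parameter are assumptions made for illustration, so that a stand-in object can be substituted where no OpenSHMEM installation is available.

```python
def ring_get(shmem):
    """Fetch one integer from the next PE in a ring.

    `shmem` is expected to be the module obtained with
    `from shmem4py import shmem`. Passing it in as an argument is an
    illustrative choice, not something shmem4py requires.
    """
    mype = shmem.my_pe()          # rank of this processing element (PE)
    npes = shmem.n_pes()          # total number of PEs
    nextpe = (mype + 1) % npes    # right-hand neighbour in the ring

    # Symmetric allocations: every PE must make the same calls.
    src = shmem.empty(1, dtype='i')
    src[0] = mype
    dst = shmem.empty(1, dtype='i')
    dst[0] = -1

    shmem.barrier_all()           # ensure every PE has written its src
    shmem.get(dst, src, nextpe)   # one-sided read of src on the neighbour
    value = int(dst[0])

    shmem.barrier_all()           # ensure all reads finished before freeing
    shmem.free(dst)
    shmem.free(src)
    return nextpe, value
```

With shmem4py installed, this would be called as `ring_get(shmem)` after `from shmem4py import shmem`, with the script started by whatever launcher the OpenSHMEM implementation provides. Each PE should then get back its neighbour's rank as `value`.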