How to use NVSHMEM with PyTorch?

I am trying to use NVSHMEM to create an embedding that I need to transfer between PEs after each layer's forward pass.
I want to feed it through torch::nn::Linear, which means I need to turn the NVSHMEM array into a libtorch tensor.
I tried torch::from_blob on the NVSHMEM buffer, but it raises errors.
Is there any way to do that?
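For reference, torch::from_blob can wrap existing device memory, but the TensorOptions must say the pointer is CUDA memory on the right device; with the default (CPU) options it fails. A minimal sketch, assuming NVSHMEM is already initialized and the current CUDA device is set (wrap_symmetric and the sizes are illustrative names, not from any API):

```cpp
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <torch/torch.h>

// Sketch: view an NVSHMEM symmetric buffer as a libtorch CUDA tensor.
torch::Tensor wrap_symmetric(int64_t rows, int64_t cols) {
  // Allocate from the symmetric heap (device memory on this PE).
  float* buf = static_cast<float*>(
      nvshmem_malloc(rows * cols * sizeof(float)));

  int dev = 0;
  cudaGetDevice(&dev);

  // from_blob defaults to CPU options; the options must mark the
  // memory as CUDA on this device, or the call misbehaves.
  auto opts = torch::TensorOptions()
                  .dtype(torch::kFloat32)
                  .device(torch::kCUDA, dev);

  // Note: from_blob does NOT take ownership. Keep the buffer alive
  // while the tensor is in use, and release it with nvshmem_free()
  // only after the tensor (and all views of it) are gone.
  return torch::from_blob(buf, {rows, cols}, opts);
}
```

The resulting tensor can be used in ordinary torch ops on this PE; NVSHMEM transfers still have to go through the raw pointer.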
I know I could implement the linear layer with cuBLAS directly, but I don't know how to handle autograd, especially in the distributed case.
I think it would be better if I could use NVSHMEM together with an existing framework.
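On the autograd question: libtorch lets you define a custom backward via torch::autograd::Function, so a hand-written forward (e.g. a cuBLAS GEMM over the NVSHMEM buffer) can still participate in autograd. A sketch under that assumption (MyLinear is an illustrative name; the matmul stands in for the cuBLAS call):

```cpp
#include <torch/torch.h>

// Sketch: custom autograd node for y = x W^T with a hand-written forward.
struct MyLinear : public torch::autograd::Function<MyLinear> {
  static torch::Tensor forward(torch::autograd::AutogradContext* ctx,
                               torch::Tensor input, torch::Tensor weight) {
    ctx->save_for_backward({input, weight});
    // Replace this matmul with your own cuBLAS kernel if desired.
    return input.matmul(weight.t());
  }

  static torch::autograd::tensor_list backward(
      torch::autograd::AutogradContext* ctx,
      torch::autograd::tensor_list grad_outputs) {
    auto saved = ctx->get_saved_variables();
    auto input = saved[0];
    auto weight = saved[1];
    auto grad = grad_outputs[0];
    // Gradients of y = x W^T:  dx = g W,  dW = g^T x.
    return {grad.matmul(weight), grad.t().matmul(input)};
  }
};

// Usage: auto y = MyLinear::apply(x, w);
```

For the distributed part, one common pattern (what DDP does) is to all-reduce the weight gradients across PEs after backward; the custom function itself stays single-process.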

By the way, when I call nvshmem_finalize(), it raises:

/dvs/p4/build/sw/rel/gpgpu/toolkit/r11.8/main_nvshmem/src/host/init/init.cu:1051: non-zero status: 1 Invalid context pointer passed to nvshmemx_host_finalize.

/dvs/p4/build/sw/rel/gpgpu/toolkit/r11.8/main_nvshmem/src/host/init/init.cu:nvshmemx_host_finalize:1128: aborting due to error in nvshmem_finalize 