Potential NVSHMEM allocated memory performance issue

Could you please try setting the environment variable NVSHMEM_DISABLE_CUDA_VMM=1? This disables automatic symmetric heap sizing, so the heap is no longer resized for you and you may also need to increase NVSHMEM_SYMMETRIC_SIZE to fit your allocations. More information on these env vars is available on the "Environment Variables" page of the NVSHMEM 2.10.1 documentation.
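
For example, something along these lines in the shell before launching the job. This is only a sketch: the 2G heap size is an assumed placeholder (pick a value that covers your nvshmem_malloc usage), and the launch line is commented out because the launcher, PE count, and binary name are hypothetical.

```shell
# Turn off CUDA VMM, which also turns off automatic symmetric heap sizing.
export NVSHMEM_DISABLE_CUDA_VMM=1

# With automatic sizing off, set the per-PE symmetric heap size explicitly.
# 2G is an assumed example value, not a recommendation.
export NVSHMEM_SYMMETRIC_SIZE=2G

# Hypothetical launch line; substitute your own launcher and application.
# mpirun -np 2 ./my_nvshmem_app
```

If the performance difference goes away with these settings, that would point at the VMM-backed allocation path.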