Here I use nvshmem_python-source-0.1.0.36132199_cuda12-archive. I found that with this implementation, a Buffer allocated through NvshmemResource is not released when the owning tensor goes out of scope, because the Buffer is kept alive by NvshmemResource.
My question is: is this by design? Having to release the NVSHMEM tensor manually is very un-pythonic.
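For reference, this is roughly what the manual release looks like (a minimal sketch; I'm assuming a free_tensor-style call here, the exact name may differ in this release):

import torch
import nvshmem.core

t = nvshmem.core.tensor((867530,), dtype=torch.float32)
# ... use t ...
# The buffer is only returned to the symmetric heap when freed explicitly;
# dropping the last Python reference to t does not trigger it.
nvshmem.core.free_tensor(t)  # assumed API name, see note above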
This is sample code to verify it (adapted from nvshmem_python-source-0.1.0.36132199_cuda12-archive/examples/torch_triton_interop.py):
import torch
import nvshmem.core

if __name__ == '__main__':
    torchrun_uid_init()  # UID-based init helper defined in the torch_triton_interop.py example
    # Repeatedly allocate a tensor on the NVSHMEM symmetric heap.
    # Each iteration rebinds `tensor`, so the previous allocation has no
    # remaining Python reference and should become eligible for release.
    n_elements = 867530
    nvshmem.core.utils._configure_logging(level="DEBUG")
    for n in range(10):
        print(f"iter {n}", flush=True)
        tensor = nvshmem.core.tensor((n_elements,), dtype=torch.float32)
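Since NvshmemResource keeps the Buffer alive, I would expect that even explicitly dropping the reference and forcing garbage collection changes nothing. A minimal sketch of that check (only the allocation call above plus Python's gc module):

import gc

t = nvshmem.core.tensor((n_elements,), dtype=torch.float32)
del t         # drop the only Python reference to the tensor
gc.collect()  # still no NVSHMEM DEBUG deallocation message: the Buffer
              # remains referenced from inside NvshmemResource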
The log:
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data01/houqi.1993/micromamba/envs/houqi/lib/python3.11/site-packages/nvidia/nvshmem/lib torchrun --node_rank=0 --nproc_per_node=8 --nnodes=1 --rdzv_endpoint=127.0.0.1:12345 ~/ProgramFiles/nvshmem_python-source-0.1.0.36132199_cuda12-archive/examples/torch_triton_interop.py
[W703 10:53:52.783941591 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W703 10:53:52.788321305 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W703 10:53:52.803465628 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W703 10:53:52.803650140 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W703 10:53:52.807265264 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W703 10:53:52.811320509 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W703 10:53:52.821334481 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W703 10:53:52.824206420 ProcessGroupGloo.cpp:727] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
iter 0
iter 0
iter 0
iter 0
iter 0
H800-1-docker-n122-200-178:2227842:2227842 [3] NVSHMEM DEBUG : Creating NvshmemResource for device 3
H800-1-docker-n122-200-178:2227846:2227846 [7] NVSHMEM DEBUG : Creating NvshmemResource for device 7
H800-1-docker-n122-200-178:2227844:2227844 [5] NVSHMEM DEBUG : Creating NvshmemResource for device 5
H800-1-docker-n122-200-178:2227841:2227841 [2] NVSHMEM DEBUG : Creating NvshmemResource for device 2
H800-1-docker-n122-200-178:2227840:2227840 [1] NVSHMEM DEBUG : Creating NvshmemResource for device 1
iter 0
H800-1-docker-n122-200-178:2227845:2227845 [6] NVSHMEM DEBUG : Creating NvshmemResource for device 6
iter 0
H800-1-docker-n122-200-178:2227843:2227843 [4] NVSHMEM DEBUG : Creating NvshmemResource for device 4
iter 0
H800-1-docker-n122-200-178:2227839:2227839 [0] NVSHMEM DEBUG : Creating NvshmemResource for device 0
H800-1-docker-n122-200-178:2227845:2227845 [6] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 6 (NVIDIA H800)>> at address 1440699386368 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227839:2227839 [0] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 0 (NVIDIA H800)>> at address 1405802777088 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227844:2227844 [5] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 5 (NVIDIA H800)>> at address 1440699386368 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227840:2227840 [1] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 1 (NVIDIA H800)>> at address 1440699386368 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227843:2227843 [4] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 4 (NVIDIA H800)>> at address 1440699386368 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227841:2227841 [2] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 2 (NVIDIA H800)>> at address 1440699386368 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227846:2227846 [7] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 7 (NVIDIA H800)>> at address 1440699386368 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227842:2227842 [3] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 3 (NVIDIA H800)>> at address 1440699386368 with size 3470120 on stream None
iter 1
iter 1iter 1iter 1iter 1iter 1iter 1
iter 1
H800-1-docker-n122-200-178:2227841:2227841 [2] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 2 (NVIDIA H800)>> at address 1440702856704 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227845:2227845 [6] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 6 (NVIDIA H800)>> at address 1440702856704 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227843:2227843 [4] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 4 (NVIDIA H800)>> at address 1440702856704 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227844:2227844 [5] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 5 (NVIDIA H800)>> at address 1440702856704 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227840:2227840 [1] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 1 (NVIDIA H800)>> at address 1440702856704 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227839:2227839 [0] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 0 (NVIDIA H800)>> at address 1405806247424 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227846:2227846 [7] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 7 (NVIDIA H800)>> at address 1440702856704 with size 3470120 on stream None
H800-1-docker-n122-200-178:2227842:2227842 [3] NVSHMEM DEBUG : Created Buffer on resource <NvshmemResource device=<Device 3 (NVIDIA H800)>> at address 1440702856704 with size 3470120 on stream None
iter 2
iter 2
iter 2iter 2
iter 2iter 2
iter 2iter 2
..... many more log lines follow, but nothing from the deallocation logic appears until nvshmem.finalize and free are called .....
