I’m trying to train on a custom dataset using TAO with multiple GPUs.
Training works fine with up to 2 GPUs, but when I start it with 3 or more GPUs, the error below is raised.
tao info
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022
Env Info
ubuntu@aws-xxx:~$ free -g
              total        used        free      shared  buff/cache   available
Mem:            186          40          70           2          76         142
Swap:             0           0           0
ubuntu@aws-xxx:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             94G     0   94G   0% /dev
tmpfs            19G  2.0M   19G   1% /run
/dev/nvme0n1p1  582G  392G  190G  68% /
tmpfs           200G     0  200G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            94G     0   94G   0% /sys/fs/cgroup
/dev/loop2       25M   25M     0 100% /snap/amazon-ssm-agent/4046
/dev/loop4       56M   56M     0 100% /snap/core18/2284
tmpfs            19G   16K   19G   1% /run/user/127
tmpfs            19G   32K   19G   1% /run/user/1000
/dev/loop6       44M   44M     0 100% /snap/snapd/14978
/dev/loop1       27M   27M     0 100% /snap/amazon-ssm-agent/5163
/dev/loop3       44M   44M     0 100% /snap/snapd/15177
/dev/loop5       56M   56M     0 100% /snap/core18/2344
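The df -h output above is taken on the host, where /dev/shm is a 200G tmpfs. The NCCL "No space left on device" warnings in the log, however, come from processes inside the TAO Docker container, and a Docker container normally gets its own separate /dev/shm (64 MB by default unless --shm-size or --ipc=host is used). Whether the TAO launcher changes that default is not shown here. As a hedged diagnostic sketch, the standard-library Python below reports the free space of /dev/shm wherever it is run, so the host and container figures could be compared. The function name shm_free_bytes is illustrative only.

```python
import os


def shm_free_bytes(path: str = "/dev/shm") -> int:
    """Return the bytes available to non-root users on the filesystem mounted at `path`."""
    st = os.statvfs(path)  # POSIX-only; raises FileNotFoundError if `path` is missing
    return st.f_bavail * st.f_frsize


if __name__ == "__main__":
    shm = "/dev/shm"
    if os.path.isdir(shm):
        print(f"{shm} free: {shm_free_bytes(shm) / 2**20:.1f} MiB")
    else:
        print(f"{shm} does not exist on this system")
```

Running the same script on the host and inside the training container would show whether the two /dev/shm mounts really differ in size.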
Sun Mar 27 18:03:59 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1B.0 Off |                    0 |
|  0%   39C    P0   132W / 300W |  18304MiB / 22731MiB |     51%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G         On   | 00000000:00:1C.0 Off |                    0 |
|  0%   40C    P0   115W / 300W |  18063MiB / 22731MiB |     54%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G         On   | 00000000:00:1D.0 Off |                    0 |
|  0%   27C    P8    24W / 300W |      2MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   27C    P8    22W / 300W |      2MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
Any tips? Here is the training log:
To run with multigpu, please change --gpus based on the number of available GPUs in your machine.
2022-03-27 17:52:56,613 [INFO] root: Registry: ['nvcr.io']
2022-03-27 17:52:56,709 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-27 17:52:56,758 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
.
.
.
2022-03-27 17:54:00,255 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/utils/tensor_utils.py:9: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.
Epoch 21/80
dfd5d902fdd1:132:427 [0] NCCL INFO Bootstrap : Using lo:127.0.0.1<0>
dfd5d902fdd1:132:427 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dfd5d902fdd1:132:427 [0] NCCL INFO NET/IB : No device found.
dfd5d902fdd1:132:427 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.5<0>
dfd5d902fdd1:132:427 [0] NCCL INFO Using network Socket
NCCL version 2.9.9+cuda11.3
dfd5d902fdd1:133:426 [1] NCCL INFO Bootstrap : Using lo:127.0.0.1<0>
dfd5d902fdd1:133:426 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dfd5d902fdd1:133:426 [1] NCCL INFO NET/IB : No device found.
dfd5d902fdd1:133:426 [1] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.5<0>
dfd5d902fdd1:133:426 [1] NCCL INFO Using network Socket
dfd5d902fdd1:134:432 [2] NCCL INFO Bootstrap : Using lo:127.0.0.1<0>
dfd5d902fdd1:134:432 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dfd5d902fdd1:134:432 [2] NCCL INFO NET/IB : No device found.
dfd5d902fdd1:134:432 [2] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.5<0>
dfd5d902fdd1:134:432 [2] NCCL INFO Using network Socket
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00 : 0[1b0] -> 1[1c0] via direct shared memory
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01 : 0[1b0] -> 1[1c0] via direct shared memory
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Channel 00 : 1[1c0] -> 2[1d0] via direct shared memory
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 0(=1b0)
dfd5d902fdd1:133:426 [1] NCCL INFO Channel 01 : 1[1c0] -> 2[1d0] via direct shared memory
dfd5d902fdd1:134:432 [2] NCCL INFO Channel 00 : 2[1d0] -> 0[1b0] via direct shared memory
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Channel 01 : 2[1d0] -> 0[1b0] via direct shared memory
dfd5d902fdd1:132:427 [0] NCCL INFO Connected all rings
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)
dfd5d902fdd1:134:432 [2] NCCL INFO Connected all rings
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:134:432 [2] NCCL INFO Channel 00 : 2[1d0] -> 1[1c0] via direct shared memory
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-send-c7b72b31a9e17c47-1-2-1 (size 4104)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:75 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:90 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:753 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-22179506c91c6ec1-0-1-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:753 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO Connected all rings
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-f185cfe1f5f1edc0-0-2-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:753 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-38787eb0df8af94b-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-624723612b9b6ac4-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-92d8e885fec5ebc5-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-8fb76c5feeba4735-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-ea17d6350df539af-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-b98611103acab8ae-0-0-1 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-880ceaf8a318e41a-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-b1db8fa8ef295593-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-e26d54cdc253d694-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-fc4af2d41d7fb652-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-56ab5ca93cbaa8cc-0-2-0 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-26199784699027cb-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-841ba93426ac98ce-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-29bb3f5f0771a654-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-5389e40f538217cd-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-9c0e5762d02a2b45-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-41aded8db0ef38cb-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-6b7c923dfcffaa44-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-d9906507e3c11201-0-0-1 (size 9637888)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-afc1c05797b0a088-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-a222a2cb6eb9302-0-2-0 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-ce4126d9bae41d76-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-f80fcb8a06f48eef-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-28a190aeda1f0ff0-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-e69bd103a364b6cd-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-8c3b672e8429c453-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-b60a0bded03a35cc-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-3343aef8bb44440d-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-63d5741d8e6ec50e-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-9750a486f33d294-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-278ffacd8344e9e1-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-515e9f7dcf555b5a-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-81f064a2a27fdc5b-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-282b8b49b1481d42-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-fe5ce6996537abc9-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-58bd506e84729e43-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-6528474142558c97-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-bf88b11661907f11-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-8ef6ebf18e65fe10-0-0-1 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-366685e858d679b0-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-5d4c0c385abf8af-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-dc061c13399b8736-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-b1623488996d8bac-0-2-0 (size 9637888)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-80d06f63c6430aab-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-5701cab37a329932-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-8e6805127db39b46-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-5dd63fedaa891a45-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-34079b3d5e78a8cc-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-dc29b4c805391f78-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-368a1e9d247411f2-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-5f85978514990f1-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-4e8c2ba316cc59f2-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-f42bc1cdf7916778-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-1dfa667e43a1d8f1-0-0-1 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-44d1734f392b038c-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-1b02ce9eed1a9213-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-756338740c55848d-0-2-0 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-75ff748eedb276af-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-456daf6a1a87f5ae-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-1b9f0ab9ce778435-0-1-2 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-aa1e9195c6004f1f-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-47efb6ae53b4199-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-d3ed36461210c098-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-25ec15c2ffab58ff-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-4fbaba734bbbca78-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-804c7f981ee64b79-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-6f29817aed8203fb-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-9fbb469fc0ac84fc-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-455adccaa1719282-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-5bcbd7658946accc-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-8c5d9c8a5c712dcd-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-31fd32b53d363b53-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 00/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Channel 01/02 : 0 1 2
dfd5d902fdd1:132:427 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dfd5d902fdd1:134:432 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
dfd5d902fdd1:132:427 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)
dfd5d902fdd1:133:426 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)
dfd5d902fdd1:134:432 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)
dfd5d902fdd1:133:426 [1] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:133:426 [1] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:133:426 [1] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-db7e10ed85ae9ce4-0-0-1 (size 9637888)
dfd5d902fdd1:133:426 [1] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:132:427 [0] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:132:427 [0] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-c0fd61258d91de5-0-2-0 (size 9637888)
dfd5d902fdd1:132:427 [0] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:28 NCCL WARN Call to posix_fallocate failed : No space left on device
dfd5d902fdd1:134:432 [2] NCCL INFO include/shm.h:41 -> 2
dfd5d902fdd1:134:432 [2] include/shm.h:48 NCCL WARN Error while creating shared memory segment nccl-shm-recv-b1af6c3d399e2b6b-0-1-2 (size 9637888)
dfd5d902fdd1:134:432 [2] NCCL INFO transport/shm.cc:100 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:34 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO transport.cc:84 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:742 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:867 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:132:427 [0] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:133:426 [1] NCCL INFO init.cc:916 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:903 -> 2
dfd5d902fdd1:134:432 [2] NCCL INFO init.cc:916 -> 2
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py", line 110, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 528, in return_func
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/utils.py", line 516, in return_func
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py", line 106, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py", line 63, in run_experiment
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[23953,1],1]
Exit code: 1
--------------------------------------------------------------------------
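Every failing rank logs the same pattern: `posix_fallocate failed : No space left on device` while creating an NCCL shared-memory segment of 9637888 bytes (about 9.2 MiB). The `df -h` output earlier in this post was taken on the host, where /dev/shm is 200G. The process names in the log (`dfd5d902fdd1`) look like a Docker container ID, though. The container may therefore be seeing its own, possibly smaller, /dev/shm, which Docker sizes with `--shm-size`. Below is a minimal diagnostic sketch to run inside the container. It assumes only the Python standard library. The helper names `shm_free_bytes` and `segments_that_fit` are hypothetical and not part of TAO or NCCL. The script reports the free space in /dev/shm and how many segments of the size seen in the log would fit.

```python
import os

# Segment size reported by NCCL in the log above (bytes).
NCCL_SEGMENT_BYTES = 9637888


def shm_free_bytes(path: str = "/dev/shm") -> int:
    """Hypothetical helper: bytes available to unprivileged users on the filesystem at `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def segments_that_fit(free_bytes: int, segment_bytes: int = NCCL_SEGMENT_BYTES) -> int:
    """Hypothetical helper: how many whole segments of `segment_bytes` fit into `free_bytes`."""
    return free_bytes // segment_bytes


if __name__ == "__main__":
    free = shm_free_bytes()
    print(f"/dev/shm free: {free / 2**20:.1f} MiB")
    print(f"NCCL segments of {NCCL_SEGMENT_BYTES} bytes that fit: {segments_that_fit(free)}")
```

The number of segments NCCL needs grows with the number of ranks and channels. That growth could explain why 2 GPUs fit while 3 or more do not, but this is an assumption and has not been verified against NCCL's allocation logic.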