When I run training with multiple GPUs, the process gets killed. Why? Is there not enough RAM or CPU resources?

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/utils/tensor_utils.py:9: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

2021-09-22 10:18:25,338 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/utils/tensor_utils.py:9: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

Epoch 1/80

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun.real noticed that process rank 0 with PID 0 on node 86d3d86ef2f3 exited on signal 9 (Killed).

2021-09-22 18:20:01,242 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Can you log in to the docker container directly and run the training again to see whether the same issue occurs?
$ tao yolo_v4 run /bin/bash
# yolo_v4 train xxx
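
Signal 9 (Killed) is often, though not always, the kernel's out-of-memory killer. Multi-GPU training launches one process per GPU, so host RAM usage tends to grow with the GPU count. A rough check you could run before and after the failing job is sketched below. It assumes a Linux host with /proc/meminfo, and that dmesg is readable (it may need root and is usually more informative on the host than inside the container). `mem_available_kb` is a hypothetical helper name, not part of TAO.

```shell
# Hypothetical helper (not part of TAO): print the MemAvailable value in kB
# from a /proc/meminfo-style file. Defaults to /proc/meminfo.
mem_available_kb() {
    awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

# Report how much host memory is available before launching training.
if [ -r /proc/meminfo ]; then
    echo "MemAvailable: $(mem_available_kb) kB"
fi

# Look for kernel OOM-killer messages after the job has been killed.
# If nothing matches, or dmesg cannot be read, print a note instead.
dmesg 2>/dev/null | grep -i -E "out of memory|killed process" \
    || echo "No OOM messages found (or dmesg not readable)."
```

If the kernel log does show an out-of-memory kill, running on fewer GPUs or with a smaller batch size is one way to confirm that host memory is the limit.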