If I set --gpus 1, training runs fine.
If I set --gpus 4, I get the following errors:
[MaskRCNN] INFO : # ============================================= #
[MaskRCNN] INFO : Start Training
[MaskRCNN] INFO : # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% #
[GPU 00] Restoring pretrained weights (265 Tensors)
[MaskRCNN] INFO : Pretrained weights loaded with success...
[MaskRCNN] INFO : Saving checkpoints for 0 into /workspace/Nyan/cv_samples_v1.3.0/mask_rcnn/experiment_dir_unpruned/model.step-0.tlt.
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real noticed that process rank 3 with PID 0 on node 47754d4a4716 exited on signal 9 (Killed).
--------------------------------------------------------------------------