Transfer Learning Toolkit on multiple GPUs

Hey everyone, I am trying to run the Transfer Learning Toolkit on multiple GPUs. My company purchased a DGX system with 8 Tesla V100 GPUs. The detectnet_v2 example from the TLT container on NGC runs fine on a single GPU, but when I scale up to 8 GPUs I get the output below. The model does appear to keep training; it just takes much longer than expected. I believe the issue is tied to the evaluation phase, since the epochs proceed normally until every 10th epoch, when evaluation runs. Below is the evaluation_config section of my experiment.prototxt file, followed by a tweak I am considering to confirm that hypothesis.

Experiment Eval Section
evaluation_config {
  validation_period_during_training: 10
  first_validation_epoch: 1
  minimum_detection_ground_truth_overlap {
    key: "m1"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "m2"
    value: 0.5
  }
  minimum_detection_ground_truth_overlap {
    key: "leopard"
    value: 0.5
  }
  evaluation_box_config {
    key: "m1"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "m2"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  evaluation_box_config {
    key: "leopard"
    value {
      minimum_height: 20
      maximum_height: 9999
      minimum_width: 10
      maximum_width: 9999
    }
  }
  average_precision_mode: INTEGRATE
}
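
To double-check that evaluation is the culprit, one thing I plan to try is deferring validation past the end of training so no evaluation runs at all during the multi-GPU job. This is only a debugging sketch; the 1000 values are placeholders I picked, assuming the training loop simply skips validation when first_validation_epoch is larger than the total number of epochs:

evaluation_config {
  # debugging only: defer validation past the total number of training epochs
  validation_period_during_training: 1000
  first_validation_epoch: 1000
  # minimum_detection_ground_truth_overlap and evaluation_box_config
  # entries kept exactly as above
}

If the 8-GPU run then makes it past epoch 10 without stalling, that would confirm the hang happens during evaluation rather than in the training step itself.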

Start Training
tlt-train detectnet_v2 -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti_2gpu.txt \
                       -r $USER_EXPERIMENT_DIR/02/experiment_dir_unpruned \
                       -k $KEY \
                       -n resnet18_detector \
                       --gpus 8
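
For reference, the single-GPU run mentioned above is essentially the same command with --gpus 1 (directories simplified here), and it runs fine with no Horovod stall warnings:

tlt-train detectnet_v2 -e $SPECS_DIR/detectnet_v2_train_resnet18_kitti_2gpu.txt \
                       -r $USER_EXPERIMENT_DIR/02/experiment_dir_unpruned \
                       -k $KEY \
                       -n resnet18_detector \
                       --gpus 1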

Output
One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock.
Stalled ops:DistributedAdamOptimizer_Allreduce/HorovodAllreduce_gradients_AddN_58_0 [missing ranks: 0]
[2020-04-23 23:49:04.302538: W horovod/common/operations.cc:588] DistributedAdamOptimizer_Allreduce/HorovodAllreduce_gradients_AddN_59_0 [missing ranks: 0]

Does anyone have an idea as to what may be happening? Thanks in advance.