TLT yolo_v4 slow training


We are trying to train a TLT yolo_v4 model. We have a custom dataset of 25,000 images and are training on 2 GPUs (GeForce RTX 2080 Ti), driver version: 455.32.00, CUDA version: 11.1, TLT version: 3.0.

Despite the small dataset, each epoch takes one hour. Would you say this is expected? Or is something wrong?

The command we used is: tlt yolo_v4 train --gpus 2 -e /path/to/spec.txt -r /path/to/result -k $KEY

Here is an extract from the config:

random_seed: 42
yolov4_config {
  big_anchor_shape: "[(87.07, 119.20), (119.47, 87.33), (124.67, 123.07)]"
  mid_anchor_shape: "[(78.13, 78.13), (59.73, 105.20), (106.93, 60.80)]"
  small_anchor_shape: "[(36.67, 35.87), (48.00, 66.27), (68.13, 48.53)]"
  box_matching_iou: 0.25
  arch: "cspdarknet"
  nlayers: 19
  arch_conv_blocks: 2
  loss_loc_weight: 0.8
  loss_neg_obj_weights: 100.0
  loss_class_weights: 0.5
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 8
  num_epochs: 200
  enable_qat: true
  checkpoint_interval: 10
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
}
eval_config {
  average_precision_mode: SAMPLE
  batch_size: 16
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  top_k: 200
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 512
  output_height: 288
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
}

Thanks for the help

Currently, this is a known limitation of the TLT 3.0_dp version. We're working on it internally.

Thanks for your response. Is there anything we can do in the meantime to improve training time?

Also, what is the ETA for fixing this issue?

Currently there is no workaround for it. The internal team is working on an improvement for the next release.

Is there not a previous version of the TLT container that doesn’t have this problem?

TLT 2.0_py3 should be faster, but it does not include yolo_v4; it only has yolo_v3.

Please note that there are new features to tune the number of data-loading workers and use_multiprocessing in the TLT 3.0 version docker. They will help improve training speed.
For example, in yolo_v4, see YOLOv4 — Transfer Learning Toolkit 3.0 documentation

n_workers: The number of workers for data loading per GPU
use_multiprocessing: Whether to use the multiprocessing mode of the Keras sequence data loader

Any update on this problem? I just ran a yolo_v4 training on an EC2 p3.8xlarge. It took 20h of training with TLT vs 9h with Darknet for the same number of epochs…

I didn’t use the parameters you mentioned because I used the default config, but I don’t think that’s the main issue here.

Hi, for us the only thing that helped was resizing the images manually first and then feeding them to the network for training (there is likely a problem with the default data loader?). Hope it helps.
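For reference, a minimal sketch of that offline-resize workaround. Everything here is an assumption about one possible workflow, not TLT API: 512x288 comes from output_width/output_height in the spec above, and `scale_box` shows the matching adjustment you would need for KITTI-style (xmin, ymin, xmax, ymax) labels.

```python
# Hypothetical offline pre-resize: shrink all images to the network input
# resolution once, so the TLT dataloader no longer resizes on the fly.
import os

TARGET_W, TARGET_H = 512, 288  # output_width / output_height from the spec


def scale_box(box, src_w, src_h, dst_w=TARGET_W, dst_h=TARGET_H):
    """Scale a KITTI-style (xmin, ymin, xmax, ymax) box to the resized image."""
    sx, sy = dst_w / src_w, dst_h / src_h
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


def resize_images(src_dir, dst_dir):
    """Resize every image in src_dir to TARGET_W x TARGET_H (requires Pillow)."""
    from PIL import Image  # imported here so the box math above works without it
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        with Image.open(os.path.join(src_dir, name)) as im:
            resized = im.resize((TARGET_W, TARGET_H), Image.BILINEAR)
            resized.save(os.path.join(dst_dir, name))
```

Remember to rescale the label files with the same factors, otherwise the boxes no longer match the images.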

In the TLT 3.0-py3 docker, please set the parameters below to speed up training:

  • max_queue_size
  • n_workers
  • use_multiprocessing

See YOLOv4 — Transfer Learning Toolkit 3.0 documentation
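Putting those three together, the training_config block of the spec might look like this (the worker and queue values are illustrative assumptions, not recommendations; tune them for your machine):

```
training_config {
  batch_size_per_gpu: 8
  num_epochs: 200
  # illustrative values -- tune n_workers to your CPU core count
  n_workers: 4
  max_queue_size: 16
  use_multiprocessing: true
}
```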