We are trying to train a TLT yolo_v4 model. We have a custom dataset of 25,000 images and are training on 2 GPUs (GeForce RTX 2080 Ti). Driver version: 455.32.00, CUDA version: 11.1, TLT version: 3.0.
Despite the small dataset, each epoch takes one hour. Is this expected, or is something wrong?
The command we used is: tlt yolo_v4 train --gpus 2 -e /path/to/spec.txt -r /path/to/result -k $KEY
Here is an extract from the config:
random_seed: 42
yolov4_config {
  big_anchor_shape: "[(87.07, 119.20), (119.47, 87.33), (124.67, 123.07)]"
  mid_anchor_shape: "[(78.13, 78.13), (59.73, 105.20), (106.93, 60.80)]"
  small_anchor_shape: "[(36.67, 35.87), (48.00, 66.27), (68.13, 48.53)]"
  box_matching_iou: 0.25
  arch: "cspdarknet"
  nlayers: 19
  arch_conv_blocks: 2
  loss_loc_weight: 0.8
  loss_neg_obj_weights: 100.0
  loss_class_weights: 0.5
  label_smoothing: 0.0
  big_grid_xy_extend: 0.05
  mid_grid_xy_extend: 0.1
  small_grid_xy_extend: 0.2
  freeze_bn: false
  #freeze_blocks: 0
  force_relu: false
}
training_config {
  batch_size_per_gpu: 8
  num_epochs: 200
  enable_qat: true
  checkpoint_interval: 10
  learning_rate {
    soft_start_cosine_annealing_schedule {
      min_learning_rate: 1e-7
      max_learning_rate: 1e-4
      soft_start: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
  optimizer {
    adam {
      epsilon: 1e-7
      beta1: 0.9
      beta2: 0.999
      amsgrad: false
    }
  }
}
eval_config {
  average_precision_mode: SAMPLE
  batch_size: 16
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.001
  clustering_iou_threshold: 0.5
  top_k: 200
}
augmentation_config {
  hue: 0.1
  saturation: 1.5
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 512
  output_height: 288
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
}
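To put the numbers in perspective, here is a rough back-of-the-envelope calculation of what one hour per epoch means in terms of throughput. It is only a sketch: it assumes that all 25,000 images are used for training and that one step processes batch_size_per_gpu images on each GPU (plain data parallelism), neither of which we have verified for TLT. The variable names are our own.

```python
import math

# Values taken from the post and the spec extract above.
num_images = 25_000        # assumption: every image is in the training split
num_gpus = 2               # --gpus 2
batch_size_per_gpu = 8     # training_config.batch_size_per_gpu
num_epochs = 200           # training_config.num_epochs
epoch_seconds = 60 * 60    # observed: one hour per epoch

# Assumption: each step consumes one batch per GPU (data parallelism).
global_batch = num_gpus * batch_size_per_gpu
steps_per_epoch = math.ceil(num_images / global_batch)

seconds_per_step = epoch_seconds / steps_per_epoch
images_per_second = num_images / epoch_seconds
images_per_second_per_gpu = images_per_second / num_gpus
total_hours = num_epochs * epoch_seconds / 3600

print(f"global batch size   : {global_batch}")
print(f"steps per epoch     : {steps_per_epoch}")
print(f"seconds per step    : {seconds_per_step:.2f}")
print(f"images/s (total)    : {images_per_second:.2f}")
print(f"images/s (per GPU)  : {images_per_second_per_gpu:.2f}")
print(f"full run (hours)    : {total_hours:.0f}")
```

Under these assumptions, that is more than two seconds per step of 16 images at 512x288, and about 200 hours for the full 200 epochs.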
Thanks for the help