Training speed issue

Hi, I’m training my models (DetectNet and YOLOv4) on an RTX 3090 and an A100 (i.e., on two different machines).

I found that on the COCO dataset, training is much slower than I expected (about 3 hours per epoch).
When I use another training script, it is 3x–5x faster.

Do you have any benchmarks for training time on particular datasets or GPU machines?

Thanks in advance.

No, there is no benchmark for training hours.
Please:
• Try to use AMP since your GPU supports it. See more in https://docs.nvidia.com/tao/tao-toolkit/text/qat_and_amp_for_training.html#automatic-mixed-precision
• Try the tfrecord data loader. When using it, please disable mosaic. See more in YOLOv4 — TAO Toolkit 3.22.05 documentation
• Set randomize_input_shape_period to 0, so the network keeps a fixed input shape instead of periodically reshaping during training.
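For reference, the second and third suggestions map to training-spec settings roughly like the fragment below. This is a hedged, illustrative sketch, not a complete spec: the paths are placeholders, and field names such as `mosaic_prob`, `randomize_input_shape_period`, and `tfrecords_path` should be verified against the YOLOv4 spec documentation for your TAO version.

```
# Illustrative fragment of a TAO YOLOv4 training spec (not a complete file).

augmentation_config {
  # Disable mosaic augmentation when using the tfrecord loader.
  mosaic_prob: 0.0
}

yolov4_config {
  # 0 = keep a fixed input shape (avoids the overhead of periodic reshapes).
  randomize_input_shape_period: 0
}

dataset_config {
  data_sources: {
    # Point at converted tfrecords instead of raw images (paths are placeholders).
    tfrecords_path: "/workspace/data/coco/tfrecords/train*"
    image_directory_path: "/workspace/data/coco/train2017"
  }
}
```

AMP is enabled separately (via the training command or spec, depending on TAO version); see the linked AMP documentation for the exact switch.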