TAO Toolkit: RTX 3090 training speed is worse than RTX 2080 Ti

Please provide the following information when requesting support.

• Hardware (RTX 3090 / RTX 2080 Ti)
• Network Type (Yolo_v4)
• TLT Version (3.2)

I trained two identical YOLOv4 models on the same dataset, but on two different GPUs: an RTX 3090 and an RTX 2080 Ti. The training code came straight from NGC and I haven't changed anything.

With batch_size set to 8 on both, the RTX 3090 took 580-620 seconds to finish one epoch. That is not only much slower than I expected, it is even slower than the RTX 2080 Ti, which took 540-580 seconds per epoch.
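For reference, here is a minimal sketch of the throughput those epoch times imply. The dataset size below is a hypothetical placeholder, not a value from my run; substitute the real number of training images:

```python
# Rough images/sec implied by the epoch times above.
# NUM_IMAGES is a hypothetical placeholder, not the actual dataset size.
NUM_IMAGES = 10000

epoch_times = {"RTX 3090": (580, 620), "RTX 2080 Ti": (540, 580)}
for gpu, (fastest, slowest) in epoch_times.items():
    # Best-case and worst-case throughput for each GPU.
    print(f"{gpu}: {NUM_IMAGES / slowest:.1f}-{NUM_IMAGES / fastest:.1f} img/s")
```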

Could anybody tell me why that is?

Do you mean you ran on two machines, one with the 3090 and the other with the 2080 Ti?

Yes, correct.

How about the GPU utilization on each machine?
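For example, you could log utilization on both machines while an epoch runs and compare. This is just a sketch that polls nvidia-smi, assuming it is on the PATH; the interval and sample count are arbitrary:

```python
import subprocess
import time

def log_gpu_utilization(interval_s=5, samples=12):
    """Poll nvidia-smi and print GPU/memory utilization at a fixed interval."""
    query = "--query-gpu=index,utilization.gpu,utilization.memory,memory.used"
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", query, "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(out)
        time.sleep(interval_s)

if __name__ == "__main__":
    log_gpu_utilization()
```

If the 3090 sits at noticeably lower GPU utilization than the 2080 Ti during training, the bottleneck is likely data loading or I/O rather than the GPU itself.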