TAO Toolkit: RTX 3090 trains slower than RTX 2080 Ti

Please provide the following information when requesting support.

• Hardware (RTX 3090 / RTX 2080 Ti)
• Network Type (Yolo_v4)
• TLT Version (3.2)

I trained two identical YOLOv4 models on the same dataset using two different GPUs, an RTX 3090 and an RTX 2080 Ti. The training code is the original from NGC and I haven't changed anything.

With batch_size set to 8 on both, the RTX 3090 took 580-620 seconds to finish one epoch. That is not only much slower than I expected, it is even slower than the RTX 2080 Ti, which took 540-580 seconds per epoch.

Could anybody tell me why that is?

Do you mean you ran on two machines, one with the 3090 and the other with the 2080 Ti?

Yes, correct.

What is the GPU utilization on each machine during training?
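One quick way to capture this (outside of TAO itself) is to poll nvidia-smi on each machine while an epoch runs and compare the numbers. Below is a minimal sketch; the helper name `log_gpu_utilization` and the polling interval are just for illustration, while the query fields (utilization.gpu, memory.used, power.draw) are standard nvidia-smi fields.

```python
# Minimal GPU-utilization logger: polls nvidia-smi once per second and
# prints utilization, memory use, and power draw for every GPU.
import subprocess
import time

def log_gpu_utilization(interval_s=1.0, duration_s=120.0):
    """Poll nvidia-smi for the given duration and print one CSV line per GPU."""
    end = time.time() + duration_s
    while time.time() < end:
        out = subprocess.check_output(
            [
                "nvidia-smi",
                "--query-gpu=index,utilization.gpu,memory.used,power.draw",
                "--format=csv,noheader",
            ],
            text=True,
        )
        print(out.strip())
        time.sleep(interval_s)

if __name__ == "__main__":
    # Run this on both machines while training is in progress.
    log_gpu_utilization()
```

If the 3090 sits well below 100% utilization while the 2080 Ti stays busy, the bottleneck is likely data loading or CPU/disk on that machine rather than the GPU itself.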
