Please provide the following information when requesting support.
• Hardware (RTX 3090 / RTX 2080 Ti)
• Network Type (Yolo_v4)
• TLT Version (3.2)
I trained two identical YOLOv4 models on the same dataset using two different GPUs, an RTX 3090 and an RTX 2080 Ti. The training code came straight from NGC and I haven't changed anything.
With batch_size set to 8 on both, the RTX 3090 took 580-620 seconds per epoch. That was not only much slower than I expected, it was even slower than the RTX 2080 Ti, which took 540-580 seconds per epoch.
Could anybody tell me why that is?
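For scale, here is a rough throughput comparison from the midpoints of the epoch times above. The dataset size of 10,000 images is a hypothetical placeholder for illustration only, not the actual size of my training set:

```python
# Hypothetical throughput comparison; num_images = 10000 is a
# placeholder, not my real dataset size.
num_images = 10000

def images_per_sec(epoch_seconds):
    """Effective training throughput for one epoch."""
    return num_images / epoch_seconds

rtx3090 = images_per_sec(600)    # midpoint of 580-620 s/epoch
rtx2080ti = images_per_sec(560)  # midpoint of 540-580 s/epoch

print(f"RTX 3090:    {rtx3090:.1f} img/s")
print(f"RTX 2080 Ti: {rtx2080ti:.1f} img/s")
print(f"RTX 3090 is {100 * (rtx2080ti - rtx3090) / rtx2080ti:.0f}% slower")
```

Whatever the true dataset size, the relative gap is the same: the 3090 comes out roughly 7% slower per epoch, where I would have expected it to be clearly faster.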