Hello,
We are currently evaluating the performance differences between the RTX 6000 Ada and the H100 on some real-world tasks. For this, we focus on training multi-modal object detection models, specifically Sparse4D v3.
I ran our benchmark on both GPUs, but the two machines had slightly different specifications:
Setting 1:
- 1x RTX 6000 Ada
- 16 cores of AMD EPYC 9354
- 125 GB RAM
- Unknown local SSD (the mini dataset easily fits into the RAM cache, so it should not matter)
Setting 2:
- 1x H100 SXM
- 16 cores of Intel Xeon Platinum 8462Y+
- 250 GB RAM
- Unknown local SSD (the mini dataset easily fits into the RAM cache, so it should not matter)
With this, I measured the time for one training step to be 1.64s on the RTX 6000 Ada (including a data time of 0.07s). The time on the H100 was 1.02s (data_time: 0.05s).
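The step and data times come straight from the training log. For reference, below is a minimal sketch of how one such step time can be measured; the model call and batch format are illustrative, not our actual benchmark code:

    import time
    import torch

    def timed_train_step(model, optimizer, batch):
        """Time the compute part of one training step (data_time is tracked
        separately as the time spent waiting on the dataloader)."""
        optimizer.zero_grad()
        torch.cuda.synchronize()   # make sure previous GPU work has finished
        start = time.perf_counter()

        loss = model(**batch)      # forward pass (call signature is illustrative)
        loss.backward()
        optimizer.step()

        torch.cuda.synchronize()   # wait for this step's GPU work to finish
        return time.perf_counter() - start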
In other words, the H100 delivered about 162% of the RTX 6000 Ada's performance. On another in-house benchmark with similar characteristics, the H100 achieved only 135% of the RTX 6000 Ada's performance.
I would like to understand what speedup I should generally expect on similar ML tasks. Comparing, for example, the theoretical FP16 tensor-core throughput, the H100 should offer 273% of the RTX 6000 Ada's performance.
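For transparency, here is the arithmetic behind the two ratios. The 162% compares the compute portion of a step (step time minus data time); the FP16 figures are the dense tensor-core numbers I found on the public spec sheets, so treat them as assumptions:

    # Measured speedup, comparing compute time per step (step time minus data time)
    rtx_compute  = 1.64 - 0.07   # 1.57 s on the RTX 6000 Ada
    h100_compute = 1.02 - 0.05   # 0.97 s on the H100
    print(rtx_compute / h100_compute)   # ~1.62 -> 162%

    # Theoretical dense FP16 tensor-core throughput ratio (assumed spec values, TFLOPS)
    print(989.4 / 362.6)                # ~2.73 -> 273%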
You can find the necessary steps to reproduce my benchmark results here:
Best regards,
Ole