Performance of 6000 Ada vs. H100 for multi-modal object detection training

Hello,

We are currently evaluating the performance differences between the RTX 6000 Ada and the H100 on some real-world tasks. For this, we focus on training multi-modal object detection models, specifically Sparse4D v3.

I ran our benchmark on both GPUs, but the machines had slightly different specifications:
Setting 1:

  • 1x 6000 Ada
  • 16 cores of AMD EPYC 9354
  • 125 GB RAM
  • Unknown local SSD (the mini dataset easily fits into the RAM cache, so it should not matter)

Setting 2:

  • 1x H100 SXM
  • 16 cores of Intel Xeon Platinum 8462Y+
  • 250 GB RAM
  • Unknown local SSD (the mini dataset easily fits into the RAM cache, so it should not matter)

With this setup, I measured a time of 1.64 s per training step on the RTX 6000 Ada (including a data time of 0.07 s). On the H100, one step took 1.02 s (data time: 0.05 s).
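The step and data times above are the per-iteration values reported by the training framework's logger. Conceptually they correspond to something like the following minimal sketch (a hypothetical loop, not the actual Sparse4D v3 training code; note the synchronization before stopping the clock so that asynchronous GPU work is included):

```python
import time
import torch

def time_steps(model, optimizer, data_loader, n_steps=50, warmup=10):
    """Rough per-step timing, split into data loading and total step time."""
    step_times, data_times = [], []
    it = iter(data_loader)
    for step in range(n_steps):
        t0 = time.perf_counter()
        batch = next(it)                       # data loading / host-side preprocessing
        t1 = time.perf_counter()

        optimizer.zero_grad(set_to_none=True)
        loss = model(batch)                    # forward pass (placeholder signature)
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()               # wait for the GPU before stopping the clock
        t2 = time.perf_counter()

        if step >= warmup:                     # discard warm-up iterations
            data_times.append(t1 - t0)
            step_times.append(t2 - t0)

    print(f"time: {sum(step_times) / len(step_times):.2f}s, "
          f"data_time: {sum(data_times) / len(data_times):.2f}s")
```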

Based on these step times, the H100 delivered 162% of the performance of the RTX 6000 Ada.
On another in-house benchmark with similar characteristics, the H100 achieved only 135% of the performance of the RTX 6000 Ada.
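For reference, here is the arithmetic behind the 162% figure (computed on the compute portion of the step, i.e. step time minus data time; the ratio of the total step times comes out almost identical at ~161%):

```python
# Measured per-step times in seconds
ada_step, ada_data = 1.64, 0.07
h100_step, h100_data = 1.02, 0.05

# Compare the compute portion of the step (step time minus data-loading time)
ada_compute = ada_step - ada_data        # 1.57 s
h100_compute = h100_step - h100_data     # 0.97 s

print(f"{ada_compute / h100_compute:.0%}")  # -> 162%
```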

I would like to understand what speedup I should generally expect on similar ML tasks. Comparing, for example, the theoretical FP16 tensor core throughput, the H100 should have 273% of the performance of the RTX 6000 Ada.
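The 273% figure follows from the approximate dense FP16 tensor core throughput listed in the datasheets (rounded values, no sparsity):

```python
# Approximate dense FP16 tensor core throughput (TFLOPS, without sparsity)
h100_sxm_fp16 = 989.4       # H100 SXM5 datasheet value
rtx6000_ada_fp16 = 362.6    # RTX 6000 Ada datasheet value

print(f"{h100_sxm_fp16 / rtx6000_ada_fp16:.0%}")  # -> 273%
```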

You can find the necessary steps to reproduce my benchmark results here:

Best regards,

Ole

Verify the temperature and utilization of the H100 during the benchmark, and if possible use Nsight Compute to check the kernel occupancy during the benchmark run.
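One way to log temperature, utilization, and power while the benchmark runs is the NVML Python bindings (a sketch; it assumes the `nvidia-ml-py`/`pynvml` package is installed and is equivalent to polling `nvidia-smi`):

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first visible GPU

# Poll once per second while the training benchmark runs in another process
for _ in range(60):
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # reported in milliwatts
    print(f"temp={temp}C gpu_util={util.gpu}% mem_util={util.memory}% power={power_w:.0f}W")
    time.sleep(1)

pynvml.nvmlShutdown()
```

For occupancy, Nsight Compute can be attached to the training process, e.g. `ncu --target-processes all -o report <training command>` (adjust the command to your setup); sustained low GPU utilization or throttling would explain part of the gap to the theoretical 273%.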