Training time on an 8xH100 server is higher than on an 8xA100 server

I have opened an issue on NVIDIA's GitHub page that describes the problem in detail (link below):

https://github.com/NVIDIA/DeepLearningExamples/issues/1336

Based on the additional details in the GitHub issue, I changed the category to Nsight Systems.

I’ve handed it off to Leslie Monis for initial triage. You should hear back from him here.

It looks like you ran the nsys profiler while measuring the training time. Do you observe a similar difference in training times when nsys is not running?

@lmonis Yes, I observe a similar difference in training times without nsys as well.

A100 profiling (time to complete: with nsys 7m48.808s, without nsys 3m51.057s)
H100 profiling (time to complete: with nsys 23m16.4s, without nsys 5m29.673s)
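To put the numbers above side by side, here is a minimal sketch that converts the reported `time`-style durations to seconds and computes the H100/A100 ratio and the nsys overhead on each system. The function name `parse_duration` and the dictionary layout are illustrative only. The timings are copied verbatim from the post above.

```python
import re

def parse_duration(s: str) -> float:
    """Convert a bash `time`-style duration such as '7m48.808s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", s.strip())
    if m is None:
        raise ValueError(f"unrecognized duration: {s!r}")
    minutes = int(m.group(1)) if m.group(1) else 0
    return minutes * 60 + float(m.group(2))

# Wall-clock times as reported in the thread.
timings = {
    ("A100", "nsys"): "7m48.808s",
    ("A100", "no_nsys"): "3m51.057s",
    ("H100", "nsys"): "23m16.4s",
    ("H100", "no_nsys"): "5m29.673s",
}
secs = {k: parse_duration(v) for k, v in timings.items()}

# How much slower the H100 run is than the A100 run, per measurement mode.
for mode in ("no_nsys", "nsys"):
    ratio = secs[("H100", mode)] / secs[("A100", mode)]
    print(f"H100/A100 wall-clock ratio ({mode}): {ratio:.2f}x")

# How much running under nsys inflates each system's wall-clock time.
for gpu in ("A100", "H100"):
    overhead = secs[(gpu, "nsys")] / secs[(gpu, "no_nsys")]
    print(f"{gpu} nsys slowdown: {overhead:.2f}x")
```

By this arithmetic, the H100 run takes roughly 1.43x as long as the A100 run without nsys and about 2.98x as long with it. This suggests the gap exists on its own but nsys widens it considerably.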