Discrepancy in Amber Benchmark Performance on DGX Server (A100 GPU)

I am writing to address a concern regarding the performance of the Amber benchmark on the NVIDIA DGX server.

When I ran the Amber benchmark on my server, the results differed significantly from those published on the NVIDIA website (https://developer.nvidia.com/hpc-application-performance).

My results were close to the NIH benchmark results (https://hpc.nih.gov/apps/amber/) and notably lower than the figures NVIDIA publishes, so they do not meet the performance expectations those figures set. (Please refer to the attached image file.)

MD engine: Amber22, built with NCCL to enable multi-GPU runs
CPU: AMD EPYC 7742 64-Core Processor
GPU: A100-SXM4-40GB GPU
CUDA version: 11.2
NVIDIA driver version: 460.73.01

Could you please shed some light on the possible reasons for this discrepancy? My key question is why performance does not increase even when I increase the number of GPUs. I would also like to know whether there are any optimizations or configuration changes that would bring performance closer to the levels shown on the NVIDIA website.
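
To make "performance does not increase" concrete, here is a minimal sketch of how the scaling could be quantified from the ns/day values that Amber reports. The function name scaling_report is my own, and the numbers in the example are hypothetical placeholders, not my measured results; they only illustrate what flat scaling looks like.

```python
def scaling_report(ns_per_day_by_gpus):
    """Return (gpus, ns_per_day, speedup, efficiency) rows.

    Speedup and efficiency are relative to the smallest GPU count
    present in the input mapping.
    """
    base_gpus = min(ns_per_day_by_gpus)
    base_rate = ns_per_day_by_gpus[base_gpus]
    rows = []
    for gpus in sorted(ns_per_day_by_gpus):
        rate = ns_per_day_by_gpus[gpus]
        speedup = rate / base_rate
        # Ideal speedup equals the ratio of GPU counts.
        efficiency = speedup / (gpus / base_gpus)
        rows.append((gpus, rate, speedup, efficiency))
    return rows


# Hypothetical placeholder values (NOT measured): identical throughput
# on 1, 2, and 4 GPUs, i.e. no multi-GPU scaling at all.
example = {1: 100.0, 2: 100.0, 4: 100.0}

for gpus, rate, speedup, eff in scaling_report(example):
    print(f"{gpus} GPU(s): {rate:.1f} ns/day, "
          f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With flat throughput like this, the efficiency halves each time the GPU count doubles, which is the pattern I am describing.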

[Attached image: benchmark]