RTX A4000 for MD Simulations

Hi,
I have been trying to use Gromacs compiled with CUDA to simulate biomolecules, but for some reason the simulation performance with the GPU is equal to, if not worse than, the performance without it.
Currently, my computer’s configuration is:
CPU: AMD Threadripper 3975WX
RAM: 128GB DDR4 ECC
GPU: RTX A4000
Operating System: Ubuntu 22.04
CUDA version 11.2.1
Gromacs version 2021.4 patched with PLUMED 2.8 (I have also tried other Gromacs versions, but it doesn’t make a difference)

Can anyone please help me figure out what is happening here?

It’s not guaranteed that Gromacs will run faster on GPU than on CPU for any arbitrary input deck. There are various published Gromacs GPU benchmarks (such as here; an RTX A4000 example is included); my suggestion would be to test one of those.

I don’t think you’re going to find lots of Gromacs experts here, so you may also wish to ask on a Gromacs forum.

Actually, I tried the same input files, with a similar software environment, on a laptop with a Ryzen 9 5900HX and an RTX 3060 Mobile, and the laptop consistently outperformed the RTX A4000 workstation, which was quite unexpected. I raised this question on the Gromacs forum but did not get any answers.
I also checked GPU usage with nvidia-smi, which reports a power draw of 46 W out of the 140 W limit.
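
For reference, this is roughly how I watch the card while mdrun is running (a quick sketch using the nvidia-ml-py / pynvml bindings, which I am assuming are installed; it just samples power, temperature, SM clock, and utilization once per second):

```python
import time
import pynvml  # pip install nvidia-ml-py (provides the pynvml module)

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assuming the A4000 is GPU 0

try:
    while True:
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
        temp_c  = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        sm_mhz  = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
        util    = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent busy over the sample window
        print(f"{power_w:5.1f} W  {temp_c:3d} C  {sm_mhz:4d} MHz  GPU {util.gpu:3d}%  MEM {util.memory:3d}%")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

Watching the SM clock alongside power makes it easier to tell an under-fed GPU (low utilization at boost clocks) from a throttled one.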

Regarding the GPU benchmarks, the configuration reported by Puget Systems is similar to what I have, and the performance I am getting now is roughly the same as reported there. I have approximately 80K atoms in my system, and I am getting around 30 ns/day; this is slightly lower than the CPU-only calculation (32 ns/day). The RTX 3060 laptop gives ~55 ns/day for the same simulation setup.

Can you please tell me why this might be happening?

I don’t personally use Gromacs. I think you’ll find a lot of Gromacs users on the Gromacs forum; perhaps someone there will be able to explain your results. It seems to me like your A4000 results are reasonable if you believe your test case is similar to the Puget case.

Thanks for your help.
Yes, the performance seems to be reasonable given the Puget case (they did mention that the A4000 performed exceptionally poorly with all the simulation engines they tested), but unfortunately it doesn’t seem like any real GPU acceleration is happening with the A4000, no matter what I try. I am hoping that this issue can be fixed with future software updates.
Meanwhile, I will wait for someone from the Gromacs forum to address this issue.

It could be worthwhile to mention the Puget benchmark results on the Gromacs forum to draw attention, given that there were anomalous results for other GPUs as well.

This certainly indicates that the GPU is not being used heavily. If it were idling, however, you would see a power draw of around 7 W or so. I am not a Gromacs user, and the Gromacs forum that you have already been pointed to is the best resource for resolving Gromacs issues. From what I have observed, some of the key Gromacs developers are active there. However, keep in mind that this is the time of year when many people take their summer vacation, so the forums may be pretty quiet.

From what little I can recall from past interactions with the Gromacs folks, how well Gromacs can utilize the GPU depends on specifics of the configuration, as not all functionality can be offloaded to the GPU equally well. You may want to check the documentation to see whether your configuration uses functionality that is not yet GPU accelerated.

One general issue that affected Gromacs in the past is that the distribution of work between CPU and GPU could cause it to become bottlenecked on the CPU portion when used with high-end GPUs. But from what I understand, this issue no longer exists in Gromacs 2019 or later versions, which require only 2-4 CPU cores to keep the GPU well fed.
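
To make that concrete, a launch along these lines is what I understand a "small CPU footprint, everything offloaded" run to look like (untested on my end, going purely by the gmx mdrun documentation; the topol file name is a placeholder):

```python
import subprocess

# Sketch only: one thread-MPI rank, four pinned OpenMP threads, and the
# nonbonded, PME, and bonded work pushed to the GPU. "topol" is a placeholder.
subprocess.run([
    "gmx", "mdrun", "-deffnm", "topol",
    "-ntmpi", "1", "-ntomp", "4", "-pin", "on",
    "-nb", "gpu", "-pme", "gpu", "-bonded", "gpu",
], check=True)
```

If performance with 4 pinned cores is close to what you get when all 64 threads are available, the CPU side is probably not the bottleneck.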

From the Puget Systems benchmarking comparison from May of this year:

The A4000 gave surprisingly poor performance on all tests. I had naively expected it to perform relative to the (excellent) A4500.

From looking at the specs, the A4000 provides about 2/3 the cache size, memory bandwidth, and FLOPS of the A4500, so I would expect performance between these two GPUs to differ by a factor of 1.5x, but apparently that is not the case. I have not used an A4000 and have no explanation for that. Could the relatively small size of on-board memory (16 GB for the A4000) be the issue, or some sort of cache-thrashing?

If you have multiple GPUs to compare against, you might want to try to dig into performance details with the CUDA profiler.
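
A rough sketch of what that could look like with Nsight Systems (report name and step count are arbitrary; on older Nsight Systems releases the report extension is .qdrep rather than .nsys-rep):

```python
import subprocess

# Wrap a short mdrun in Nsight Systems so the CUDA kernel / memcpy timeline
# can be compared between the A4000 box and the RTX 3060 laptop.
subprocess.run([
    "nsys", "profile",
    "-o", "a4000_run",              # output report name (placeholder)
    "--trace=cuda,nvtx,osrt",       # CUDA API + kernels, NVTX ranges, OS runtime calls
    "gmx", "mdrun", "-deffnm", "topol", "-nsteps", "20000",
], check=True)

# Print kernel and memory-transfer summaries from the captured report.
subprocess.run(["nsys", "stats", "a4000_run.nsys-rep"], check=True)
```

Long gaps between kernels on the A4000 timeline, with the same kernels running back to back on the 3060, would point at the launch/synchronization side rather than raw GPU throughput.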

The A4000 is a single-slot design. In my experience, single-slot designs often have problems with cooling that cause clock throttling, because thermal limits are hit quickly under high load. You can monitor GPU temperature with nvidia-smi, but if the 46 W power draw reported above is typical during Gromacs runs, I would not expect thermals to be an issue. Single-slot designs often have trouble dissipating more than 100 W to 110 W of continuous load.
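
If you want to rule out throttling directly rather than infer it from temperature and power, NVML also exposes the active throttle reasons. A small sketch, again assuming the nvidia-ml-py bindings, to be run while mdrun is under load:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Readable names for the NVML throttle-reason bits of interest here.
reasons = {
    pynvml.nvmlClocksThrottleReasonSwPowerCap:        "SW power cap",
    pynvml.nvmlClocksThrottleReasonHwSlowdown:        "HW slowdown (power/thermal brake)",
    pynvml.nvmlClocksThrottleReasonSwThermalSlowdown: "SW thermal slowdown",
    pynvml.nvmlClocksThrottleReasonHwThermalSlowdown: "HW thermal slowdown",
}

mask = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)
active = [name for bit, name in reasons.items() if mask & bit]
print("Active throttle reasons:", ", ".join(active) if active else "none")
pynvml.nvmlShutdown()
```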

Initially, I thought it might be a heating issue, but the temperatures reported by nvidia-smi and lm-sensors are around 62 °C. I don’t think there’s any thermal throttling.

From what little I can recall from past interactions with the Gromacs folks, how well Gromacs can utilize the GPU depends on specifics of the configuration, as not all functionality can be offloaded to the GPU equally well.

I did try manually offloading tasks to the GPU, but it doesn’t seem to make much of a difference which tasks go to the GPU. No matter what combination I try, the results are similar.
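
For what it’s worth, this is roughly how I sweep the combinations (a quick sketch; topol.tpr, the step count, and the thread counts are placeholders, and I just pull the ns/day figure out of each log’s Performance line):

```python
import re
import subprocess

# Offload combinations to compare; as far as I can tell, mdrun only accepts
# PME on the GPU when the nonbonded work is also on the GPU.
combos = [
    ("cpu", "cpu", "cpu"),   # CPU-only baseline
    ("gpu", "cpu", "cpu"),
    ("gpu", "gpu", "cpu"),
    ("gpu", "gpu", "gpu"),
]

for nb, pme, bonded in combos:
    tag = f"bench_nb-{nb}_pme-{pme}_bonded-{bonded}"
    subprocess.run([
        "gmx", "mdrun", "-s", "topol.tpr", "-deffnm", tag,
        "-nsteps", "20000", "-resethway", "-noconfout",
        "-ntmpi", "1", "-ntomp", "8", "-pin", "on",
        "-nb", nb, "-pme", pme, "-bonded", bonded,
    ], check=True)
    # mdrun prints a "Performance:" line at the end of the log, ns/day first.
    with open(tag + ".log") as fh:
        m = re.search(r"Performance:\s+([\d.]+)", fh.read())
    print(f"nb={nb} pme={pme} bonded={bonded}: {m.group(1) if m else '??'} ns/day")
```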

I also benchmarked it against the RTX 3060 Mobile (130 W), which has only 6 GB of memory, and it outperformed the A4000 (1.5-2.5x the performance), so I don’t think memory size is the issue here. What is happening here remains a mystery.