I recently set up an HP Z8 G4 workstation with two RTX 2080 GPUs for CUDA-accelerated molecular dynamics simulations with GROMACS.
The operating system is Ubuntu 18.04 LTS, and nvidia-smi reports:
NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1
Each card works fine by itself, running at around 70% utilization and drawing around 180 W (P2 power state).
However, if I run two jobs at the same time, each on its own GPU, utilization on both GPUs drops to 1%.
I wonder if this could be a driver issue, a power delivery problem or something else.
nvidia-bug-report.log.gz (1.44 MB)
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
The driver/software setup looks fine, but the GPUs are quite hot, so this might be an airflow/cooling problem. Please create a new nvidia-bug-report.log.gz while both GPUs are under load.
Thanks for the help. It turned out to be an error in my GROMACS settings. When two jobs were submitted, they were both pinned to the same CPU cores, which is why the GPUs were not being utilized.
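For anyone hitting the same thing, here is a rough sketch of how two jobs could be given separate CPU cores. It only builds the `gmx mdrun` command lines and does not run anything. The flags (`-ntmpi`, `-ntomp`, `-pin`, `-pinoffset`, `-pinstride`, `-gpu_id`, `-deffnm`) are standard `mdrun` options, but the core count (16), the job names, and the even split of cores between jobs are assumptions for illustration, not the settings actually used on this machine. With hyperthreading enabled, the right offset and stride depend on how the logical cores are numbered, so check that before copying the values.

```python
def mdrun_commands(total_cores, job_names):
    """Build one `gmx mdrun` command per job, each with its own core range and GPU.

    total_cores and job_names are hypothetical inputs; job i is assigned GPU i.
    """
    n_jobs = len(job_names)
    if n_jobs == 0:
        raise ValueError("need at least one job")
    cores_per_job = total_cores // n_jobs
    if cores_per_job < 1:
        raise ValueError("more jobs than cores")

    commands = []
    for gpu_id, name in enumerate(job_names):
        commands.append([
            "gmx", "mdrun",
            "-deffnm", name,                                 # input/output file prefix
            "-ntmpi", "1",                                   # one thread-MPI rank per job
            "-ntomp", str(cores_per_job),                    # OpenMP threads for this job
            "-pin", "on",                                    # pin threads to cores
            "-pinoffset", str(gpu_id * cores_per_job),       # first core for this job
            "-pinstride", "1",
            "-gpu_id", str(gpu_id),                          # GPU this job may use
        ])
    return commands


if __name__ == "__main__":
    # Assumed example: 16 cores split between two jobs.
    for cmd in mdrun_commands(16, ["job_a", "job_b"]):
        print(" ".join(cmd))
```

The point is that the second job gets a non-zero `-pinoffset`, so its threads do not land on the cores already used by the first job.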
Generix, thanks for the observation regarding the temperatures. I set a power limit on the GPUs and they are running cooler now.
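In case it helps someone else, a power limit can be set per card with `nvidia-smi -i <index> -pl <watts>` (this needs root). The small sketch below just builds those commands for a list of GPUs. The 150 W value is a made-up example, not the limit I actually used, and the allowed range depends on the card, which `nvidia-smi -q -d POWER` will show.

```python
def power_limit_commands(gpu_indices, watts):
    """Build one `nvidia-smi` power-limit command per GPU index.

    watts is a hypothetical value; it must fall within the range the card supports.
    """
    if watts <= 0:
        raise ValueError("power limit must be positive")
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)] for i in gpu_indices]


if __name__ == "__main__":
    # Assumed example: cap both GPUs at 150 W (run the printed commands as root).
    for cmd in power_limit_commands([0, 1], 150):
        print(" ".join(cmd))
```

The limit set this way does not survive a reboot, so it has to be reapplied at startup if it should be permanent.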