Getting best performance from GPUs while using Quantum ESPRESSO

Dear all,
I am trying to use Quantum ESPRESSO with GPU acceleration.
However, since I am new to this field, I do not know which flags and environment variables I should export to get optimal performance.

Here is a set of variables that were suggested to me:

export GPU_FORCE_64BIT_PTR=0
export GPU_MAX_HEAP_SIZE=100
export GPU_USE_SYNC_OBJECTS=1
export GPU_MAX_ALLOC_PERCENT=100
export GPU_SINGLE_ALLOC_PERCENT=100

export OMP_NUM_THREADS=1
export MPI_PER_GPU=20
export ENABLE_MPS=false
export LS_HYPERTHREAD=true
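To see what these settings imply on my nodes (2 V100 GPUs and 40 CPU cores each), here is a small sketch. It assumes MPI_PER_GPU means "MPI ranks per GPU", which is only my reading of the name. I believe MPI_PER_GPU, ENABLE_MPS and LS_HYPERTHREAD are read by a site-specific job wrapper rather than by QE or CUDA. `layout` is a hypothetical helper, not part of QE or Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QE or Slurm): given cores per node, GPUs per node
# and MPI ranks per GPU, report total MPI ranks and OpenMP threads per rank.
layout() {
  local cores=$1 gpus=$2 ranks_per_gpu=$3
  local ranks=$(( gpus * ranks_per_gpu ))
  if (( ranks < 1 || ranks > cores )); then
    echo "error: need between 1 and $cores ranks, got $ranks" >&2
    return 1
  fi
  echo "ranks=$ranks threads_per_rank=$(( cores / ranks ))"
}

layout 40 2 20   # suggested MPI_PER_GPU=20: 40 ranks, 1 thread each (matches OMP_NUM_THREADS=1)
layout 40 2 1    # one rank per GPU: 2 ranks, 20 threads each
```

With the suggested values, 20 ranks share each V100. Without MPS (ENABLE_MPS=false), kernels from different processes are time-sliced on the GPU rather than run concurrently, which may be one reason utilization looks low. As far as I understand, the QE GPU documentation generally recommends one MPI rank per GPU, with OpenMP threads using the remaining cores.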

Being a novice, I do not understand how these work, and I am fairly sure the GPUs are not being fully utilized. I say this because I ran some basic profiling, and all of it points the same way.

Further, I could not find any documentation for these variables.
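For a basic utilization check like the profiling mentioned earlier, `nvidia-smi` can log per-GPU usage while the job runs. Its `utilization.gpu` field is the percentage of the sample period during which at least one kernel was executing. This makes it a coarse indicator: a high value does not prove efficient use, but a consistently low one suggests the GPUs sit idle much of the time. The `avg_util` function and the `gpu_usage.csv` file name are hypothetical.

```shell
#!/usr/bin/env bash
# On the GPU node, while pw.x is running, sample utilization every 5 seconds:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
#              --format=csv,noheader,nounits -l 5 > gpu_usage.csv
# Each line then looks like "0, 87, 10240" (GPU index, % utilization, MiB used).

# Hypothetical helper: average utilization per GPU from that CSV (file or stdin).
avg_util() {
  awk -F', *' '{ sum[$1] += $2; n[$1]++ }
       END { for (g in sum) printf "gpu%s avg_util=%.1f%%\n", g, sum[g] / n[g] }' "$@" | sort
}

# Example: avg_util gpu_usage.csv
```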

I am using a Slurm script to run my jobs. Every GPU node on the cluster has 2 Tesla V100-SXM2-16GB GPUs and 40 CPU cores.
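Here is a minimal Slurm sketch for one of these nodes. It assumes the one-rank-per-GPU layout and a GPU-enabled `pw.x` on the PATH. The job name, time limit, GRES name and `scf.in` are placeholders for my own setup, not settings taken from the QE documentation.

```shell
#!/bin/bash
#SBATCH --job-name=qe-gpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2        # one MPI rank per V100
#SBATCH --cpus-per-task=20         # 2 ranks x 20 cores = 40 cores per node
#SBATCH --gres=gpu:2               # GRES name may differ on your cluster
#SBATCH --time=01:00:00

# OpenMP threads fill the cores assigned to each rank.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Recent Slurm versions do not pass --cpus-per-task from sbatch to srun, so repeat it.
# scf.in is a placeholder input; -nk 2 asks for two k-point pools (one per GPU),
# which only helps if the calculation has at least two k-points.
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" --cpu-bind=cores pw.x -nk 2 -input scf.in > scf.out
```

Compared with 20 ranks per GPU, this gives each GPU a single process, so the GPUs do not have to time-slice between contexts, and MPS is not needed.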

Could anyone shed some light on:

  1. Which flags and variables can be set, and how do I decide which ones are relevant for my program?

  2. Where can I read more about these variables and flags?

Any insights would be very helpful.

You’re unlikely to get help here. This forum is for users who make direct use of libraries such as cuBLAS, cuFFT, etc., not for QE support. There is a QE users forum here.
