I get the error below when running ph.x from the NGC Quantum ESPRESSO container:
INFO: Using cached SIF image
INFO: MPS server daemon started
INFO: Running QE with:
2 GPUs
4 MPI processes total
2 MPI processes per GPU
2 Pools
2 KPoints
+ mpirun -n 4 ph.x -input phonon.in -npool 2
+ tee phonon.txt
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 0; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).
Executable: ph.x
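The message suggests that mpirun could not resolve ph.x on the PATH inside the container. A minimal sketch of a check that could be placed near the top of run_qe_ph.sh to confirm this before mpirun is called; the helper name require_exe is hypothetical and is not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper: report whether an executable can be resolved on PATH.
require_exe() {
    local exe="$1"
    if command -v "${exe}" > /dev/null 2>&1; then
        # command -v prints the resolved path for external commands
        echo "INFO: found ${exe} at $(command -v "${exe}")"
    else
        echo "ERROR: ${exe} not found on PATH=${PATH}" >&2
        return 1
    fi
}

# Possible usage inside the container, before the mpirun line:
#   require_exe ph.x || exit 1
```

If the check fails for ph.x but passes for pw.x, the image may simply not ship a ph.x binary; that is an assumption to verify, not something the log confirms.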
My input command:
singularity run --nv -B${PWD}:/host_pwd --pwd /host_pwd docker://nvcr.io/hpc/quantum_espresso:qe-7.1 ./run_qe_ph.sh
run_qe_ph.sh:
#!/bin/bash
set -euf -o pipefail
readonly gpu_count=${QE_GPU_COUNT:-$(nvidia-smi --list-gpus | wc -l)}
readonly procs_per_gpu=${QE_PROCS_PER_GPU:-2}
readonly host_mps=${QE_INPUT:-}
gpu_list=$(seq -s, 0 "$(( gpu_count - 1 ))")
export CUDA_VISIBLE_DEVICES=${gpu_list}
readonly proc_count=$(( gpu_count*procs_per_gpu ))
readonly gpu_mem_avail=$(nvidia-smi --id="${gpu_list}" --query-gpu=memory.total --format=csv,nounits,noheader | awk '{s+=$1} END {print s}')
readonly gpu_mem_per_kpoint=16384
# Use the maximum pool count, the lesser of kpoints and processor count
readonly kpoints=2
readonly npool=$(( kpoints > proc_count ? proc_count : kpoints ))
# Attempt to start MPS server within container if needed
if (( procs_per_gpu > 1 )) && [[ -z "${host_mps}" ]]; then
    export CUDA_MPS_PIPE_DIRECTORY="${PWD}/.mps"
    export CUDA_MPS_LOG_DIRECTORY="${PWD}/.mps"
    if ! nvidia-cuda-mps-control -d; then
        echo "ERROR: Failed to start MPS daemon."
        exit 1
    fi
    echo "INFO: MPS server daemon started"
    trap "echo quit | nvidia-cuda-mps-control" EXIT
fi
echo "INFO: Running QE with:"
echo " ${gpu_count} GPUs"
echo " ${proc_count} MPI processes total"
echo " ${procs_per_gpu} MPI processes per GPU"
echo " ${npool} Pools"
echo " ${kpoints} KPoints"
set -x
mpirun -n ${proc_count} \
    ph.x \
    -input NaCl-scf.in \
    -npool ${npool} \
    2>&1 | tee qe_log.txt
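For reference, the process and pool arithmetic in the script can be checked in isolation. This is a small standalone sketch with the values from the log above hard-coded as assumptions (in the real script they come from nvidia-smi and the QE_* environment variables):

```shell
#!/bin/bash
# Values taken from the log above: 2 GPUs, 2 MPI processes per GPU, 2 k-points.
gpu_count=2
procs_per_gpu=2
kpoints=2

# Total MPI ranks, as computed in run_qe_ph.sh
proc_count=$(( gpu_count * procs_per_gpu ))

# Pool count is the lesser of the k-point count and the process count
npool=$(( kpoints > proc_count ? proc_count : kpoints ))

echo "proc_count=${proc_count} npool=${npool}"
```

This gives 4 processes and 2 pools, which matches the "mpirun -n 4 ... -npool 2" line in the log, so the launch parameters themselves look consistent.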
Could anyone help me, please?