Controlling number of threads for DO CONCURRENT and interactions with MPI

mikk.sanborn · July 26, 2024, 9:36pm

How can I control or limit how many threads a DO CONCURRENT program tries to use?

I have not been able to find a compiler option nor runtime shell variable to limit the number of threads that a DO CONCURRENT program uses, unlike the options available for something like -Mconcur=allcores, which can be controlled by using OMP_NUM_THREADS or NCPUS.

Additionally, when running a program that uses DO CONCURRENT with MPI, it seems as though each MPI process is limited to using one thread and thus cannot use DO CONCURRENT to its advantage. Is there a way to allow each MPI process to use more than one core for the purposes of a DO CONCURRENT section?

Finally, what is the benefit of explicitly using DO CONCURRENT over compiling with -Mconcur? Thanks!

MatColgrove · July 29, 2024, 4:40pm

Hi mikk.sanborn,

I have not been able to find a compiler option nor runtime shell variable to limit the number of threads that a DO CONCURRENT program uses,

Our DO CONCURRENT implementation uses OpenACC under the hood, so the environment variable that controls the number of CPU threads is “ACC_NUM_CORES”.
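As a minimal sketch of how that might look in practice: the helper below runs a command with ACC_NUM_CORES set only for that one invocation. The function name run_with_cores is made up for illustration, and the program name in the usage line is hypothetical.

```shell
# Run a command with ACC_NUM_CORES set for that command only.
# run_with_cores is a hypothetical helper name, not part of any NVIDIA tool.
# Usage: run_with_cores <ncores> <command> [args...]
run_with_cores() {
    ncores=$1
    shift
    # env passes the variable to the child without changing the calling shell
    env ACC_NUM_CORES="$ncores" "$@"
}
```

For example, `run_with_cores 8 ./my_dc_program` would start the (hypothetical) binary with the thread count capped at 8. Exporting the variable in the shell or job script works just as well if every run should use the same value.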

Additionally, when running a program that uses DO CONCURRENT with MPI, it seems as though each MPI process is limited to using one thread and thus cannot use DO CONCURRENT to its advantage.

What flag did you use to compile the code?

When targeting the CPU, i.e. “-stdpar=multicore”, the default is to create a thread for every core on the system unless ACC_NUM_CORES is set. Since you’re only seeing one thread, I’m wondering if you left the flag off, in which case DO CONCURRENT runs serially, or used plain “-stdpar”, which targets GPUs.
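To make the flag choice concrete, here is a small sketch of a build helper that spells out the target explicitly. The function name build_stdpar and the file names in the usage line are hypothetical; only the “-stdpar=multicore” and “-stdpar=gpu” flags come from the compiler.

```shell
# Compile a Fortran source with DO CONCURRENT mapped to an explicit target.
# build_stdpar is a hypothetical helper name used only for this sketch.
# Usage: build_stdpar <multicore|gpu> <source> <output>
build_stdpar() {
    target=$1
    src=$2
    out=$3
    case $target in
        multicore|gpu) ;;   # the two targets discussed in this thread
        *)
            echo "unknown target: $target" >&2
            return 2
            ;;
    esac
    nvfortran -stdpar="$target" -o "$out" "$src"
}
```

Something like `build_stdpar multicore solver.f90 solver` would then build the CPU-threaded version. Spelling out the target avoids accidentally getting the GPU default from a bare “-stdpar”.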

Another possibility is that all the threads are being created, but the default OpenMPI binding, i.e. “--bind-to core”, is binding them all to the same core. When doing hybrid runs like this, I’ll typically either change the binding to “--bind-to socket” with ACC_NUM_CORES set to the number of cores per socket, or launch with “--bind-to none” through a wrapper script that uses numactl to perform the binding on a per-rank basis.
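A rough sketch of what such a wrapper could look like is below. It assumes each rank on a node owns a contiguous, equally sized block of core IDs, which may not match the core numbering on every system (check with `numactl --hardware`). The function names are hypothetical; OMPI_COMM_WORLD_LOCAL_RANK is the per-node rank index that Open MPI's mpirun sets for each process.

```shell
# Print the inclusive core-ID range for one local rank, e.g. "4-7".
# Assumes contiguous, equally sized blocks of cores per rank.
cores_for_rank() {
    local_rank=$1
    cores_per_rank=$2
    first=$((local_rank * cores_per_rank))
    last=$((first + cores_per_rank - 1))
    printf '%s-%s\n' "$first" "$last"
}

# Bind the given command to this rank's cores and cap the thread count to match.
# Usage: bind_and_run <cores_per_rank> <command> [args...]
bind_and_run() {
    cores_per_rank=$1
    shift
    # Falls back to rank 0 when not launched under Open MPI's mpirun
    range=$(cores_for_rank "${OMPI_COMM_WORLD_LOCAL_RANK:-0}" "$cores_per_rank")
    export ACC_NUM_CORES="$cores_per_rank"
    exec numactl --physcpubind="$range" "$@"
}
```

Saved in a script (say wrapper.sh, a made-up name) that ends with `bind_and_run 8 "$@"`, this could be launched as `mpirun -np 4 --bind-to none ./wrapper.sh ./my_dc_program`, giving each rank 8 cores and 8 threads.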

Finally, what is the benefit of explicitly using DO CONCURRENT over compiling with -Mconcur?

-Mconcur enables auto-parallelization, meaning the compiler will attempt to parallelize a loop if it can prove the iterations are independent, so it may not be able to parallelize every loop. It’s also a feature of nvfortran, so it may not be available with other compilers.

To ensure parallelism and portability, you’d want to use explicit constructs such as Standard Language Parallelism (e.g. DO CONCURRENT) or directive-based approaches such as OpenMP or OpenACC.
