Controlling number of threads for DO CONCURRENT and interactions with MPI

How can I control or limit how many threads a DO CONCURRENT program tries to use?

I have not been able to find a compiler option nor runtime shell variable to limit the number of threads that a DO CONCURRENT program uses, unlike the options available for something like -Mconcur=allcores, which can be controlled by using OMP_NUM_THREADS or NCPUS.

Additionally, when running a program that uses DO CONCURRENT with MPI, it seems as though each MPI process is limited to using one thread and thus cannot use DO CONCURRENT to its advantage. Is there a way to allow each MPI process to use more than one core for the purposes of a DO CONCURRENT section?

Finally, what is the benefit of explicitly using DO CONCURRENT over compiling with -Mconcur? Thanks!

Hi mikk.sanborn,

I have not been able to find a compiler option nor runtime shell variable to limit the number of threads that a DO CONCURRENT program uses,

Our DO CONCURRENT implementation uses OpenACC under the hood, so the environment variable that controls the number of CPU threads is “ACC_NUM_CORES”.

Additionally, when running a program that uses DO CONCURRENT with MPI, it seems as though each MPI process is limited to using one thread and thus cannot use DO CONCURRENT to its advantage.

What flag did you use to compile the code?

The default when targeting the CPU, i.e. “-stdpar=multicore”, is to create a thread for every core on the system unless ACC_NUM_CORES is set. Since you’re only seeing one thread, I’m wondering if you left the flag off, in which case DO CONCURRENT is run serially, or if you used plain “-stdpar”, which targets GPUs.
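As a rough sketch of the build-and-run steps described above (the source file name saxpy.f90 and the thread count of 4 are placeholders, not taken from the original post):

```shell
# Placeholder source file; any Fortran file with DO CONCURRENT loops would do.
src=saxpy.f90

# Cap the DO CONCURRENT loops at 4 CPU threads instead of one per core.
export ACC_NUM_CORES=4

# Build and run only where nvfortran and the source file are actually present.
if command -v nvfortran >/dev/null 2>&1 && [ -f "$src" ]; then
    nvfortran -stdpar=multicore -o saxpy "$src"   # target CPU cores, not the GPU
    ./saxpy
fi
```

Leaving ACC_NUM_CORES unset should fall back to one thread per core, as noted above.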

Another possibility is that all the threads are being created, but the default OpenMPI binding, i.e. “--bind-to core”, is binding them all to the same core. When doing hybrid runs like this, I’ll typically either change the binding to “--bind-to socket” with ACC_NUM_CORES set to the number of cores per socket, or use “--bind-to none” together with a wrapper script that calls numactl to perform the binding on a per-rank basis.
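A minimal sketch of such a wrapper script might look like the following. The script name bind_rank.sh, the default of 4 cores per rank, and the assumption that each rank's cores are numbered contiguously are all illustrative; OMPI_COMM_WORLD_LOCAL_RANK is the per-node rank that Open MPI sets for each process.

```shell
#!/bin/sh
# bind_rank.sh (hypothetical name): pin each MPI rank to its own block of cores.
# Assumes Open MPI and contiguous core numbering per rank; check the real
# layout with "numactl --hardware" before relying on this.

# Print the core range "first-last" for local rank $1 with $2 cores per rank.
core_range() {
    first=$(( $1 * $2 ))
    last=$(( first + $2 - 1 ))
    echo "${first}-${last}"
}

cores_per_rank=${ACC_NUM_CORES:-4}           # 4 is a placeholder default
local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}  # falls back to 0 outside mpirun

# Only bind and launch when a program was passed on the command line.
if [ "$#" -gt 0 ]; then
    export ACC_NUM_CORES="$cores_per_rank"
    exec numactl --physcpubind="$(core_range "$local_rank" "$cores_per_rank")" "$@"
fi
```

It would be launched along the lines of "mpirun -np 4 --bind-to none ./bind_rank.sh ./a.out", where ./a.out stands in for the real executable.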

Finally, what is the benefit of explicitly using DO CONCURRENT over compiling with -Mconcur?

-Mconcur enables auto-parallelization, meaning the compiler will attempt to parallelize a loop if it can prove the iterations are independent, though it may not be able to parallelize every loop. It’s also an nvfortran feature, so it may not be available with other compilers.

To ensure parallelism and portability, you’d want to use explicit constructs such as Standard Language Parallelism (e.g. DO CONCURRENT) or directive-based approaches such as OpenMP or OpenACC.