OpenACC for Multicore

Hello,

I have a numerical code that uses OpenACC and MPI. The pure MPI version works well for CPU computations, but I wanted to experiment with OpenACC targeting multicore CPUs. I’m wondering why my code uses at most two threads. I found that this limit matches “Thread(s) per core: 2” in the lscpu output. On a machine with one thread per core, the maximum is only one.

I read some other posts and documentation, and tested a couple of things:

  1. export ACC_NUM_CORES=16 → The setting is not honored. Only values of 1 or 2 take effect on the CPU that has two threads per core.

  2. Checked the runtime threads inside the acc loop with OpenMP’s “omp_get_thread_num()” → The runtime reports the number of threads I set.

  3. There is no “num_gangs” setting in the code.

Can I get your general thoughts on this issue? I can share my code or give you a reproducing example, whichever you prefer.

Thanks,
Yongsuk

Hi Yongsuk,

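This is likely the MPI binding, which by default binds each process to a single core. All of the CPU threads that the process spawns are then bound to that one core, which would explain why the usable count stops at the core’s hardware threads.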

Can you try adding “--bind-to none” or “--bind-to socket” to your mpirun command?
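
One way to confirm that binding is the cause is to print the set of logical CPUs a launched process is allowed to run on. The following is a minimal sketch, not part of the original code: it assumes Linux, uses Python only because it is quick to launch under mpirun, and the function name allowed_cpus and the script name check_affinity.py are hypothetical. Child processes started by mpirun inherit the binding, so the reported set should match what the OpenACC multicore threads get.

```python
import os


def allowed_cpus():
    """Return the sorted list of logical CPU ids this process may run on."""
    if hasattr(os, "sched_getaffinity"):          # available on Linux
        return sorted(os.sched_getaffinity(0))    # 0 means the calling process
    # Fallback for platforms without affinity queries: assume no restriction.
    return list(range(os.cpu_count() or 1))


if __name__ == "__main__":
    cpus = allowed_cpus()
    # Open MPI exports OMPI_COMM_WORLD_RANK; other launchers may use a
    # different variable (assumption), in which case "?" is printed.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    print(f"rank {rank}: {len(cpus)} logical CPUs allowed: {cpus}")
```

Running it once as “mpirun -np 1 python3 check_affinity.py” and once with “--bind-to none” added should show the difference: if the first run lists only the hardware threads of a single core and the second lists every CPU on the node, the binding is what limits the thread count.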

-Mat

Hi Mat,

Thank you for your reply. It was the MPI binding issue. Problem solved!

Yongsuk