The PGI compiler documentation doesn’t really say
much about the new -mp=numa option.

Can someone please explain how a program can benefit from
this optimization?

best regards

Hi Alex,

More information about “-mp=numa” as well as NUMA (Non-uniform memory access) can be found in the PGI release notes.

Basically, “-mp=numa” links your application with the NUMA libraries. (See section 3.2.2 of the PGI release notes for a complete list of operating systems that support NUMA.) Using NUMA can improve the performance of some parallel applications by reducing memory latency. Linking with “-mp=numa” also allows you to use the environment variables “MP_BIND”, “MP_BLIST”, and “MP_SPIN”.

When “MP_BIND” is set to “yes”, parallel processes or threads are bound to a physical processor. This helps ensure that the kernel won’t move your process to a different CPU while it’s running.
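
To make "bound to a processor" a bit more concrete, here is a minimal sketch of what binding looks like at the Linux level. It is not the PGI runtime's code, only an illustration using the standard glibc affinity calls. The helper names (first_allowed_cpu, bind_self_to_cpu, allowed_cpu_count) are made up for this example.

```c
#define _GNU_SOURCE   /* glibc needs this for the CPU_* macros and affinity calls */
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Restrict the calling thread to a single CPU.
   Returns 0 on success, -1 on error (errno is set by the kernel). */
static int bind_self_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Count the CPUs the calling thread is currently allowed to run on, or -1 on error. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

After a successful bind_self_to_cpu(first_allowed_cpu()), allowed_cpu_count() reports 1, which is the state in which the kernel can no longer migrate the thread to another CPU.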

Using “MP_BLIST”, you can specify exactly which processors to attach your process to. For example, if you have a quad-socket, dual-core system (8 CPUs), you can set the list so that the processes are interleaved across the 4 nodes (“MP_BLIST=2,4,6,0,1,3,5,7”) or bound to a particular node (“MP_BLIST=6,7”).
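
As a rough illustration of how a list like the ones above can be read, the following sketch parses a comma-separated CPU list and picks a CPU for each thread. The function names are hypothetical, and the rule "thread i gets entry i, wrapping around when there are more threads than entries" is an assumption made for the example, not documented PGI behavior.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a comma-separated CPU list such as "2,4,6,0,1,3,5,7" into cpus[].
   Returns the number of entries parsed, or -1 on malformed input. */
static int parse_cpu_list(const char *text, int *cpus, int max_cpus)
{
    assert(text != NULL && cpus != NULL);
    int count = 0;
    const char *p = text;
    while (*p != '\0') {
        char *end;
        long value = strtol(p, &end, 10);
        if (end == p || value < 0 || value > INT_MAX || count == max_cpus)
            return -1;        /* not a number, out of range, or too many entries */
        cpus[count++] = (int)value;
        if (*end == ',')
            end++;            /* skip the separator */
        else if (*end != '\0')
            return -1;        /* unexpected character after the number */
        p = end;
    }
    return count;
}

/* Assumed assignment rule: thread i gets entry i of the list, wrapping around. */
static int cpu_for_thread(const int *cpus, int count, int thread)
{
    assert(count > 0 && thread >= 0);
    return cpus[thread % count];
}
```

With the list "2,4,6,0,1,3,5,7", thread 0 maps to CPU 2 and thread 1 to CPU 4. With "6,7", every thread lands on CPU 6 or CPU 7.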

Threads at a barrier in a parallel region check a semaphore to determine if they can proceed. If the semaphore is not free after a certain number of tries, the thread gives up the processor (via sched_yield) for a while before checking again. The “MP_SPIN” variable defines the number of times a thread checks a semaphore before calling sched_yield. Setting MP_SPIN to -1 tells the thread to never call sched_yield. This can help performance but can waste CPU cycles that could be used by a different process if the thread spends a significant amount of time in a barrier.
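
The spin-then-yield behavior described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual PGI runtime: the function name wait_for_flag is made up, a C11 atomic flag stands in for the semaphore, and treating any negative limit like -1 is an assumption.

```c
#include <sched.h>
#include <stdatomic.h>

/* Wait until *flag becomes nonzero. The flag is checked up to spin_limit
   times in a row; after that the thread gives up the processor with
   sched_yield() and starts counting again. A negative spin_limit means
   "never yield" (pure busy-waiting). Returns the number of yields. */
static long wait_for_flag(atomic_int *flag, long spin_limit)
{
    long yields = 0;
    long tries = 0;
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (spin_limit >= 0 && ++tries >= spin_limit) {
            sched_yield();    /* let another runnable thread use this CPU */
            yields++;
            tries = 0;
        }
    }
    return yields;
}
```

A large limit keeps the thread on its CPU and reacts quickly once the flag is set, while a small limit hands the CPU back sooner. That is the same trade-off as with MP_SPIN.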

Hope this helps,