My question is about OpenMP. I have a set of codes whose performance with PGI OpenMP is worse than with Intel's. Because our code has tens of thousands of lines, it is impossible to rewrite it, and we care a great deal about performance.
I suspect two possible reasons:
In the codes, we use many dynamic data structures that contain pointers. I think PGI's pointer handling may be less efficient than Intel's, but I'm not sure.
Maybe I did not set up the binding of PGI threads to cores well. I know there are two parameters, MP_BIND and MP_BLIST, that control PGI thread binding. Each node of our machine has two CPUs in two different sockets, and each CPU has 6 cores. Within a single socket I can set MP_BLIST=5,4,3,2,1,0, but across the two sockets there are 12 cores in total. When I set MP_BLIST for both sockets, the performance declined.
Can you give me some advice?
Thanks for your patience with my poor English.
Your customer also sent this question to PGI Customer Service, and several emails went back and forth. From those exchanges, they concluded that the use of pointers wasn't the problem, since the serial speed was comparable. They also saw good performance when they used MP_BIND/MP_BLIST to bind to a single socket.
Their current follow-up question is how to bind to multiple sockets. The simple answer is that they just need to extend their MP_BLIST to include the additional cores, i.e. MP_BLIST=11,10,9,8,7,6,5,4,3,2,1,0.
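As a rough sketch of what that looks like in a job script, assuming a POSIX shell, cores numbered 0 through 11 as in the list above, and one OpenMP thread per core (the binary name your_app is a placeholder, not something from the customer's setup):

```shell
# Assumption: the node exposes 12 cores numbered 0-11.
NCORES=12

# Build the descending core list 11,10,...,0 without relying on GNU seq.
list=""
i=$((NCORES - 1))
while [ "$i" -ge 0 ]; do
  list="${list:+$list,}$i"   # append with a comma only after the first entry
  i=$((i - 1))
done

# Turn PGI thread binding on and give it the core list.
MP_BIND=yes
MP_BLIST=$list
OMP_NUM_THREADS=$NCORES
export MP_BIND MP_BLIST OMP_NUM_THREADS

echo "MP_BIND=$MP_BIND MP_BLIST=$MP_BLIST OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Placeholder launch line; replace with the real executable.
# ./your_app
```

Whether a descending or an ascending list works better depends on how the system numbers its cores, so it is worth timing both orders.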
For your edification, the optimal binding is very system-specific. Different architectures call for different bindings, and different hardware vendors number their cores differently. Hence, users may need to do some research and experimentation to determine the best binding.
A useful utility is "numactl", whose "--hardware" option gives details on which memory node is attached to which cores. "numactl" also allows you to bind to cores as well as memory nodes (MP_BIND will bind to the closest memory node).
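For illustration, a minimal sketch of how numactl might be used here. The node number 0, the core range 0-5, and the binary name your_app are assumptions; the real values should come from the "--hardware" output on the machine in question.

```shell
# Print the NUMA layout: which cores and how much memory belong to each node.
if command -v numactl >/dev/null 2>&1; then
  numactl --hardware
  numa_status="available"
else
  echo "numactl is not installed on this machine" >&2
  numa_status="missing"
fi

# Placeholder launch lines (assumed node and core numbers; adjust to the
# layout reported above):
#
#   Run on the cores of node 0 and allocate memory only from node 0:
#     numactl --cpunodebind=0 --membind=0 ./your_app
#
#   Pin to an explicit set of cores instead of a whole node:
#     numactl --physcpubind=0-5 --membind=0 ./your_app
```

It is probably simplest to use either numactl or MP_BIND/MP_BLIST for a given run rather than both, so that it stays clear which setting produced the binding.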