New bug: Serious OpenMP overhead in sequential regions

Hi

After upgrading to PVF 11.6, I have noticed that there is now serious OpenMP overhead in sequential regions. I have made a piece of dummy code to illustrate the problem. I build it using the default release profile with OpenMP enabled … Just run the code and watch the CPU utilization in taskmgr during execution.

Regards,

Casper

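program prog
  use omp_lib
  implicit none
  integer :: i, NoThreads, N
  real*8 :: x, y
  N = 1000000000
  NoThreads = 8
  ! First call into the OpenMP runtime happens here, before the serial loop
  call omp_set_num_threads(NoThreads)
  x = 0
  y = 0
  print *, 'Start sequential loop. Consumes 4 cpus in taskmgr - not just 1!!???'
  do i = 1, N
    x = x + 1.0d0/i   ! real division; 1/i is integer division and is 0 for i > 1
    y = y - 1.0d0/i
  end do
  print *, x, y
  print *, 'Start parallel loop. Consuming 8 threads in taskmgr'
  x = 0
  y = 0
  ! REDUCTION gives each thread a zero-initialized copy of x and y and sums them at the end
  !$OMP PARALLEL DO REDUCTION(+:x,y)
  do i = 1, N
    x = x + 1.0d0/i
    y = y - 1.0d0/i
  end do
  !$OMP END PARALLEL DO
  print *, x, y
  print *, 'done'
end program prog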

Hi Casper,

This is the expected behavior and improves the performance of OpenMP programs. What’s happening is that at the first use of OpenMP (either entering a parallel region or making a call to the OpenMP runtime, as is the case here), our runtime spawns the threads.

By default, the threads use the “ACTIVE” wait policy (set via the environment variable OMP_WAIT_POLICY). Under this policy each idle thread spins on a semaphore, checking whether it is needed. After a certain number of tries (defined by the environment variable MP_SPIN, with a default of 1000000), the thread calls “sched_yield” on Linux or “_sleep” on Windows, effectively letting the OS swap it out if another process needs the core. If no other process is running, the OS will wake the thread up and the process repeats.

Unlike other methods which create and destroy threads at every parallel region, this method vastly reduces the overhead of parallel regions and helps with overall performance.

If you are still concerned, there are a few things you can do. One is to simply move the call to “omp_set_num_threads” after your serial loop. This will cause the thread creation to be delayed. You can also set the OMP_WAIT_POLICY to PASSIVE in which case the threads sleep while not in use.
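
As a minimal sketch of the first suggestion, the example below makes no OpenMP runtime call until the serial loop has finished, so the worker threads should not exist (and should not spin) during the serial work. The program name, the smaller loop bound, the dp kind parameter and the REDUCTION clause are illustrative choices and do not come from the original post. It also assumes the runtime creates its threads at the first OpenMP use, as described above.

```fortran
program delayed_threads
  use omp_lib                      ! standard OpenMP module (omp_set_num_threads)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: i, n, nthreads
  real(dp) :: x, y

  n = 100000000                    ! illustrative loop bound
  nthreads = 8

  ! Serial loop first: no OpenMP call or parallel region has been reached yet.
  x = 0.0_dp
  y = 0.0_dp
  do i = 1, n
     x = x + 1.0_dp / i
     y = y - 1.0_dp / i
  end do
  print *, 'serial:  ', x, y

  ! First use of the OpenMP runtime is delayed until here.
  call omp_set_num_threads(nthreads)

  x = 0.0_dp
  y = 0.0_dp
  ! Each thread accumulates into its own zero-initialized copy of x and y;
  ! the copies are summed into the shared variables when the loop ends.
  !$omp parallel do reduction(+:x,y)
  do i = 1, n
     x = x + 1.0_dp / i
     y = y - 1.0_dp / i
  end do
  !$omp end parallel do
  print *, 'parallel:', x, y
end program delayed_threads
```

This only helps for serial work that comes before the first parallel region. For serial sections that follow a parallel region, the threads already exist, so the PASSIVE wait policy is the relevant setting.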

Hope this helps,
Mat

Ahh, I see …

Is there any way to set the wait policy from within the code? Unfortunately, I cannot expect the user to start setting up environment variables …

Regards,

Casper