Multi-Threaded computation with OpenMP

Hello,

I created a very large program in Intel Visual Fortran using OpenMP parallel directives. It worked fine, and I verified that it does run on multiple threads.
I am now migrating the source code of this program to PGI Visual Fortran. The program builds fine, but the computation never runs multi-threaded: it always uses a single thread out of the 32 available on my PC. I did enable OpenMP directives, and I even tried adding /Qopenmp on the command line, but nothing worked. I would be grateful for any insight into what may be wrong.
Again, this is a very large program that I cannot share online, and it worked perfectly with Intel’s compiler and OpenMP. It also works with PGI VF, except that it never runs multi-threaded. I am trying to run some demanding computations that take forever on a single thread, so any help will be most welcome!

I wanted to add that I also tried a very small OpenMP program, and it still does not work. I am attaching the code here (it consists of a main program, a module and a subroutine). For each iteration of a parallel DO loop, the program is supposed to print the number of the thread that executes it. The Intel Visual Fortran build prints different thread numbers, 0-3 (this program uses 4 threads), but the PGI Visual Fortran build always prints 0 for the thread inside the parallel DO loop.

!===========================================
module arr1
implicit none
SAVE


real*8,allocatable,dimension(:,:) :: array1


end module arr1
!================================================
program paral1
use arr1
use omp_lib
implicit none



integer i1,ith,NCPU,i2
real*8 sum


NCPU = OMP_GET_NUM_PROCS ( ) ! returns the number of processors available to the program (physical and logical)

write(101,*) 'NCPU = ',NCPU

NCPU = 4
CALL OMP_SET_NUM_THREADS(4)

allocate(array1(100,NCPU))
array1 = 0.d0
sum=0.d0


!$OMP PARALLEL PRIVATE(ith)
!$OMP DO
do i1 = 1,100
ith = omp_get_thread_num ( )
write(*,*) i1,'thread', ith
call sub1(i1,ith+1)

!$OMP CRITICAL

do i2 = 1,NCPU
sum = sum + array1(i1,i2)
end do !i2


!$OMP END CRITICAL


end do !i1
!$OMP END DO
!$OMP END PARALLEL




do i1 = 1,100
write(101,1001) i1, (array1(i1,ith),ith=1,NCPU)
end do !i1

write(*,*) 'sum=',sum
write(*,*) 'hit enter to complete execution'
read(*,*)


1001 format(i3,32(1x,e8.3))
end program paral1

!==================================================

subroutine sub1(iel,ithr)
use arr1
use omp_lib
implicit none

integer iel,ithr

array1(iel,ithr) = 1.d0


end subroutine sub1

Hi IKoutromanos,

By default the PGI native OpenMP runtime will use a single thread. You need to set “OMP_NUM_THREADS” in the environment to the number of threads to use, or compile with “-mp=allcores” to change the default to use all cores on the system.

For details on using OMP_NUM_THREADS please see: https://www.pgroup.com/resources/docs/18.5/x86/pgi-user-guide/index.htm#openmp-env-vars

For “-mp=allcores” please see: https://www.pgroup.com/resources/docs/18.5/x86/pgi-ref-guide/index.htm#mp

Hope this helps,
Mat

Hi Mat,

I apologize, but my program includes the following command:

CALL OMP_SET_NUM_THREADS(4)

Shouldn’t this set the number of threads to 4 for the next parallel region?

Hi iKoutromanos,

Apologies, I missed that you have the omp_set_num_threads call. Yes, that should work.
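One quick way to confirm that omp_set_num_threads actually takes effect is to print the team size from inside a parallel region. This is only a sketch: the program and variable names are hypothetical, and it assumes the code is compiled with OpenMP enabled (“-mp”). If the directives are silently ignored, the assignment runs serially and reports 1.

```fortran
! Hypothetical check program: reports the size of the thread team
program check_team_size
  use omp_lib
  implicit none
  integer :: nthreads

  nthreads = 0

  ! Request 4 threads for subsequent parallel regions, as in the original program
  call omp_set_num_threads(4)

  !$omp parallel
  ! One thread records the team size; the others wait at the implicit barrier
  !$omp single
  nthreads = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  ! Expect 4 when the directives are honored (a thread limit can lower it);
  ! 1 means the region ran serially, i.e. the directives were ignored
  print *, 'threads in team: ', nthreads
end program check_team_size
```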

I tested it on my Windows system and it worked fine. Perhaps you’re not compiling with the “-mp” flag enabled?

Here’s my output:

PGI$ pgfortran -mp test.f90 -Minfo=mp
paral1:
     35, Parallel region activated
     37, Parallel loop activated with static block schedule
     44, Begin critical section (__cs_unspc)
     46, End critical section (__cs_unspc)
     53, Barrier
     54, Parallel region terminated
PGI$ ./test.exe
            1 thread            0
            2 thread            0
           76 thread            3
           77 thread            3
           78 thread            3
           79 thread            3
           80 thread            3
           81 thread            3
           82 thread            3
           83 thread            3
           84 thread            3
           85 thread            3
           86 thread            3
           87 thread            3
           88 thread            3
           89 thread            3
           90 thread            3
           91 thread            3
           92 thread            3
           93 thread            3
           94 thread            3
           95 thread            3
           96 thread            3
           97 thread            3
           98 thread            3
           99 thread            3
          100 thread            3
           26 thread            1
... cut ...
           20 thread            0
           21 thread            0
           22 thread            0
           23 thread            0
           24 thread            0
           25 thread            0
 sum=    100.0000000000000
 hit enter to complete execution

-Mat

No, this is not the problem. The “enable OpenMP directives” option is active. Could it be something else? Are there specific CPU models that PGI supports?


Incidentally, even your own output looks “funny”. It seems that the program almost exclusively uses thread 3, with very few iterations run by thread 0 and a single iteration run by thread 1.

That was just because the output was long so I cut it. All four threads were used.

Are there specific CPU models that PGI supports?

We support all 64-bit x86-based systems and POWER8 and POWER9 CPUs, so I doubt that this is the problem.

Could it be something else?

Sorry, but I’m not sure what’s wrong. It doesn’t quite make sense.

Can you please post the compile flags used as seen from the project’s properties “Fortran->Command Line” top text box?

Thanks,
Mat

Mat,

Sorry, I did not notice that the output was truncated.

Here are my compile flags:

-Bstatic -Mbackslash -mp -I"c:\program files\pgi\win64\18.4\include" -I"C:\Program Files\PGI\Microsoft Open Tools 14\include" -I"C:\Program Files (x86)\Windows Kits\10\Include\shared" -I"C:\Program Files (x86)\Windows Kits\10\Include\um" -fast -Minform=warn

I wanted to mention (in case it matters) that I am currently using a trial version of the Professional edition, and that I also have installed PGI Community Edition in my PC.

That shouldn’t matter. The compilers are fully featured with the CE and trial license.

Your compiler options look correct, so I’m still puzzled why it works for me but not for you. I’ve tried both from the command-line shell and from within PVF, and both work as expected.

Let’s add the flag “-Minfo=mp” to the compile options. This makes the compiler report which loops it parallelizes, so we can double-check that OpenMP code is being generated. In PVF, you need to look at the build report to see this output.
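As a complement to the “-Minfo=mp” report, OpenMP’s conditional-compilation sentinel “!$” gives a run-time check: a line starting with “!$ ” is compiled only when OpenMP processing is active and is otherwise just a comment. The sketch below is written in free form and the program name is hypothetical; in a fixed-form file the “!$” would have to start in column 1, just like “!$OMP”.

```fortran
! Hypothetical run-time check: were OpenMP directives compiled in?
program check_openmp_compiled
  implicit none
  logical :: omp_compiled

  omp_compiled = .false.

  ! Conditional-compilation line: compiled only when OpenMP is enabled,
  ! otherwise treated as an ordinary comment
  !$ omp_compiled = .true.

  if (omp_compiled) then
    print *, 'OpenMP directives are being compiled'
  else
    print *, 'OpenMP directives are being ignored'
  end if
end program check_openmp_compiled
```

Because it does not use omp_lib, this program builds with or without OpenMP, so it can be compiled with exactly the same flags as the real project.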

What might also be helpful is to build and run the code from a command-line shell, in case there’s some type of misconfiguration in PVF.

Are there any other OpenMP environment variables set in your environment? In particular, OMP_THREAD_LIMIT? If so, then this would limit the number of threads in use.
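To see which OpenMP settings the runtime actually picks up, a small diagnostic can help. This is a sketch: the program name is hypothetical, and it assumes the compiler’s omp_lib module provides omp_get_thread_limit (an OpenMP 3.0 routine). The environment variables are read with the standard Fortran 2003 intrinsic get_environment_variable.

```fortran
! Hypothetical diagnostic: print the OpenMP settings seen at run time
program report_omp_settings
  use omp_lib
  implicit none
  character(len=16), parameter :: names(2) = &
      [character(len=16) :: 'OMP_NUM_THREADS', 'OMP_THREAD_LIMIT']
  character(len=64) :: val
  integer :: i, length, status

  ! Values as reported by the OpenMP runtime library
  print *, 'omp_get_num_procs()    =', omp_get_num_procs()
  print *, 'omp_get_max_threads()  =', omp_get_max_threads()
  print *, 'omp_get_thread_limit() =', omp_get_thread_limit()
  print *, 'omp_get_dynamic()      =', omp_get_dynamic()

  ! Raw environment variables: status 0 = found, 1 = not set
  do i = 1, size(names)
    call get_environment_variable(trim(names(i)), val, length, status)
    select case (status)
    case (0)
      print *, trim(names(i)), ' = ', trim(val)
    case (1)
      print *, trim(names(i)), ' is not set'
    case default
      print *, trim(names(i)), ' could not be read, status =', status
    end select
  end do
end program report_omp_settings
```

A small omp_get_thread_limit() value, or an OMP_THREAD_LIMIT entry, would explain a capped team size.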

-Mat

Mat,

What do you mean by “build report”? Are you referring to the “build log”?
If yes, then I attach it here.

PVF Build Log

Begin rebuild: clean project

Deleting intermediate and output files for project 'Example_OpenMP', configuration 'Release'

Compiling Project ...

..\module1.for

c:\program files\pgi\win64\18.4\bin\pgfortran.exe -Hx,123,8 -Hx,123,0x40000 -Hx,0,0x40000000 -Mx,0,0x40000000 -Hx,0,0x20000000 -Bstatic -Mbackslash -mp -Mfixed -I"c:\program files\pgi\win64\18.4\include" -I"C:\Program Files\PGI\Microsoft Open Tools 14\include" -I"C:\Program Files (x86)\Windows Kits\10\Include\shared" -I"C:\Program Files (x86)\Windows Kits\10\Include\um" -fast -Minform=warn -module "x64\Release" -o "x64\Release\module1.obj" -Minfo=mp -c "E:\VT-MultiPhys\Example_OpenMP\module1.for"

Command exit code: 0


..\Source1.for

c:\program files\pgi\win64\18.4\bin\pgfortran.exe -Hx,123,8 -Hx,123,0x40000 -Hx,0,0x40000000 -Mx,0,0x40000000 -Hx,0,0x20000000 -Bstatic -Mbackslash -mp -Mfixed -I"c:\program files\pgi\win64\18.4\include" -I"C:\Program Files\PGI\Microsoft Open Tools 14\include" -I"C:\Program Files (x86)\Windows Kits\10\Include\shared" -I"C:\Program Files (x86)\Windows Kits\10\Include\um" -fast -Minform=warn -module "x64\Release" -o "x64\Release\Source1.obj" -Minfo=mp -c "E:\VT-MultiPhys\Example_OpenMP\Source1.for"

Command exit code: 0


..\sub1.for

c:\program files\pgi\win64\18.4\bin\pgfortran.exe -Hx,123,8 -Hx,123,0x40000 -Hx,0,0x40000000 -Mx,0,0x40000000 -Hx,0,0x20000000 -Bstatic -Mbackslash -mp -Mfixed -I"c:\program files\pgi\win64\18.4\include" -I"C:\Program Files\PGI\Microsoft Open Tools 14\include" -I"C:\Program Files (x86)\Windows Kits\10\Include\shared" -I"C:\Program Files (x86)\Windows Kits\10\Include\um" -fast -Minform=warn -module "x64\Release" -o "x64\Release\sub1.obj" -Minfo=mp -c "E:\VT-MultiPhys\Example_OpenMP\sub1.for"

Command exit code: 0


Linking...

c:\program files\pgi\win64\18.4\bin\pgfortran.exe -Wl,/libpath:"c:\program files\pgi\win64\18.4\lib" -Wl,/libpath:"C:\Program Files\PGI\Microsoft Open Tools 14\lib\amd64" -Wl,/libpath:"C:\Program Files (x86)\Windows Kits\10\Lib\winv6.3\um\x64" -Yl,"C:\Program Files\PGI\Microsoft Open Tools 14\bin\amd64" -Bstatic -mp -o "E:\VT-MultiPhys\Example_OpenMP\Example_OpenMP\x64\Release\Example_OpenMP.exe" "x64\Release\module1.obj" "x64\Release\Source1.obj" "x64\Release\sub1.obj"

Command exit code: 0

Example_OpenMP build succeeded.

===================================================================

I also built the program from the command-line shell, and it has exactly the same problem: all iterations of the loop are executed by a single thread.

What do you mean by “build report”? Are you referring to the “build log”?

Yes, the build log.

In mine, I see the following output:

Command output: [paral1:
     36, Parallel region activated
     38, Parallel loop activated with static block schedule
     45, Begin critical section (__cs_unspc)
     47, End critical section (__cs_unspc)
     54, Barrier
     55, Parallel region terminated
]

But I don’t see a similar line in your build log, which means the OpenMP code isn’t being generated.

I notice that your files have a “.for” suffix, which means fixed-form source is assumed. In that case, make sure your “!$OMP” sentinels begin in the first column; otherwise they won’t be recognized as directives and are treated as ordinary comments. Alternatively, add “-Mfree” to compile the files as free-form source.
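For reference, here is a minimal fixed-form sketch with the sentinels placed correctly. The program name and variables are hypothetical; it assumes the file is compiled as fixed form (for example a .for file) with “-mp”. The max reduction records the highest thread number that executed any iteration, which stays 0 if the directives are ignored.

```fortran
C     Hypothetical fixed-form (.for) sketch: OpenMP sentinels such as
C     !$OMP must start in column 1; statements start in column 7.
C     A sentinel indented to column 7 is just an ordinary comment,
C     so the loop below would then run on a single thread.
      program fixed_form_omp
      use omp_lib
      implicit none
      integer i, maxth
      real*8 total

      total = 0.d0
      maxth = 0
      call omp_set_num_threads(4)

!$OMP PARALLEL DO REDUCTION(+:total) REDUCTION(max:maxth)
      do i = 1, 100
         maxth = max(maxth, omp_get_thread_num())
         total = total + 1.d0
      end do
!$OMP END PARALLEL DO

C     Sanity check: every iteration adds 1, so the total must be 100
      if (total .ne. 100.d0) then
         print *, 'unexpected total:', total
         stop 1
      end if
      print *, 'total =', total
C     Prints 3 with 4 threads; 0 means the directives were ignored
      print *, 'highest thread number used =', maxth
      end program fixed_form_omp
```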

-Mat

Mat,

You found the source of the problem! I was not aware that a fixed-form file requires the sentinels to start in the first column, and I had mistakenly placed them in the 7th column. After moving the sentinels to the first column, the program runs normally and the correct number of threads is used.

Thank you for all your help on this.

Yannis

Excellent! Glad it’s working.