OpenMP/OpenACC issue

Dear all,
I am trying to mix OpenMP and OpenACC. I have done something similar in the past and it worked well. However, when I compile the CAMB code with OpenMP enabled, with or without OpenACC directives, using any of the following flag combinations: -mp -fopenmp; -mp; -fopenmp; -mp=multicore -fopenmp; -mp=multicore, the code crashes. Specifically:

0x00007ffff61af615 in pgf90_extends_type_of_i8 () from /opt/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/lib/libnvf.so
(cuda-gdb) bt
#0 0x00007ffff61af615 in pgf90_extends_type_of_i8 ()
from /opt/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/lib/libnvf.so

This corresponds to the following source line:

recfast.f90 line 522 select type(State)

If I comment out the OMP SECTION in the calling routine, the error moves to what appears to be a “this” reference. The code works without problems when built with the GNU compiler and -fopenmp.

The error is easy to reproduce using the following fork: https://github.com/lstorchi/CAMB

$ git clone https://github.com/lstorchi/CAMB
$ cd CAMB/
$ git checkout gpuport_omptest
$ git clone https://github.com/lstorchi/forutils
$ cd forutils/
$ git checkout gpuport
$ cd ../fortran/
$ make
$ ./camb params_low.ini

Thanks in advance

Hi Loriano,

While investigating this, I ended up finding three different ways the code segfaults. While I’m not 100% sure what’s going on, there seems to be something wrong in how the polymorphic types are getting set up in a few of the OpenMP regions. After commenting out a few of these regions, though, I was able to get it to run to completion.
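For readers unfamiliar with the pattern in question, here is a minimal sketch of a polymorphic dummy argument resolved with select type inside OpenMP sections. All names (TBaseState, TDerivedState, use_state) are hypothetical stand-ins, not CAMB code, and this sketch is not a confirmed reproducer of the crash; it only illustrates the shape of the construct being discussed.

```fortran
module state_types
    implicit none
    ! Hypothetical stand-in for the base type (TCAMBdata in the thread)
    type :: TBaseState
        integer :: id = 0
    end type TBaseState
    ! Hypothetical stand-in for the extended type (CAMBdata in the thread)
    type, extends(TBaseState) :: TDerivedState
        real :: value = 1.0
    end type TDerivedState
contains
    subroutine use_state(State, res)
        class(TBaseState), intent(in) :: State
        real, intent(out) :: res
        ! The dynamic type is resolved at run time through the
        ! type descriptor attached to the polymorphic argument.
        select type (State)
        class is (TDerivedState)
            res = State%value
        class default
            res = -1.0
        end select
    end subroutine use_state
end module state_types

program omp_select_type_demo
    use state_types
    implicit none
    type(TDerivedState) :: s
    real :: r1, r2

    s%value = 2.5
    ! Each section passes the same shared object as a polymorphic actual argument.
    !$omp parallel sections shared(s, r1, r2)
    !$omp section
    call use_state(s, r1)
    !$omp section
    call use_state(s, r2)
    !$omp end parallel sections
    print *, r1, r2
end program omp_select_type_demo
```

Building a small case like this with and without the OpenMP flag is one way to check whether the type dispatch behaves the same in both builds.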

I filed an issue report, TPR #37404, and sent it to engineering for review. I also tested your code with our next generation Fortran compiler, which handles polymorphic types better, and it works there. They are still working on adding OpenACC, so this was with OpenMP only. I only mention this because, depending on the nature of the problem, there is a chance that engineering won’t fix this in the current nvfortran and will wait for the new compiler instead. That’s not to say that they won’t, but I wanted to set your expectations.

The workaround is to remove or comment out five OpenMP regions at the following files and line numbers:

results.F90: 1232, 2377, 2715, 2758
halofit.f90: 617

-Mat

Dear Mat
thanks indeed for your quick feedback. I may check how the code performs with the OMP regions you listed removed. When will the new compiler be released? Thanks again, and please keep me posted in case the engineers fix the issue.

Unfortunately I can’t share that info just yet since it’s subject to change. Engineering is working very hard on it, so hopefully not too much longer.

please keep me posted in case the engineers fix the issue

Will do

Hi Loriano,

FYI, when I first was looking at this I did encounter an issue when compiling with our development next gen nvfortran at “-O0 -g”. It aborted with the message:

fatal Fortran runtime error: Assign: mismatching element counts in array assignment (to 4, from 3)

It turns out to be an issue in your code, where there is a mismatch in an array assignment. For example, line 268 of openaccfunc.f90:

            IVSource_q(i,:) =a0*ScaledSrcin(klo,:,i)+&

The size of IVSource_q’s second dimension is 4 while that of ScaledSrcin is 3, leading to the mismatch.

I was able to work around it by setting “ThisSources%SourceNum=4” at line 1002 of cmbmain.f90, but don’t know if this is the correct approach.

The change doesn’t help with the errors with the current nvfortran, but wanted to let you know about the issue.
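The element-count mismatch described above can be illustrated in isolation. The sketch below uses hypothetical arrays (dst, src) and sizes chosen only to mirror the 4-versus-3 case; it is not taken from CAMB. It compares the extents before assigning and copies only the columns both arrays have, which is a defensive pattern rather than the actual fix for the code in question.

```fortran
program shape_check_demo
    use, intrinsic :: iso_fortran_env, only: real64
    implicit none
    ! Hypothetical sizes mirroring the 4-vs-3 mismatch
    integer, parameter :: npts = 5, ncols_dst = 4, ncols_src = 3
    real(real64) :: dst(npts, ncols_dst)      ! plays the role of IVSource_q
    real(real64) :: src(2, ncols_src, npts)   ! plays the role of ScaledSrcin
    real(real64), parameter :: a0 = 0.5_real64
    integer :: i, ncopy

    src = 1.0_real64
    dst = 0.0_real64

    ! An assignment such as dst(i,:) = a0*src(1,:,i) would be
    ! non-conforming here: the left side has 4 elements, the right side 3.
    if (size(dst, 2) /= size(src, 2)) then
        print *, 'extent mismatch: ', size(dst, 2), ' vs ', size(src, 2)
    end if

    ! Conforming version: assign only the columns present in both arrays.
    ncopy = min(size(dst, 2), size(src, 2))
    do i = 1, npts
        dst(i, 1:ncopy) = a0 * src(1, 1:ncopy, i)
    end do
    print *, 'columns copied per row: ', ncopy
end program shape_check_demo
```

Many compilers can also catch this class of error at run time when bounds checking is enabled in a debug build, which is how the message above surfaced at “-O0 -g”.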

-Mat

Dear Mat
thanks indeed. Yes, the problem arises from the define IVSQCOLS, which should be 4 instead of 3. In the meantime we are trying to find a workaround for the OpenMP/OpenACC version. In the gpuport_omp branch we removed the select type and forced State to be CAMBdata, with something like:

#ifndef USEACC
class(TCAMBdata), target :: State
#else
class(CAMBdata), target :: State
#endif
….
#ifndef USEACC
class default
call MpiStop('Wrong state type')
end select
#endif

However, the problem seems to be that a variable that should be of type CAMBdata is instead of type TCAMBdata:

(cuda-gdb) print this
$2 = ( tcambdata

#0 0x00000000004360cd in cambdata_timeofz (this=…, z=20.030856658965735, tol=0.001) at ../results.f90:1220

We are still trying to track down the original source of the problem.

Clearly there is still something deeper going on, because if I restore the original code I get:

state=<error reading variable: Cannot access memory at address 0xcb8>

so I guess that in the multithreaded environment some data movement is going wrong.