Segmentation fault (core dumped) with -stdpar=gpu for do concurrent code

Hi! I have a numerical model written in “do concurrent” form. It runs fine with -stdpar=multicore, but when I switch to -stdpar=gpu it crashes with “Segmentation fault (core dumped)”. I added print statements at several break points in the code and finally found that an array DZR(11) triggers the error. DZR is defined as below:

REAL, ALLOCATABLE :: DZR(:)
KB=11
KBM1=KB-1
ALLOCATE (DZR(KB))

BD is defined as below:

REAL BD(11)

The code fragment below produces the output and error message shown on screen:

      print*, " --- ADVU 4.3" !debug
      print*,"BD = ",BD
      print*,"DZR = ",DZR
      Do K=1,KBM1
          BD(K)=DZR(K)
      Enddo
      print*, " --- ADVU 4.4" !debug

The on-screen output and error message are:

--- ADVU 4.3
BD = 0.000000 0.000000 -3.3795852E-16 0.000000 -2.0697250E-17 0.000000 -1.7936702E-17 0.000000 0.000000 0.000000 0.000000
DZR = 10.00000 10.00000 9.999999 10.00000 10.00000 9.999998 10.00000 9.999998 10.00000 9.999998 0.000000
Segmentation fault (core dumped)

The compiler is nvfortran 24.7 and the GPU is an A800.
What could the problem be, and how can I fix it? Thanks!

Hi Chenbr and welcome!

Unfortunately, there’s not enough information here to know what’s going on, so if possible, please provide a minimal reproducing example and I’ll take a look.

Now, a seg fault typically occurs on the host, and this is a regular DO loop, so my assumption is that this code is running on the host.

If this were being executed on the device, then the problem would likely be due to “BD”. Allocated arrays are placed into CUDA Managed Memory, so they are accessible on both the host and device. However, static arrays like “BD” still need to be managed via OpenACC data directives. The exception is a system with HMM (like Grace Hopper), in which case Unified Memory makes all memory visible.
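
For example, here is just an illustrative sketch (not your actual code) of wrapping a static array in an OpenACC data region so a device-side loop can use it, compiled with something like “-stdpar=gpu -acc=gpu”:

program bd_device_example
   implicit none
   integer, parameter :: KB = 11, KBM1 = KB - 1
   real :: BD(KB)                ! static array: needs an OpenACC data directive on the GPU
   real, allocatable :: DZR(:)   ! allocatable: handled by CUDA Managed Memory under -stdpar=gpu
   integer :: K

   allocate (DZR(KB))
   DZR = 10.0
   BD = 0.0

   !$acc data copy(BD)           ! make BD visible on the device
   do concurrent (K = 1:KBM1)
      BD(K) = DZR(K)
   end do
   !$acc end data                ! copy BD back to the host

   print *, "BD = ", BD
end program bd_device_example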

Though since this appears to be running on the host, my best guess is that the value of “KBM1” is getting corrupted. The arrays themselves seem to be OK since they print, so the only thing left would be an out-of-bounds access, meaning KBM1’s value could be bad.

Again, a reproducing example would be useful so I can see everything in context and give you a better answer.

-Mat

Hi Mat!

I am deeply sorry, I made a terrible mistake compiling this code: “-stdpar=gpu” was mistyped as “mp=gpu”. After fixing this, the model immediately runs correctly and fast!
My model code contains only “do concurrent” loops and not a single OpenMP directive, so that error might still be interesting, although it is no longer important.
Before I found my mistake, I was trying to use “-stdpar=gpu -acc=gpu -gpu=nomanaged” and control the data movement manually with “!$ACC ENTER DATA COPYIN()”, “!$ACC UPDATE HOST()”, and “!$ACC UPDATE DEVICE()”. That version is not yet correct because it is complex, but fortunately, while compiling it, I found my “mp=gpu” mistake.
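
For reference, my manual approach looked roughly like this (a simplified sketch, not the actual model code):

program manual_data_sketch
   implicit none
   integer, parameter :: KB = 11, KBM1 = KB - 1
   real, allocatable :: DZR(:)
   real :: BD(KB)
   integer :: K

   allocate (DZR(KB))
   DZR = 10.0
   BD = 0.0

   !$ACC ENTER DATA COPYIN(DZR, BD)  ! create device copies and upload the host values

   ! ... host code that modifies DZR would go here ...
   !$ACC UPDATE DEVICE(DZR)          ! push the new host values to the device

   do concurrent (K = 1:KBM1)
      BD(K) = DZR(K)
   end do

   !$ACC UPDATE HOST(BD)             ! bring the device result back to the host
   print *, "BD = ", BD

   !$ACC EXIT DATA DELETE(DZR, BD)
end program manual_data_sketch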

Thanks a lot for your answer!

No worries! If a segv happens when compiling with -mp, one possible cause is a stack overflow. A side effect of enabling OpenMP is that automatics are allocated on the stack (i.e., the “-Mstack_arrays” behavior), which can increase the needed stack size.
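
For example, an automatic array like the one in this hypothetical routine (not your code) ends up on the stack under -Mstack_arrays, so a large N can overflow the default stack limit:

subroutine work(n)
   implicit none
   integer, intent(in) :: n
   real :: tmp(n, n)   ! automatic array: placed on the stack when -Mstack_arrays is in effect
   tmp = 0.0
   print *, sum(tmp)
end subroutine work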

I personally always set my environment’s stack size to unlimited (e.g., “ulimit -s unlimited”) to avoid these issues, but you can also set the environment variable OMP_STACKSIZE to a large value.