Strange Error Message

I am compiling code that contains the following loop.

 
          !$acc parallel loop &
          !$acc present(ugrad_sol_tmp, mesh%xr, mesh%xs, mesh%yr, mesh%ys)
          do k = 1, chunksize
              do i = 1, Nvar
                  do j = 1, Np
                      ur_solution(j, i) = ugrad_sol_tmp(j,  (k-1)*Nvar+i)
                      us_solution(j, i) = ugrad_sol_tmp(Np+j, (k-1)*Nvar+i)
                  end do
              end do

              do i = 1, Np
                  Js(i) = ONE/( mesh%xr(i, (index-1)*chunksize+k)*mesh%ys(i, (index-1)*chunksize+k) &
                      - mesh%xs(i, (index-1)*chunksize+k)*mesh%yr(i, (index-1)*chunksize+k) )
                  rx_solution(i) =  mesh%ys(i, (index-1)*chunksize+k)*Js(i)
                  ry_solution(i) = -mesh%xs(i, (index-1)*chunksize+k)*Js(i)
                  sx_solution(i) = -mesh%yr(i, (index-1)*chunksize+k)*Js(i)
                  sy_solution(i) =  mesh%xr(i, (index-1)*chunksize+k)*Js(i)
              end do

              do var = 1, Nvar
                  do i = 1, Np
                      mesh%ux(i, var, (index-1)*chunksize+k) = rx_solution(i)*ur_solution(i, var) &
                          + sx_solution(i)*us_solution(i, var)
                      mesh%uy(i, var, (index-1)*chunksize+k) = ry_solution(i)*ur_solution(i, var) &
                          + sy_solution(i)*us_solution(i, var)
                  end do
              end do
          end do
          !$acc end parallel loop

This compiles and runs. Now all I do is move the last loop nest out into a separate parallel loop:

          !$acc parallel loop &
          !$acc present(ugrad_sol_tmp, mesh%xr, mesh%xs, mesh%yr, mesh%ys) 
          do k = 1, chunksize
              do i = 1, Nvar
                  do j = 1, Np
                      ur_solution(j, i) = ugrad_sol_tmp(j,  (k-1)*Nvar+i)
                      us_solution(j, i) = ugrad_sol_tmp(Np+j, (k-1)*Nvar+i)
                  end do
              end do

              do i = 1, Np
                  Js(i) = ONE/( mesh%xr(i, (index-1)*chunksize+k)*mesh%ys(i, (index-1)*chunksize+k) &
                      - mesh%xs(i, (index-1)*chunksize+k)*mesh%yr(i, (index-1)*chunksize+k) )
                  rx_solution(i) =  mesh%ys(i, (index-1)*chunksize+k)*Js(i)
                  ry_solution(i) = -mesh%xs(i, (index-1)*chunksize+k)*Js(i)
                  sx_solution(i) = -mesh%yr(i, (index-1)*chunksize+k)*Js(i)
                  sy_solution(i) =  mesh%xr(i, (index-1)*chunksize+k)*Js(i)
              end do

          end do
          !$acc end parallel loop 

          !$acc parallel loop 
          do k = 1, chunksize
              do var = 1, Nvar
                  do i = 1, Np
                      mesh%ux(i, var, (index-1)*chunksize+k) = rx_solution(i)*ur_solution(i, var) &
                          + sx_solution(i)*us_solution(i, var)
                      mesh%uy(i, var, (index-1)*chunksize+k) = ry_solution(i)*ur_solution(i, var) &
                          + sy_solution(i)*us_solution(i, var)
                  end do
              end do
          end do
          !$acc end parallel

With this change, I get the following error message:

pgfortran-Fatal-/usr/local/pgi/linux86-64/15.10/bin/pgf902 TERMINATED by signal 11
Arguments to /usr/local/pgi/linux86-64/15.10/bin/pgf902
/usr/local/pgi/linux86-64/15.10/bin/pgf902 /tmp/pgfortranoBTQNPQHRY4.ilm -fn /examples/CNS/shocktube/../../..//src/libfr.f90 -debug -x 120 0x200 -x 123 0x400 -opt 2 -terse 1 -inform warn -x 51 0x20 -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 117 0x1000 -quad -vect 56 -y 34 16 -x 34 0x8 -x 32 8388608 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 199 10 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 59 4 -tp haswell -x 124 0x1400 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 49 0x100 -x 120 0x200 -astype 0 -x 121 1 -x 70 0x40000000 -x 124 1 -accel tesla -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 121 0xc00 -x 194 0x40000 -x 163 0x1 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -cudaver 7.5 -x 121 0xc00 -x 194 0x40000 -x 176 0x100 -cudacap 30 -x 189 0x8000 -y 163 0xc0000000 -x 163 0x800000 -x 189 0x10 -y 189 0x4000000 -x 198 0x40000 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 9 1 -x 72 0x1 -x 136 0x11 -quad -x 119 0x10000000 -x 129 0x40000000 -x 164 0x1000 -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 16384 -x 162 16384 -cmdline '+pgfortran /examples/CNS/shocktube/../../..//src/libfr.f90 -module t/Linux.pgi.atlas/m -It/Linux.pgi.atlas/m -fast -Mvect=sse -Mcache_align -Mflushz -Mpre -gopt -O2 -Mvect=sse -Mcache_align -Mpre -acc -Minfo=accel -ta=tesla:cc30 -I/usr/local/SILO/include -I/usr/local/hdf5-pgi/include -Mpreprocess -DNOCUDA -c -o t/Linux.pgi.atlas/o/libfr.o' -asm /tmp/pgfortranoBTQyQW8yXE.sm

I don’t understand it.

Hi Vsingh,

This is a compiler error: signal 11 is a segmentation fault inside the compiler (pgf902) itself. Can you please send a reproducing example to PGI Customer Service (trs@pgroup.com)? We can test to see if it’s been fixed already and, if not, will report it to our engineers.

Thanks,
Mat

Hi Mat,

Thanks for the response.

I think the email address in your post is wrong. I have sent the mail to trs@pgroup.com.

Yes, typo. I’ll correct the post.

Hi Mat,

Are there any updates on the problem?

Thanks.

Hi Vsingh,

Do you have a TPR# I can look up? I don’t see any error reports from you in our system, nor any mail from you in our trs mailbox. However, I may just not be searching with the correct keywords, and the customer service folks may have archived your mail.

- Mat

Hi Mat,

I sent the mail on May 17. I will resend it and reproduce it in this post as well.

Thanks.

Hi,

This is with reference to this thread on the forum.

https://forums.developer.nvidia.com/t/strange-error-message/134925/1

The code is now up at

https://bitbucket.org/vsingh001/deepfry/src/267342403a1b?at=gpu

Please go to the folder

/examples/CNS/shocktube

and do

make COMP=pgi

You will get the error message.

I am using version 15.10 of the compiler.

Regards,
Vikram

Hi Vikram,

I was not able to reproduce the error with 15.10. However, I do see an error out of Valgrind which I’ve reported to engineering as TPR#22582. These types of memory errors may “work” on one system but fail on another, and would explain why we haven’t seen it before.

Note that I had problems compiling your code with later compilers, as well as when compiling with -Mcuda, due to interface issues with how you’re calling cublasDGEMM. I did not investigate these issues and just commented out the cublasDGEMM calls for now.

- Mat

Hi Mat,

Thanks.

Unfortunately, Valgrind will report an error because the values generated at the end of this subroutine must be updated for the subsequent subroutines to work, and they are not updated without the DGEMM calls.

Anyway, I rechecked, and it seems that when I comment out the cublasDGEMM calls the errors disappear. Is the problem related to cublasDGEMM?

Apologies that I wasn’t clear. With 15.10 I was able to compile the file with the calls to cublasDGEMM. In this case the compiler did not crash but I did see Valgrind errors with the compiler (not your program). It’s these Valgrind errors that I reported.

The interface problem occurred when I either added “-Mcuda” to the 15.10 compilation or moved to PGI 16.1. I have not investigated why the interface issue occurs.

- Mat

This has been fixed in release 19.7.