Compiling the Nekbone cuda-openacc branch with PGI failed

I am trying to compile Nekbone, a miniapp that has a CUDA version.
According to its user guide, Nekbone has been verified to compile with PGI.
However, I encountered errors like:

"/usr/include/c++/5/bits/basic_string.h", line 5228: error: basic_string is not a template

Could you suggest how to get rid of this compile error?
Thanks in advance!

Hi Bambo,

Given that this is a C++ error and Nekbone doesn’t contain any CUDA C or C++ files, I’m not sure where the error is coming from or why it would occur. Can you please detail the steps you took to get to this point so I can try to recreate it? Also, can you post the full compilation line that causes the error?

I just tried cloning the Nekbone repository, checking out “cuda-openacc”, going to the “test/nek_gpu1” directory, and running “makenek.cuda”. (I also modified makenek.cuda to use “cc70” since I’m using a Volta GPU.) This built fine for me.

-Mat

Hi Mat,

Thank you for your reply!

Here is what I did:

  1. I cloned the Nekbone repository, switched to the “cuda-openacc” branch, and made the following modifications for my system:
diff --git a/src/makefile.template b/src/makefile.template
index 7dc50fc..dfb9ff0 100755
--- a/src/makefile.template
+++ b/src/makefile.template
@@ -88,16 +88,16 @@ L2=$(G) -O2
 L3=$(G) -O3
 L4=$(L3)
 
-FL0   = $(L0) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL2i4 = $(L0)      $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL2   = $(L2) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL3   = $(L3) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL4   = $(L4) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-
-cFL0   = $(L0) $(PPS_C) 
-cFL2   = $(L2) $(PPS_C) 
-cFL3   = $(L3) $(PPS_C) 
-cFL4   = $(L4) $(PPS_C) 
+FL0   = $(L0) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL2i4 = $(L0)      $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL2   = $(L2) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL3   = $(L3) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL4   = $(L4) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+
+cFL0   = $(L0) $(PPS_C) -std=c++11 -I/vol/.openmpi/include
+cFL2   = $(L2) $(PPS_C) -std=c++11 -I/vol/.openmpi/include
+cFL3   = $(L3) $(PPS_C) -std=c++11 -I/vol/.openmpi/include
+cFL4   = $(L4) $(PPS_C) -std=c++11 -I/vol/.openmpi/include
 ################################################################################
 all : nekbone
 
diff --git a/test/nek_gpu1/makenek.cuda b/test/nek_gpu1/makenek.cuda
index 6545794..0d3e77e 100755
--- a/test/nek_gpu1/makenek.cuda
+++ b/test/nek_gpu1/makenek.cuda
@@ -7,10 +7,12 @@
 SOURCE_ROOT="../../src"
 
 # Fortran compiler
-F77="mpif90"
+#F77="mpif90"
+F77="pgfortran"
 
 # C compiler
-CC="mpicc"
+#CC="mpicc"
+CC="pgc++"
 
 # pre-processor symbol list 
 # (set PPLIST=? to get a list of available symbols)
@@ -30,10 +32,12 @@ CC="mpicc"
 #USR="foo.o"
 
 # linking flags
-USR_LFLAGS="-Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+#USR_LFLAGS="-Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+USR_LFLAGS="-Mcuda=cc70 -ta=nvidia:cc70"
 
 # generic compiler flags
-G="-acc -Minfo=accel -Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+#G="-acc -Minfo=accel -Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+G="-acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70"
 
 # optimization flags
 #OPT_FLAGS_STD=""
  2. Then, in the test/nek_gpu1 folder, I ran the makenek.cuda script.
    It gave me the following output:
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O3 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/cg.f -o obj/cg.o
ax_acc:
    421, Generating present(ut(:,:,:,:),gxyz(:,:,:,:,:),dxtm1(:,:),u(:,:,:,:),ur(:,:,:,:),us(:,:,:,:),wk(:,:,:,:),w(:,:,:,:),dxm1(:,:))
cg_acc:
    570, Generating present(wk(:,:,:,:),c(:),p(:,:,:,:),ut(:,:,:,:),g(:,:),z(:),x(:),r(:),ur(:,:,:,:),us(:,:,:,:),w(:))
    574, Generating update device(cmask(:),c(:),r(:),p(:,:,:,:))
maskit_acc:
    636, Generating present(w(:),pmask(:))
    640, Accelerator kernel generated
         Generating Tesla code
        641, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/driver.f -o obj/driver.o
nekbone:
     54, Generating create(wk(:,:,:,:),dxtm1(:,:),p(:,:,:,:),ut(:,:,:,:),ids_ptr(:),z(:),x(:),ug(:),ur(:,:,:,:),us(:,:,:,:),w(:),r(:),f(:),cmask(:),dxm1(:,:),c(:),g(:,:),ids_lgl1(:),ids_lgl2(:))
     66, Generating update device(ids_lgl2(:),ids_lgl1(:),ids_ptr(:))
     73, Generating update device(dxm1(:,:),g(:,:),dxtm1(:,:))
set_multiplicity_acc:
    583, Generating present(c(:))
    589, Accelerator kernel generated
         Generating Tesla code
        590, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
set_f_acc:
    603, Generating present(c(:),f(:))
    604, Accelerator kernel generated
         Generating Tesla code
        605, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O3 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/math.f -o obj/math.o
rzero_acc:
   1391, Generating present(a(:n))
   1392, Accelerator kernel generated
         Generating Tesla code
       1393, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
rone_acc:
   1405, Generating present(a(:n))
   1406, Accelerator kernel generated
         Generating Tesla code
       1407, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
copy_acc:
   1419, Generating present(a(:n),b(:n))
   1420, Accelerator kernel generated
         Generating Tesla code
       1421, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
glsc3_acc:
   1439, Generating present(a(:n),b(:n),mult(:n))
   1440, Accelerator kernel generated
         Generating Tesla code
       1440, Generating reduction(+:tmp)
       1441, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
   1440, Generating implicit copy(tmp)
add2s1_acc:
   1455, Generating present(a(:n),b(:n))
   1456, Accelerator kernel generated
         Generating Tesla code
       1457, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
add2s2_acc:
   1469, Generating present(a(:n),b(:n))
   1470, Accelerator kernel generated
         Generating Tesla code
       1471, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
vlsum_acc:
   1486, Generating present(vec(:n))
   1487, Accelerator kernel generated
         Generating Tesla code
       1487, Generating reduction(+:sum)
       1488, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
   1487, Generating implicit copy(sum)
col2_acc:
   1504, Generating present(a(:),b(:))
   1506, Accelerator kernel generated
         Generating Tesla code
       1507, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/mxm_wrapper.f -o obj/mxm_wrapper.o 
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/prox_dssum.f -o obj/prox_dssum.o
PGF90-W-0119-Redundant specification for ug (/vol/Nekbone/src/prox_dssum.f: 438)
  0 inform,   1 warnings,   0 severes, 0 fatal for dssum_acc
dssum_acc:
    450, Generating present(ids_lgl1(:),ids_ptr(:nglobl+1),u(:),ug(:nglobl))
    454, Accelerator kernel generated
         Generating Tesla code
        455, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
        459, !$acc loop seq
    459, Complex loop carried dependence of ug prevents parallelization
         Loop carried reuse of ug prevents parallelization
    469, Generating update self(ug(:n_nonlocal))
    473, Generating update device(ug(:n_nonlocal))
    476, Accelerator kernel generated
         Generating Tesla code
        477, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
        480, !$acc loop seq
    480, Loop carried reuse of u prevents parallelization
dssum2_acc:
    570, Generating present(u(:))
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O3 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/prox_setup.f -o obj/prox_setup.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O3 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/semhat.f -o obj/semhat.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/speclib.f -o obj/speclib.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/delay_dum.f -o obj/delay_dum.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/hsmg_dum.f -o obj/hsmg_dum.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O3 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/ax_cuda.f -o obj/ax_cuda.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/comm_mpi.f -o obj/comm_mpi.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O3 -r8 -Mpreprocess -Mfixed -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/mxm_std.f -o obj/mxm_std.o
pgfortran -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O0      -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -I/vol/Nekbone/test/nek_gpu1 -I/vol/Nekbone/src -I./ -I/vol/.openmpi/include /vol/Nekbone/src/blas.f -o obj/blas.o
pgc++ -c -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -O2 -DPTRSIZE8 -DMPI -DLONGINT8 -DUNDERSCORE -DGLOBAL_LONG_LONG -std=c++11 -I/vol/.openmpi/include -DPREFIX=jl_ /vol/Nekbone/src/jl/gs.c -o obj/jl_gs.o
"/usr/include/c++/5/iosfwd", line 147: error: basic_stringbuf is not a template
    typedef basic_stringbuf<char> 	stringbuf;
            ^

"/usr/include/c++/5/iosfwd", line 150: error: basic_istringstream is not a
          template
    typedef basic_istringstream<char> 	istringstream;
            ^

"/usr/include/c++/5/iosfwd", line 153: error: basic_ostringstream is not a
          template
    typedef basic_ostringstream<char> 	ostringstream;
            ^

"/usr/include/c++/5/iosfwd", line 156: error: basic_stringstream is not a
          template
    typedef basic_stringstream<char> 	stringstream;
            ^

"/usr/include/c++/5/iosfwd", line 187: error: basic_stringbuf is not a template
    typedef basic_stringbuf<wchar_t> 	wstringbuf;
            ^

"/usr/include/c++/5/iosfwd", line 190: error: basic_istringstream is not a
          template
    typedef basic_istringstream<wchar_t> 	wistringstream;
            ^

"/usr/include/c++/5/iosfwd", line 193: error: basic_ostringstream is not a
          template
    typedef basic_ostringstream<wchar_t> 	wostringstream;
            ^

"/usr/include/c++/5/iosfwd", line 196: error: basic_stringstream is not a
          template
    typedef basic_stringstream<wchar_t> 	wstringstream;
            ^

"/usr/include/c++/5/bits/basic_string.h", line 4782: error: basic_string is not
          a template
      basic_string<_CharT, _Traits, _Alloc>
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4783: error: basic_string is not
          a template
      operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 4784: error: basic_string is not
          a template
  	      const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	            ^

"/usr/include/c++/5/bits/basic_string.h", line 4783: error: nonmember operator
          requires a parameter with class or enum type
      operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4798: error: basic_string is not
          a template
      basic_string<_CharT,_Traits,_Alloc>
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4800: error: basic_string is not
          a template
  	      const basic_string<_CharT,_Traits,_Alloc>& __rhs);
  	            ^

"/usr/include/c++/5/bits/basic_string.h", line 4799: error: nonmember operator
          requires a parameter with class or enum type
      operator+(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4809: error: basic_string is not
          a template
      basic_string<_CharT,_Traits,_Alloc>
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4810: error: basic_string is not
          a template
      operator+(_CharT __lhs, const basic_string<_CharT,_Traits,_Alloc>& __rhs);
                                    ^

"/usr/include/c++/5/bits/basic_string.h", line 4819: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4820: error: basic_string is not
          a template
      operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 4820: error: nonmember operator
          requires a parameter with class or enum type
      operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4835: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4836: error: basic_string is not
          a template
      operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs, _CharT __rhs)
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 4847: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4848: error: basic_string is not
          a template
      operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
                ^

"/usr/include/c++/5/bits/basic_string.h", line 4849: error: basic_string is not
          a template
  	      const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	            ^

"/usr/include/c++/5/bits/basic_string.h", line 4848: error: nonmember operator
          requires a parameter with class or enum type
      operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4853: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4854: error: basic_string is not
          a template
      operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 4855: error: basic_string is not
          a template
  	      basic_string<_CharT, _Traits, _Alloc>&& __rhs)
  	      ^

"/usr/include/c++/5/bits/basic_string.h", line 4854: error: nonmember operator
          requires a parameter with class or enum type
      operator+(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4859: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4860: error: basic_string is not
          a template
      operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
                ^

"/usr/include/c++/5/bits/basic_string.h", line 4861: error: basic_string is not
          a template
  	      basic_string<_CharT, _Traits, _Alloc>&& __rhs)
  	      ^

"/usr/include/c++/5/bits/basic_string.h", line 4860: error: nonmember operator
          requires a parameter with class or enum type
      operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4871: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4873: error: basic_string is not
          a template
  	      basic_string<_CharT, _Traits, _Alloc>&& __rhs)
  	      ^

"/usr/include/c++/5/bits/basic_string.h", line 4872: error: nonmember operator
          requires a parameter with class or enum type
      operator+(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4877: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4879: error: basic_string is not
          a template
  	      basic_string<_CharT, _Traits, _Alloc>&& __rhs)
  	      ^

"/usr/include/c++/5/bits/basic_string.h", line 4883: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4884: error: basic_string is not
          a template
      operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
                ^

"/usr/include/c++/5/bits/basic_string.h", line 4884: error: nonmember operator
          requires a parameter with class or enum type
      operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4889: error: basic_string is not
          a template
      inline basic_string<_CharT, _Traits, _Alloc>
             ^

"/usr/include/c++/5/bits/basic_string.h", line 4890: error: basic_string is not
          a template
      operator+(basic_string<_CharT, _Traits, _Alloc>&& __lhs,
                ^

"/usr/include/c++/5/bits/basic_string.h", line 4904: error: basic_string is not
          a template
      operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 4905: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 4904: error: nonmember operator
          requires a parameter with class or enum type
      operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4911: error: basic_string is not
          a template
      operator==(const basic_string<_CharT>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 4912: error: basic_string is not
          a template
  	       const basic_string<_CharT>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 4911: error: nonmember operator
          requires a parameter with class or enum type
      operator==(const basic_string<_CharT>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4926: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 4925: error: nonmember operator
          requires a parameter with class or enum type
      operator==(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4937: error: basic_string is not
          a template
      operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 4937: error: nonmember operator
          requires a parameter with class or enum type
      operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4950: error: basic_string is not
          a template
      operator!=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 4951: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 4950: error: nonmember operator
          requires a parameter with class or enum type
      operator!=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4963: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 4962: error: nonmember operator
          requires a parameter with class or enum type
      operator!=(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4974: error: basic_string is not
          a template
      operator!=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 4974: error: nonmember operator
          requires a parameter with class or enum type
      operator!=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4987: error: basic_string is not
          a template
      operator<(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 4988: error: basic_string is not
          a template
  	      const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	            ^

"/usr/include/c++/5/bits/basic_string.h", line 4987: error: nonmember operator
          requires a parameter with class or enum type
      operator<(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 4999: error: basic_string is not
          a template
      operator<(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 4999: error: nonmember operator
          requires a parameter with class or enum type
      operator<(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5012: error: basic_string is not
          a template
  	      const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	            ^

"/usr/include/c++/5/bits/basic_string.h", line 5011: error: nonmember operator
          requires a parameter with class or enum type
      operator<(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5024: error: basic_string is not
          a template
      operator>(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 5025: error: basic_string is not
          a template
  	      const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	            ^

"/usr/include/c++/5/bits/basic_string.h", line 5024: error: nonmember operator
          requires a parameter with class or enum type
      operator>(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5036: error: basic_string is not
          a template
      operator>(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                      ^

"/usr/include/c++/5/bits/basic_string.h", line 5036: error: nonmember operator
          requires a parameter with class or enum type
      operator>(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5049: error: basic_string is not
          a template
  	      const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	            ^

"/usr/include/c++/5/bits/basic_string.h", line 5048: error: nonmember operator
          requires a parameter with class or enum type
      operator>(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5061: error: basic_string is not
          a template
      operator<=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 5062: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 5061: error: nonmember operator
          requires a parameter with class or enum type
      operator<=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5073: error: basic_string is not
          a template
      operator<=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 5073: error: nonmember operator
          requires a parameter with class or enum type
      operator<=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5086: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 5085: error: nonmember operator
          requires a parameter with class or enum type
      operator<=(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5098: error: basic_string is not
          a template
      operator>=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 5099: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 5098: error: nonmember operator
          requires a parameter with class or enum type
      operator>=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5110: error: basic_string is not
          a template
      operator>=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
                       ^

"/usr/include/c++/5/bits/basic_string.h", line 5110: error: nonmember operator
          requires a parameter with class or enum type
      operator>=(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5123: error: basic_string is not
          a template
  	     const basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	           ^

"/usr/include/c++/5/bits/basic_string.h", line 5122: error: nonmember operator
          requires a parameter with class or enum type
      operator>=(const _CharT* __lhs,
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5135: error: basic_string is not
          a template
      swap(basic_string<_CharT, _Traits, _Alloc>& __lhs,
           ^

"/usr/include/c++/5/bits/basic_string.h", line 5136: error: basic_string is not
          a template
  	 basic_string<_CharT, _Traits, _Alloc>& __rhs)
  	 ^

"/usr/include/c++/5/bits/basic_string.h", line 5155: error: basic_string is not
          a template
  	       basic_string<_CharT, _Traits, _Alloc>& __str);
  	       ^

"/usr/include/c++/5/bits/basic_string.h", line 5159: error: basic_string is not
          a template
      operator>>(basic_istream<char>& __is, basic_string<char>& __str);
                                            ^

"/usr/include/c++/5/bits/basic_string.h", line 5159: error: no instance of
          function template "std::operator>>" matches the specified type
      operator>>(basic_istream<char>& __is, basic_string<char>& __str);
      ^

"/usr/include/c++/5/bits/basic_string.h", line 5173: error: basic_string is not
          a template
  	       const basic_string<_CharT, _Traits, _Alloc>& __str)
  	             ^

"/usr/include/c++/5/bits/basic_string.h", line 5196: error: basic_string is not
          a template
  	    basic_string<_CharT, _Traits, _Alloc>& __str, _CharT __delim);
  	    ^

"/usr/include/c++/5/bits/basic_string.h", line 5213: error: basic_string is not
          a template
  	    basic_string<_CharT, _Traits, _Alloc>& __str)
  	    ^

"/usr/include/c++/5/bits/basic_string.h", line 5221: error: basic_string is not
          a template
  	    basic_string<_CharT, _Traits, _Alloc>& __str, _CharT __delim)
  	    ^

"/usr/include/c++/5/bits/basic_string.h", line 5228: error: basic_string is not
          a template
  	    basic_string<_CharT, _Traits, _Alloc>& __str)
  	    ^

"/usr/include/c++/5/bits/basic_string.h", line 5234: error: basic_string is not
          a template
      getline(basic_istream<char>& __in, basic_string<char>& __str,
                                         ^

Error limit reached.
100 errors detected in the compilation of "/vol/Nekbone/src/jl/gs.c".
Compilation terminated.
makefile:166: recipe for target 'obj/jl_gs.o' failed
make: *** [obj/jl_gs.o] Error 2

I am looking forward to hearing some suggestions from you.
Thanks in advance!

Bambo

Hi Bambo,

While I see a slightly different error when compiling the “gs.c” file with a C++ compiler, I do see errors. The problem looks to be that the C source expects the language to be C99. Can you try adding the flag “-std=c99” to the pgc++ compilation, or using our C compiler (pgcc)?
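
As a minimal sketch of the second option, assuming the same makenek.cuda edits shown in the diff earlier in this thread, the compiler settings could look like the following. Only the CC value changes. Whether the “-std=c++11” flag added to the cFL* lines in makefile.template should also be dropped is an assumption worth checking, since that flag targets C++ rather than C.

```shell
# Fortran compiler (unchanged from the diff earlier in the thread)
F77="pgfortran"

# C compiler: pgcc is the PGI C driver, so src/jl/*.c is compiled as C
# rather than as C++ (which is what pgc++ does)
CC="pgcc"
```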

-Mat

Hi Mat,

Thank you so much!
Switching to pgcc does help.
Now I have reached the last step, linking all the object files into the nekbone executable.
However, the link stops and reports that some MPI functions are undefined:

pgfortran -o nekbone -acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -L/vol/.openmpi/lib -lmpi obj/cg.o obj/driver.o obj/math.o obj/mxm_wrapper.o obj/prox_dssum.o obj/prox_setup.o obj/semhat.o obj/speclib.o obj/delay_dum.o obj/hsmg_dum.o obj/ax_cuda.o obj/comm_mpi.o obj/mxm_std.o obj/blas.o obj/jl_gs.o obj/jl_sort.o obj/jl_sarray_transfer.o obj/jl_sarray_sort.o obj/jl_gs_local.o obj/jl_crystal.o obj/jl_comm.o obj/jl_tensor.o obj/jl_fail.o obj/jl_fcrystal.o obj/jl_sleep.o -Mcuda=cc70 -ta=nvidia:cc70
obj/comm_mpi.o: In function `iniproc_':
/vol/Nekbone/src/comm_mpi.f:11: undefined reference to `mpi_initialized_'
/vol/Nekbone/src/comm_mpi.f:13: undefined reference to `mpi_init_'
/vol/Nekbone/src/comm_mpi.f:21: undefined reference to `mpi_attr_get_'
obj/comm_mpi.o: In function `gop_':
/vol/Nekbone/src/comm_mpi.f:107: undefined reference to `mpi_allreduce_'
/vol/Nekbone/src/comm_mpi.f:109: undefined reference to `mpi_allreduce_'
/vol/Nekbone/src/comm_mpi.f:111: undefined reference to `mpi_allreduce_'
/vol/Nekbone/src/comm_mpi.f:113: undefined reference to `mpi_allreduce_'
obj/comm_mpi.o: In function `igop_':
/vol/Nekbone/src/comm_mpi.f:136: undefined reference to `mpi_allreduce_'
obj/comm_mpi.o:/vol/Nekbone/src/comm_mpi.f:138: more undefined references to `mpi_allreduce_' follow
obj/comm_mpi.o: In function `csend_':
/vol/Nekbone/src/comm_mpi.f:187: undefined reference to `mpi_send_'
obj/comm_mpi.o: In function `crecv_':
/vol/Nekbone/src/comm_mpi.f:198: undefined reference to `mpi_recv_'
obj/comm_mpi.o: In function `crecv3_':
/vol/Nekbone/src/comm_mpi.f:219: undefined reference to `mpi_recv_'
/vol/Nekbone/src/comm_mpi.f:219: undefined reference to `mpi_get_count_'
obj/comm_mpi.o: In function `numnodes_':
/vol/Nekbone/src/comm_mpi.f:239: undefined reference to `mpi_comm_size_'
obj/comm_mpi.o: In function `mynode_':
/vol/Nekbone/src/comm_mpi.f:249: undefined reference to `mpi_comm_rank_'
obj/comm_mpi.o: In function `dnekclock_':
/vol/Nekbone/src/comm_mpi.f:258: undefined reference to `mpi_wtime_'
obj/comm_mpi.o: In function `dnekclock_sync_':
/vol/Nekbone/src/comm_mpi.f:266: undefined reference to `mpi_wtime_'
obj/comm_mpi.o: In function `bcast_':
/vol/Nekbone/src/comm_mpi.f:299: undefined reference to `mpi_bcast_'
obj/comm_mpi.o: In function `create_comm_':
/vol/Nekbone/src/comm_mpi.f:313: undefined reference to `mpi_comm_dup_'
obj/comm_mpi.o: In function `isend_':
/vol/Nekbone/src/comm_mpi.f:329: undefined reference to `mpi_isend_'
obj/comm_mpi.o: In function `irecv_':
/vol/Nekbone/src/comm_mpi.f:347: undefined reference to `mpi_irecv_'
obj/comm_mpi.o: In function `msgwait_':
/vol/Nekbone/src/comm_mpi.f:365: undefined reference to `mpi_wait_'
obj/comm_mpi.o: In function `nekgsync_':
/vol/Nekbone/src/comm_mpi.f:376: undefined reference to `mpi_barrier_'
obj/comm_mpi.o: In function `exitt0_':
/vol/Nekbone/src/comm_mpi.f:461: undefined reference to `mpi_finalize_'
obj/comm_mpi.o: In function `exitt_':
/vol/Nekbone/src/comm_mpi.f:508: undefined reference to `mpi_finalize_'
obj/comm_mpi.o: In function `igl_running_sum_':
/vol/Nekbone/src/comm_mpi.f:531: undefined reference to `mpi_scan_'
obj/comm_mpi.o: In function `pingpongo_':
/vol/Nekbone/src/comm_mpi.f:701: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:706: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:706: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:706: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:711: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:733: undefined reference to `mpi_recv_'
/vol/Nekbone/src/comm_mpi.f:733: undefined reference to `mpi_send_'
obj/comm_mpi.o: In function `gop_test_':
/vol/Nekbone/src/comm_mpi.f:838: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:838: undefined reference to `mpi_wtime_'
obj/comm_mpi.o: In function `gp2_test_':
/vol/Nekbone/src/comm_mpi.f:896: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:896: undefined reference to `mpi_wtime_'
obj/comm_mpi.o: In function `ping_loop1_':
/vol/Nekbone/src/comm_mpi.f:1049: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1049: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1049: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:1056: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1056: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1056: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1061: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:1066: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1066: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1071: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1071: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1071: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1077: undefined reference to `mpi_send_'
obj/comm_mpi.o: In function `ping_loop2_':
/vol/Nekbone/src/comm_mpi.f:1099: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1099: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1099: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:1105: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1105: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1105: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1109: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:1114: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1114: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1119: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1119: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1119: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1124: undefined reference to `mpi_send_'
obj/comm_mpi.o: In function `ping_loop_':
/vol/Nekbone/src/comm_mpi.f:1146: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1146: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:1151: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1151: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1151: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1151: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1151: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1151: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1158: undefined reference to `mpi_wtime_'
/vol/Nekbone/src/comm_mpi.f:1158: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1158: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1165: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1169: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1169: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1169: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1169: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1169: undefined reference to `mpi_send_'
/vol/Nekbone/src/comm_mpi.f:1169: undefined reference to `mpi_irecv_'
/vol/Nekbone/src/comm_mpi.f:1176: undefined reference to `mpi_wait_'
/vol/Nekbone/src/comm_mpi.f:1176: undefined reference to `mpi_send_'
pgacclnk: child process exit status 1: /usr/bin/ld
makefile:113: recipe for target 'nekbone' failed
make: *** [nekbone] Error 2

I think this may be because I built the Open MPI library from source but did not configure it properly?
Can I simply disable MPI in Nekbone when using CUDA?
Or what should I be careful about when building the Open MPI library?

Thanks in advance.

Bambo
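
A log like the one above repeats the same few missing symbols many times. As a rough sketch (the function name undefined_symbols is made up for illustration, and the sample lines are copied from the log), the output can be reduced to the unique symbol names:

```shell
# Reduce a linker log read from stdin to the unique undefined symbols it reports.
undefined_symbols() {
    # Matching lines contain: undefined reference(s) to `name'
    sed -n "s/.*undefined references* to \`\([^']*\)'.*/\1/p" | LC_ALL=C sort -u
}

# Sample lines copied from the link log in this thread.
undefined_symbols <<'EOF'
/vol/Nekbone/src/comm_mpi.f:11: undefined reference to `mpi_initialized_'
/vol/Nekbone/src/comm_mpi.f:13: undefined reference to `mpi_init_'
/vol/Nekbone/src/comm_mpi.f:107: undefined reference to `mpi_allreduce_'
/vol/Nekbone/src/comm_mpi.f:109: undefined reference to `mpi_allreduce_'
obj/comm_mpi.o:/vol/Nekbone/src/comm_mpi.f:138: more undefined references to `mpi_allreduce_' follow
EOF
```

Every name ends in an underscore, which is the usual Fortran name mangling for the MPI calls in comm_mpi.f. That suggests the library providing the Fortran MPI bindings is what the link line is missing, rather than MPI as a whole.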

Thanks to help from Mat, I got Nekbone to compile successfully (although it seems to have a runtime issue).
I added ‘-lmpi_mpifh’ as well, since I found the symbol ‘mpi_init_’ defined in the libmpi_mpifh.so file.
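
One way to find out which library defines a symbol such as ‘mpi_init_’ is to scan the shared libraries in the Open MPI lib directory with nm. The sketch below is only an illustration: the helper name find_symbol_lib is hypothetical, and it assumes GNU nm (binutils) is installed and that the libraries are named lib*.so.

```shell
# Print every shared library in a directory that defines the given symbol.
# Returns 0 if at least one library defines it, 1 otherwise.
find_symbol_lib() {
    libdir=$1
    symbol=$2
    found=1
    for lib in "$libdir"/lib*.so; do
        [ -e "$lib" ] || continue   # the glob matched nothing
        # -D lists dynamic symbols; --defined-only skips undefined references.
        if nm -D --defined-only "$lib" 2>/dev/null |
            awk -v s="$symbol" '$NF == s { ok = 1 } END { exit !ok }'; then
            printf '%s\n' "$lib"
            found=0
        fi
    done
    return "$found"
}

# The path is the one used in this thread; adjust it for your install.
find_symbol_lib /vol/.openmpi/lib mpi_init_ ||
    echo "mpi_init_ not found (or the directory does not exist)"
```

Each library it prints can then be added to the link line by dropping the ‘lib’ prefix and ‘.so’ suffix and prepending ‘-l’.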

Here are the modifications I finally made:

diff --git a/src/makefile.template b/src/makefile.template
index 7dc50fc..96274d3 100755
--- a/src/makefile.template
+++ b/src/makefile.template
@@ -88,16 +88,20 @@ L2=$(G) -O2
 L3=$(G) -O3
 L4=$(L3)
 
-FL0   = $(L0) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL2i4 = $(L0)      $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL2   = $(L2) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL3   = $(L3) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
-FL4   = $(L4) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR)
+FL0   = $(L0) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL2i4 = $(L0)      $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL2   = $(L2) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL3   = $(L3) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
+FL4   = $(L4) $(P) $(PPS_F) -I$(CASEDIR) -I$S -I$(OPT_INCDIR) -I/vol/.openmpi/include
 
 cFL0   = $(L0) $(PPS_C) 
 cFL2   = $(L2) $(PPS_C) 
 cFL3   = $(L3) $(PPS_C) 
 cFL4   = $(L4) $(PPS_C) 
+cFL0   = $(L0) $(PPS_C) -I/vol/.openmpi/include
+cFL2   = $(L2) $(PPS_C) -I/vol/.openmpi/include
+cFL3   = $(L3) $(PPS_C) -I/vol/.openmpi/include
+cFL4   = $(L4) $(PPS_C) -I/vol/.openmpi/include
 ################################################################################
 all : nekbone
 
diff --git a/test/nek_gpu1/makenek.cuda b/test/nek_gpu1/makenek.cuda
index 6545794..9b608a1 100755
--- a/test/nek_gpu1/makenek.cuda
+++ b/test/nek_gpu1/makenek.cuda
@@ -7,10 +7,13 @@
 SOURCE_ROOT="../../src"
 
 # Fortran compiler
-F77="mpif90"
+#F77="mpif90"
+F77="pgfortran"
 
 # C compiler
-CC="mpicc"
+#CC="mpicc"
+#CC="pgc++"
+CC="pgcc"
 
 # pre-processor symbol list 
 # (set PPLIST=? to get a list of available symbols)
@@ -30,10 +33,12 @@ CC="mpicc"
 #USR="foo.o"
 
 # linking flags
-USR_LFLAGS="-Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+#USR_LFLAGS="-Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+USR_LFLAGS="-Mcuda=cc70 -ta=nvidia:cc70"
 
 # generic compiler flags
-G="-acc -Minfo=accel -Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+#G="-acc -Minfo=accel -Mcuda=cc50,cc60 -ta=nvidia:cc50,cc60"
+G="-acc -Minfo=accel -Mcuda=cc70 -ta=nvidia:cc70 -L/vol/.openmpi/lib -lmpi -lmpi_mpifh"
 
 # optimization flags
 #OPT_FLAGS_STD=""
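
As an alternative to hard-coding ‘-L/vol/.openmpi/lib -lmpi -lmpi_mpifh’, Open MPI's wrapper compilers can report the link flags they would add. A minimal sketch, assuming an Open MPI wrapper such as mpif90 is on the PATH (the helper name show_mpi_link_flags is hypothetical; ‘--showme:link’ is an Open MPI option, and other MPI implementations use different flags):

```shell
# Ask an Open MPI wrapper compiler which link flags it would pass along.
show_mpi_link_flags() {
    wrapper=${1:-mpif90}
    if ! command -v "$wrapper" >/dev/null 2>&1; then
        echo "$wrapper not found on PATH" >&2
        return 1
    fi
    "$wrapper" --showme:link
}

if flags=$(show_mpi_link_flags mpif90); then
    echo "link flags reported by mpif90: $flags"
fi
```

The reported flags could then be copied into the G or USR_LFLAGS setting of makenek.cuda, so the list of MPI libraries follows however Open MPI was built.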

But when I tried to execute

./nekbone gpu1

It prints out information like:

Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/SDTE7RAHOSU7CTR24N2PJMMNDX:/var/lib/docker/overlay2/l/LQ5XWNV2PL3G6DPRBKMMXS33SU:/var/lib/docker/overlay2/l/SZ37PRCKUPQNKYDHW2CMETGQNQ:/var/lib/docker/overlay2/l/ZJ5TDCH66ZIWF4EFWIQM52HYKP:/var/lib/docker/overlay2/l/D5GPWUOXPUSDXI6VXX5CJF2PAN:/var/lib/docker/overlay2/l/AJSFJ33PJNBN24V66FNHSQDJXN:/var/lib/docker/overlay2/l/PP4YSM3LWV6AEETKZWK7X7MHF6:/var/lib/docker/overlay2/l/BRVNFWTHFQQXSH5MR43DCVUYRY:/var/lib/docker/overlay2/l/G4AQ2B2DV2VJG'
Unexpected end of /proc/mounts line `NDL3ZQDQD5H36:/var/lib/docker/overlay2/l/37FTJXSHYI6WF4TGRDPDRIEVTL:/var/lib/docker/overlay2/l/T2YAOSXD4H2IIC4NFRCULST6RZ:/var/lib/docker/overlay2/l/UIWGFJL3QQHJQ74ABUEFO6XTWJ:/var/lib/docker/overlay2/l/RFHVUVKSG46SE6XLRTW3I6CJ4D:/var/lib/docker/overlay2/l/RHQINSGD3XL2WHNY4SIG4KBDBE:/var/lib/docker/overlay2/l/AN4TPT5K2EAMYPQXQ6ZXVKGNHE:/var/lib/docker/overlay2/l/2SB6JMMZ2E342RL5XQ55424O4U:/var/lib/docker/overlay2/l/KYTEGS4XGGLSIMMDGYNFLL7QVZ,upperdir=/var/lib/docker/overlay2/9b642a3a72798621bec257053b3d982'
 Number of processors:            1
 REAL    wdsize      :            8
 INTEGER wdsize      :            4
 ifmgrid    :  F     ifbrick    :  T
 
 Processor Distribution:  npx,npy,npz=            1            1            1
 Element Distribution: nelx,nely,nelz=            8            8            4
 Local Element Distribution: mx,my,mz=            8            8            4
   USE_GPU_DIRECT=0  
gs_setup: 0 unique labels shared
   handle bytes (avg, min, max): 3.47344e+06 3473436 3473436
   buffer bytes (avg, min, max): 0 0 0
   USE_GPU_DIRECT=0  
gs_setup: 0 unique labels shared
   handle bytes (avg, min, max): 308 308 308
   buffer bytes (avg, min, max): 0 0 0
 
Current file:     /vol/Nekbone/src/math.f
        function: rzero_acc
        line:     1392
This file was compiled: -ta=tesla:cc70

Is my command for running nekbone wrong, or is there still something wrong in my environment settings?

I would appreciate any suggestions!

Thanks,
Bambo

Hi Bambo,

I haven’t seen this message before, but I did a web search for the term “Unexpected end of /proc/mounts line” and found this link:

What CUDA driver are you using?

Though, I don’t think this message is the source of the run time error.

When I see the message “This file was compiled: -ta=tesla:cc70”, it’s typically because the binary was built to target one GPU architecture but is being run on a different one. “cc70” instructs the compiler to target a Volta device.

What GPU are you using? (If you don’t know, run the “pgaccelinfo” utility.)

-Mat
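
The mismatch described above can be checked by hand once the device's compute capability is known (for example, 7.0 for a V100). The sketch below is only an illustration of that comparison: the helper names cc_target_for and flags_cover_device are hypothetical, and it handles a single flag of the form used in this thread (‘-ta=nvidia:cc50,cc60’, ‘-Mcuda=cc70’).

```shell
# Convert a compute capability such as "7.0" into the target name "cc70".
cc_target_for() {
    printf 'cc%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Succeed if the target list in one -ta/-Mcuda flag includes the device's target.
flags_cover_device() {
    flag=$1
    target=$(cc_target_for "$2")
    # Keep only the comma-separated list after the last ':' or '='.
    list=$(printf '%s' "$flag" | sed 's/.*[:=]//')
    case ",$list," in
        *",$target,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# The original makenek.cuda flags versus a compute capability 7.0 device.
if flags_cover_device "-ta=nvidia:cc50,cc60" 7.0; then
    echo "flags cover the device"
else
    echo "flags do not include $(cc_target_for 7.0)"
fi
```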

Hi Mat,

It is CUDA 9.0 with V100 GPU cards.
I think the error was caused by running nekbone inside a CUDA Docker container?
Once I exited the container and fixed some path issues, it ran:

 ./nekbone gpu1
 Number of processors:            1
 REAL    wdsize      :            8
 INTEGER wdsize      :            4
 ifmgrid    :  F     ifbrick    :  T
 
 Processor Distribution:  npx,npy,npz=            1            1            1
 Element Distribution: nelx,nely,nelz=            8            8            4
 Local Element Distribution: mx,my,mz=            8            8            4
   USE_GPU_DIRECT=0  
gs_setup: 0 unique labels shared
   handle bytes (avg, min, max): 3.47344e+06 3473436 3473436
   buffer bytes (avg, min, max): 0 0 0
   USE_GPU_DIRECT=0  
gs_setup: 0 unique labels shared
   handle bytes (avg, min, max): 308 308 308
   buffer bytes (avg, min, max): 0 0 0
 
cg:   0  6.1811E+02
cg: 101  5.2094E-06  4.9017E-01  7.0556E-01  8.0181E-11
cg:   0  6.3770E+02
cg: 101  2.1272E-06  5.1292E-01  6.6931E-01  1.3266E-11
 
nelt =     256, np =         1, nx1 =      16, elements =       256
Tot MFlops =   2.8796E+05, MFlops      =   2.8796E+05
Setup Flop =   2.2125E+10, Solver Flop =   1.5886E+09
Solve Time =   0.8235E-01
Avg MFlops =   2.8796E+05
 Exitting....

But I am not sure whether the command ‘./nekbone gpu1’ tests it with the right workload.
I will look at Nekbone’s user guide to figure it out, or ask the developers for help.
Mat, I highly appreciate your help; without it I would have had no way to get Nekbone running on the GPU.
Thank you so much!

Bambo

I think the error was caused by running nekbone inside a CUDA Docker container?

Having not used Docker myself, I’m not sure, but it’s possible that the CUDA driver wasn’t set up properly in the container, or that something else prevented the PGI runtime from detecting the device.

Thank you so much!

You’re welcome! Glad I could help.

-Mat
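
On the Docker question, one quick thing to check inside a container is whether the NVIDIA device nodes are present at all. This is a rough sketch under the assumption of a Linux system with the standard NVIDIA driver, which creates /dev/nvidiactl and /dev/nvidia0, /dev/nvidia1, and so on; the helper name gpu_nodes_visible is hypothetical, and passing the check does not by itself prove the runtime can use the GPU.

```shell
# Succeed if the NVIDIA control node and at least one GPU node exist.
# The directory is a parameter only so the check can be tried on a test tree.
gpu_nodes_visible() {
    devdir=${1:-/dev}
    [ -e "$devdir/nvidiactl" ] || return 1
    for node in "$devdir"/nvidia[0-9]*; do
        [ -e "$node" ] && return 0
    done
    return 1
}

if gpu_nodes_visible; then
    echo "NVIDIA device nodes are visible"
else
    echo "no NVIDIA device nodes found; the runtime may not see a GPU here"
fi
```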