NVFORTRAN-S-0034-Syntax error at or near constant

AshutoshLondhe · April 30, 2024, 10:43pm

Hi

I am working on OPS-DSL, which helps auto-generate CUDA code.

To handle constant variables inside a CUDA kernel, we write a constants module file that looks something like this:

MODULE OPS_CONSTANTS

#ifdef OPS_WITH_CUDAFOR
    use cudafor
    integer, constant :: imax_opsconstant
    integer :: imax
#else
    integer :: imax
#endif

END MODULE OPS_CONSTANTS

github.com

OP-DSL/OPS/blob/develop/apps/fortran/laplace2dtutorial/step7/constants.F90

MODULE OPS_CONSTANTS

#ifdef OPS_WITH_CUDAFOR
    use cudafor
    integer, constant :: imax_opsconstant
    integer, constant :: jmax_opsconstant
    real(8), constant :: pi_opsconstant
    integer :: imax, jmax
    real(8) :: pi
#else
    integer :: imax, jmax
    real(8) :: pi
#endif

END MODULE OPS_CONSTANTS

and then use imax_opsconstant inside a CUDA kernel

I have tested this application with NVHPC/23.1 and ran on Volta architecture. It compiles and runs fine.

I tried the same on a Hopper GPU with NVHPC/23.7, but I am getting:

NVFORTRAN-S-0034-Syntax error at or near constant

Has the handling of constant changed in the latest release of the NVHPC compiler?

MatColgrove · May 1, 2024, 3:36pm

Did you add the “-cuda” flag to enable CUDA Fortran?

% nvfortran -c test.F90 -DOPS_WITH_CUDAFOR
NVFORTRAN-S-0034-Syntax error at or near constant (test.F90: 5)
  0 inform,   0 warnings,   1 severes, 0 fatal for ops_constants
% nvfortran -c test.F90 -DOPS_WITH_CUDAFOR -cuda
%

AshutoshLondhe · May 1, 2024, 8:33pm

Hi Mat,

After adding the -cuda option, I get a linker error for the cuRAND functions.

My command looks like the following:

/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/comm_libs/mpi/bin/mpif90 -O3 -fast -gopt  -module /ext-home/asl/OPS_cg/offload/OPS/ops/fortran/mod/pgi/cuda -DOPS_WITH_CUDAFOR -cuda constants.F90 cuda/set_zero_kernel_cuda_kernel.CUF cuda/left_bndcon_kernel_cuda_kernel.CUF cuda/apply_stencil_kernel_cuda_kernel.CUF cuda/right_bndcon_kernel_cuda_kernel.CUF cuda/copy_kernel_cuda_kernel.CUF laplace2d_ops.F90  -L/ext-home/asl/OPS_cg/offload/OPS/ops/fortran/lib/pgi -pgc++libs -lstdc++ -lops_for_cuda -lops_hdf5 -L/ext-home/asl/install/build_hdf5/gnu/lib -lhdf5_hl -lhdf5 -lz -o laplace2d_cuda
constants.F90:
laplace2d_ops.F90:
constants.F90:
cuda/set_zero_kernel_cuda_kernel.CUF:
cuda/left_bndcon_kernel_cuda_kernel.CUF:
cuda/apply_stencil_kernel_cuda_kernel.CUF:
cuda/right_bndcon_kernel_cuda_kernel.CUF:
cuda/copy_kernel_cuda_kernel.CUF:
laplace2d_ops.F90:
/usr/bin/ld: warning: /tmp/pgcudafatxbi2bfslNS2Xw.o: missing .note.GNU-stack section implies executable stack
/usr/bin/ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/fortran/lib/pgi/libops_for_cuda.a(ops_cuda_common_cuda.o): in function `ops_randomgen_init(unsigned int, int)':
/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:165: undefined reference to `curandCreateGenerator'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:174: undefined reference to `curandSetPseudoRandomGeneratorSeed'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:172: undefined reference to `curandSetPseudoRandomGeneratorSeed'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/fortran/lib/pgi/libops_for_cuda.a(ops_cuda_common_cuda.o): in function `ops_randomgen_exit()':
/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:256: undefined reference to `curandDestroyGenerator'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/fortran/lib/pgi/libops_for_cuda.a(ops_cuda_common_cuda.o): in function `ops_fill_random_uniform(ops_dat_core*)':
/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:190: undefined reference to `curandGenerateUniformDouble'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:198: undefined reference to `curandGenerate'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:194: undefined reference to `curandGenerateUniform'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/fortran/lib/pgi/libops_for_cuda.a(ops_cuda_common_cuda.o): in function `ops_fill_random_normal(ops_dat_core*)':
/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:228: undefined reference to `curandGenerateNormalDouble'
/usr/bin/ld: /ext-home/asl/OPS_cg/offload/OPS/ops/c/src/cuda/ops_cuda_common.cu:232: undefined reference to `curandGenerateNormal'
pgacclnk: child process exit status 1: /usr/bin/ld

I have LD_LIBRARY_PATH set to point to libcurand.
I get the same error when using -cuda -gpu=cc70 on Volta, although it compiles and runs successfully when using -Mcuda=cc70 instead.

Also, two more things:

1. I need to list every file in the cuda directory on the compile line; I am not able to use cuda/*_cuda_kernel.CUF instead.
2. When using -mp to compile the C++ code, it results in:

pgc++ -O3 -fast -gopt -std=c++11 -mp -I/ext-home/asl/OPS_cg/offload/OPS/ops/c/include -I/ext-home/asl/install/build_hdf5/gnu/include -I/opt/nvidia/hpc_sdk/Linux_x86_64/24.3/cuda/include -c /ext-home/asl/OPS_cg/offload/OPS/ops/c/src/sequential/ops_host_singlenode.cpp -o /ext-home/asl/OPS_cg/offload/OPS/ops/c/obj/pgi/ops_host_singlenode.o

NVC++-S-0000-Internal compiler error. flowgraph: node is zero      14  (/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/sequential/ops_host_singlenode.cpp: 200)
NVC++-S-0000-Internal compiler error. flowgraph: node is zero      18  (/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/sequential/ops_host_singlenode.cpp: 200)
NVC++-S-0000-Internal compiler error. flowgraph: node is zero      22  (/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/sequential/ops_host_singlenode.cpp: 200)
NVC++-S-0000-Internal compiler error. flowgraph: node is zero      26  (/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/sequential/ops_host_singlenode.cpp: 200)
NVC++-F-0000-Internal compiler error. Invalid key for hash       0  (/ext-home/asl/OPS_cg/offload/OPS/ops/c/src/sequential/ops_host_singlenode.cpp: 200)
NVC++/x86-64 Linux 24.3-0: compilation aborted

If I drop -mp, it compiles fine.

MatColgrove · May 1, 2024, 10:04pm

For the link error, it looks like you’re missing the “-cudalib=curand” flag which will link in the cuRAND library.

For #1, the “*” would get expanded by your shell, with the result then passed to the compiler. It’s not something the compiler would expand. Not sure why it’s not expanding correctly.

For #2, this looks like a compiler error. I was able to reproduce it here and filed a problem report, TPR #35613.

My best guess is the compiler is having issues translating the ternary ifs being used in the for loops when using the collapse clause. The workaround would be to remove the “OMP_COLLAPSE” or update “ops_macros.h” so OMP_COLLAPSE is defined as nothing.
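
On the wildcard question (#1), one way to see what the shell actually hands to the compiler is to expand the pattern on its own first. The following is only a minimal sketch, assuming a POSIX sh with default globbing settings; the scratch directory and the two empty files are made up for illustration and merely mimic the cuda/ layout from the command above. It also shows one possible (unconfirmed) cause of a failed expansion: globbing is case-sensitive, and a pattern that matches nothing is passed through literally.

```shell
#!/bin/sh
# Hypothetical scratch layout mimicking the cuda/ directory from the thread.
workdir=$(mktemp -d)
mkdir "$workdir/cuda"
touch "$workdir/cuda/set_zero_kernel_cuda_kernel.CUF" \
      "$workdir/cuda/copy_kernel_cuda_kernel.CUF"
cd "$workdir" || exit 1

# The shell, not the compiler, expands the pattern into one argument per file.
set -- cuda/*_cuda_kernel.CUF
matched=$#
echo "matched $matched files: $*"

# Globbing is case-sensitive: a pattern with no match stays unexpanded,
# so the compiler would receive the literal string with the "*" in it.
set -- cuda/*_cuda_kernel.cuf
unmatched=$1
echo "unmatched pattern stays literal: $unmatched"
```

If the first echo in a real build directory prints the pattern itself rather than file names, the pattern is not matching anything from the directory the command runs in.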

AshutoshLondhe · May 3, 2024, 10:38pm

Hi Mat,

I tried compiling with -cudalib=curand and it compiles now.

However, when I run this Laplace example on Hopper, it does not validate, whereas it runs and passes on the Volta and Pascal architectures. This is for the Fortran version of the application.

The C++ version of Laplace passes on Hopper as well as on Volta and Pascal.

I compared the results of the OpenMP offload and CUDA versions of the Fortran application. The differences first appear after the left boundary kernel.

github.com

OP-DSL/OPS/blob/26ecceae88584ec67eb714bffd949614ce159810/apps/fortran/laplace2dtutorial/step7/laplace2d.F90#L128


      
          call ops_timers ( startTime )
          
          call ops_partition("")
          
          call ops_par_loop(set_zero_kernel, "set zero", grid2D, 2, bottom_range, &
                          & ops_arg_dat(d_A, 1, S2D_0pt, "real(kind=8)", OPS_WRITE))
          
          call ops_par_loop(set_zero_kernel, "set zero", grid2D, 2, top_range, &
                          & ops_arg_dat(d_A, 1, S2D_0pt, "real(kind=8)", OPS_WRITE))
          
          call ops_par_loop(left_bndcon_kernel, "left_bndcon", grid2D, 2, left_range, &
                          & ops_arg_dat(d_A, 1, S2D_0pt, "real(kind=8)", OPS_WRITE), &
                          & ops_arg_idx())
          
          call ops_par_loop(right_bndcon_kernel, "right_bndcon", grid2D, 2, right_range, &
                     & ops_arg_dat(d_A, 1, S2D_0pt, "real(kind=8)", OPS_WRITE), &
                     & ops_arg_idx())
          
          if (ops_is_root() == 1) then
              write(*,'(a,i5,a,i5,a)') 'Jacobi relaxation Calculation:', imax+2, ' x', jmax+2, ' mesh'
          end if

What could be the reason here?