Hi,
I am converting a CUDA application to OpenMP offload. The code compiles and runs correctly with teams distribute parallel for, but produces an error with teams loop.
Original working code:
#pragma omp target teams distribute parallel for collapse(2)
for ( int n_y=start1; n_y<end1; n_y++ ){
for ( int n_x=start0; n_x<end0; n_x++ ){
New code:
#pragma omp target teams loop
for ( int n_y=start1; n_y<end1; n_y++ ){
#pragma omp target loop
for ( int n_x=start0; n_x<end0; n_x++ ){
The error is:
internal error: assertion failed: add_primary_prefix: bad severity (edg_error.cpp, line 2985 in add_primary_prefix)
int base0 = args[0].dat->base_offset;
^
But base0 is defined before the parallel region, so I am not sure why this stops working after changing teams distribute parallel for to teams loop.
Edit: after I changed it to:
#pragma omp target teams loop
for ( int n_y=start1; n_y<end1; n_y++ ){
#pragma omp loop
for ( int n_x=start0; n_x<end0; n_x++ ){
I get the error:
/gpfs/warwick/scrtp/avon/eb/software/NVHPC/22.11-CUDA-11.7.0/Linux_x86_64/22.11/compilers/share/llvm/bin/opt: /tmp/nvc++b7PcbdRO1JQnv.ll:3759:23: error: use of undefined value '%.field10225in9825$expdev.addr'
store i8* null, i8** %.field10225in9825$expdev.addr, align 8, !tbaa !2041, !dbg !2462
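For reference, a stripped-down, self-contained version of the loop nest looks roughly like the sketch below (placeholder body, bounds, and map clause, not the actual generated OPS code); this small form on its own compiles fine for me, so it is not a reproducer of the error.
// Simplified sketch of the structural change only; the names, sizes, and
// map clause are placeholders, not the generated OPS code.
void kernel_sketch(double *p, int xdim, int ydim,
                   int start0, int end0, int start1, int end1) {
  // original, working form: one combined construct over both loops
  #pragma omp target teams distribute parallel for collapse(2) map(tofrom: p[0:xdim*ydim])
  for (int n_y = start1; n_y < end1; n_y++)
    for (int n_x = start0; n_x < end0; n_x++)
      p[n_y * xdim + n_x] += 1.0;   // placeholder body

  // new form: teams loop on the outer loop, loop on the inner one
  #pragma omp target teams loop map(tofrom: p[0:xdim*ydim])
  for (int n_y = start1; n_y < end1; n_y++) {
    #pragma omp loop
    for (int n_x = start0; n_x < end0; n_x++)
      p[n_y * xdim + n_x] += 1.0;   // placeholder body
  }
}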
Hi adityasadawarte01,
Looks like you’re encountering an internal compiler error. You can try using a newer release to see if we’ve fixed it already, or if you can provide a reproducing example, I can investigate and report the issue if it still exists.
Thanks,
Mat
Hi Mat,
I don’t have a minimal example to reproduce this. I was working on OP-DSL/OPS (https://github.com/OP-DSL/OPS), an API with associated libraries and preprocessors to generate parallel executables for applications on multi-block structured meshes. If this line, https://github.com/OP-DSL/OPS/blob/b788b8e16e306920e59c7c0709e4e4632b3dc3f5/ops_translator/c/ops_gen_mpi_lazy.py#L373C1-L373C95, is changed to emit omp target teams loop instead, any big application throws the undefined-value error. It does work when the number of pragmas is on the lower side: the poisson application, for example, has 6 pragmas and compiles successfully, while Cloverleaf, which has around 80 pragmas, does not.
With Minfo enabled, I can see that a number of pragmas compile successfully before this issue is encountered. None of the smaller applications hit it.
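(If it helps for reproducing: the Minfo output can be enabled by adding -Minfo, or -Minfo=mp for just the OpenMP messages, to the nvc++ flags when compiling the generated kernel files, along the lines of the command below; the real makefile passes more flags, and the file name here is only a placeholder.)
nvc++ -mp=gpu -Minfo=mp -c <kernel>_ompoffload_kernels.cpp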
Hi adityasadawarte01,
I cloned the repo, but it’s not clear to me what steps I need to take to reproduce the error. Can you outline the steps, starting from a fresh clone?
Thanks,
Mat
Hi Mat,
You’ll need to set up the environment first. Here’s the script I used; please update the modules and environment variables for your system.
#!/bin/bash
export OPS_COMPILER=pgi
export OPS_INSTALL_PATH=$HOME/OPS/ops
module purge
#MPI and Compilers
module load GCC/11.3.0 GDB/12.1 OpenMPI/4.1.4
module load NVHPC/22.11-CUDA-11.7.0
unset MPI_INSTALL_PATH
export MPI_INSTALL_PATH=/scrtp/avon/eb/software/OpenMPI/4.1.4-GCC-11.3.0
export PATH=$MPI_INSTALL_PATH/bin:$PATH
export LD_LIBRARY_PATH=$MPI_INSTALL_PATH/lib:$LD_LIBRARY_PATH
export OP_AUTO_SOA=1
export MPICC=$MPI_INSTALL_PATH/bin/mpicc
export MPICPP=$MPI_INSTALL_PATH/bin/mpic++
export MPICXX=$MPI_INSTALL_PATH/bin/mpicxx
export MPIFC=$MPI_INSTALL_PATH/bin/mpifort
export MPIF90=$MPI_INSTALL_PATH/bin/mpifort
export MPI_INC=$MPI_INSTALL_PATH/include
export MPI_LIB=$MPI_INSTALL_PATH/lib
export OMPI_CC=nvc
export OMPI_CXX=nvc++
unset HDF5_INSTALL_PATH
module load CMake/3.18.4
unset CUDA_INSTALL_PATH
export CUDA_INSTALL_PATH=/scrtp/avon/eb/software/NVHPC/22.11-CUDA-11.7.0/Linux_x86_64/22.11/compilers
export OPENCL_INSTALL_PATH=/scrtp/avon/eb/software/NVHPC/22.11-CUDA-11.7.0/Linux_x86_64/22.11/compilers
unset NV_ARCH
export NV_ARCH=Turing
For installing OPS, the “Getting Started” section of the documentation (Oxford Parallel library for Structured mesh solvers) should suffice.
As for running the mini apps:
# compile backend
cd OPS/ops/c
make clean
make ompoffload
# compile application
cd OPS/apps/c/Cloverleaf
make clean
make cloverleaf_ompoffload
Please do not forget to first replace the pragma on this line, https://github.com/OP-DSL/OPS/blob/b788b8e16e306920e59c7c0709e4e4632b3dc3f5/ops_translator/c/ops_gen_mpi_lazy.py#L373C1-L373C95, with the omp target teams loop variant described above.
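If it is easier to script, a sed one-liner along the following lines should work, assuming the pragma string on that line contains the combined construct shown in my first post (note it replaces every occurrence in the file; otherwise just edit line 373 by hand):
sed -i 's/omp target teams distribute parallel for/omp target teams loop/' ops_translator/c/ops_gen_mpi_lazy.py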
If you compile OPS/apps/c/poisson with:
cd OPS/apps/c/poisson
make clean
make poisson_ompoffload
It works perfectly well, because it only has 6 pragma statements.
Please let me know if you have any issues with setting up OPS.
Thanks adityasadawarte01,
For some reason no “ompoffload” rules were generated in the makefiles, and I didn’t see anything in the OPS docs on how to create them. However, I was able to find where the “OpenMP_offload” source was and compile the code by hand to reproduce the error with our development compiler:
% nvc++ -mp=gpu -I./include -w -c PdV_ompoffload_kernels.cpp
/proj/nv/Linux_x86_64/236767-dev/compilers/share/llvm/bin/llc: error: /proj/nv/Linux_x86_64/236767-dev/compilers/share/llvm/bin/llc: /tmp/nvc++we9XdcARnIG6j.ll:1954:22: error: use of undefined value '%.field8996in8741$expdev.addr'
store ptr null, ptr %.field8996in8741$expdev.addr, align 8, !dbg !754 ; PdV_ompoffload_kernels.cpp:116
^
I filed a problem report, TPR#34285, and sent it to our engineers for investigation.
-Mat
Thanks for the reply. No worries about the missing makefile rules; the issue there is probably with your setup, but that’s fine since you were able to reproduce the error.