Linker errors when compiling C++ class members with OpenACC

Hello, I’m currently trying to port my C++ code to GPUs using OpenACC rather than CUDA, to save some time on the port. I’m following this document to port my classes to OpenACC: https://www.pgroup.com/lit/brochures/openacc_sc14.pdf

However, when the code is linked I get undefined references both in the class and where the class is called. I’ll provide a dump of all the output I have. I’m not that familiar with OpenACC, so there may be a lot of irrelevant information. Any help with this would be much appreciated. Many thanks in advance!

The linker error in the class is:

Solution.o: In function `Solution::devcopyin()':
/usr/include/c++/5/cmath:292: undefined reference to `__pgi_uacc_dataenterstart'
/usr/include/c++/5/cmath:292: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/cmath:292: undefined reference to `__pgi_uacc_dataenterdone'

and from where the class is called:


Solver.o: In function `Solver::General_Purpose_Solver_mk_i(unstructured_mesh&, Solution&, Boundary_Conditions&, external_forces&, global_variables&, domain_geometry&, initial_conditions&, unstructured_bcs&, int, Solution&, int, post_processing&)':
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataenterstart'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataenterdone'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_enter'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataenterstart'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataonb'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataenterdone'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_computestart'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_launch'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_computedone'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataexitstart'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataoffb2'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataoffb2'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataexitdone'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_noversion'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataexitstart'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataoffb2'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataoffb2'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataoffb2'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataoffb2'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataoffb2'
/usr/include/c++/5/bits/stl_uninitialized.h:301: undefined reference to `__pgi_uacc_dataexitdone'

The class constructor and devcopyin function are given as:


Solution::Solution(int _total_nodes)
{
    //ctor
    total_nodes = _total_nodes;

    rho = new double [total_nodes];
    if (rho==NULL) exit (1);
    u = new double [total_nodes];
    if (u==NULL) exit (1);
    v = new double [total_nodes];
    if (v==NULL) exit (1);
    w = new double [total_nodes];
    if (w==NULL) exit (1);
    error = new double [total_nodes];
    if (error==NULL) exit (1);
    Initialise();
}

void Solution::devcopyin(){

  #pragma acc enter data copyin (this)
  #pragma acc enter data copyin(rho[0:total_nodes])
  #pragma acc enter data copyin(u[0:total_nodes])
  #pragma acc enter data copyin(v[0:total_nodes])
  #pragma acc enter data copyin(w[0:total_nodes])
  #pragma acc enter data copyin(error[0:total_nodes])
}

The parallel kernel, stripped down to scalars and the above class, is:

soln.devcopyin();
soln_t1.devcopyin();
soln_t0.devcopyin();
residual_worker.devcopyin();
temp_soln.devcopyin();

#pragma acc kernels present(soln, soln_t1, soln_t0, residual_worker, temp_soln)
{
    double f1,f2,f3,f4;
    f1 = 0.0;
    f2 = 0.0;
    f3 = 0.0;
    f4 = 0.0;
    double d_t_temp = 1.0;

    //update RK values
    for( int i=0; i < Mesh.get_n_cells(); i++){

        // update intermediate macroscopic variables for next Runge Kutta Time Step
        f1 = soln_t0.get_rho(i) + residual_worker.get_rho(i) * d_t_temp;
        f2 = soln_t0.get_u(i) + (residual_worker.get_u(i)) * d_t_temp;
        f3 = soln_t0.get_v(i) + residual_worker.get_v(i) * d_t_temp;
        f4 = soln_t0.get_w(i) + residual_worker.get_w(i) * d_t_temp;

        // change momentum to velocity
        f2 = f2/f1;
        f3 = f3/f1;
        f4 = f4/f1;

        temp_soln.update(f1,f2,f3,f4, i);
        //temp_soln.update(1.0,f2,0.0,0.0, i);

        //add contributions to
        soln_t1.add_rho(i, d_t_temp * residual_worker.get_rho(i));
        soln_t1.add_u(i, d_t_temp * (residual_worker.get_u(i)));
        soln_t1.add_v(i, d_t_temp * residual_worker.get_v(i));
        soln_t1.add_w(i, d_t_temp * residual_worker.get_w(i));

        f1 = soln_t1.get_rho(i);
        f2 = soln_t1.get_u(i)/soln_t1.get_rho(i);
        f3 = soln_t1.get_v(i)/soln_t1.get_rho(i);
        f4 = soln_t1.get_w(i)/soln_t1.get_rho(i);

        soln.update(f1,f2,f3,f4, i);
        //soln.update(1.0,f2,0.0,0.0, i);

    }
}

This is the makefile:

CC	= pgc++
src	= $(wildcard *.cpp)
obj	= $(src:.cpp=.o)
CCFLAGS	= -O2 -Kieee -std=c++11 -Mprof=ccff  -Iinclude -I/home/brendan/boost_1_64_0/prefix/include -I/usr/local/tecplot/360ex_2018r1/include -I/home/brendan/Eigen -I/home/brendan/CGNS/CGNS-3.3.1/src -I/usr/include/
ACCFLAGS	= -acc -ta=tesla -Minfo=accel
LDFLAGS	= /usr/local/tecplot/360ex_2017r2/bin/libtecio.so /home/brendan/boost_1_64_0/prefix/lib/libboost_system.a /home/brendan/boost_1_64_0/prefix/lib/libboost_filesystem.a /home/brendan/CGNS/CGNS-3.3.1/src/lib/libcgns.a

myprog: $(obj)
	 $(CC) -o $@ $^ -v $(LDFLAGS)

%.o : %.cpp
	$(CC)  $(CCFLAGS) $(ACCFLAGS) -o $@ -c $<

.PHONY: clean
clean:
	rm -f $(obj) myprog

and finally this is the command line output of the linker:


pgc++ -o myprog RungeKutta.o Mesh.o residuals.o quad_bcs.o vector_var.o initial_conditions.o quad_bcs_plus.o unstructured_mesh.o gradients.o artificial_dissipation.o program.o external_forces.o domain_geometry.o preprocessor.o global_variables.o tecplot_output.o Boundary_Conditions.o tinyxml2.o post_processing.o flux_var.o main.o unstructured_bcs.o Solver.o Solution.o -v /usr/local/tecplot/360ex_2017r2/bin/libtecio.so /home/brendan/boost_1_64_0/prefix/lib/libboost_system.a /home/brendan/boost_1_64_0/prefix/lib/libboost_filesystem.a /home/brendan/CGNS/CGNS-3.3.1/src/lib/libcgns.a
Export PGI=/opt/pgi

/usr/bin/ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /opt/pgi/linux86-64/18.4/lib/trace_init.o /usr/lib/gcc/x86_64-linux-gnu/5/crtbegin.o /opt/pgi/linux86-64/18.4/lib/initmp.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /opt/pgi/linux86-64/18.4/lib/pgi.ld -L/opt/pgi/linux86-64/18.4/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/5 RungeKutta.o Mesh.o residuals.o quad_bcs.o vector_var.o initial_conditions.o quad_bcs_plus.o unstructured_mesh.o gradients.o artificial_dissipation.o program.o external_forces.o domain_geometry.o preprocessor.o global_variables.o tecplot_output.o Boundary_Conditions.o tinyxml2.o post_processing.o flux_var.o main.o unstructured_bcs.o Solver.o Solution.o /usr/local/tecplot/360ex_2017r2/bin/libtecio.so /home/brendan/boost_1_64_0/prefix/lib/libboost_system.a /home/brendan/boost_1_64_0/prefix/lib/libboost_filesystem.a /home/brendan/CGNS/CGNS-3.3.1/src/lib/libcgns.a -rpath /opt/pgi/linux86-64/18.4/lib -rpath /usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib64 -o myprog -L/usr/lib/gcc/x86_64-linux-gnu/5/../../../../lib64 -latomic -lpgatm -lstdc++ -lpgmp -lnuma -lpthread --start-group -lpgmath -lnspgc -lpgc --end-group -lm -lgcc -lc -lgcc -lgcc_s /usr/lib/gcc/x86_64-linux-gnu/5/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o

Hi brendan_w,

It looks like the problem is that you’re missing the OpenACC compiler flags on your link line. These flags tell the compiler to include the OpenACC runtime libraries, which is where these undefined references are coming from.

Try updating your Makefile with:

myprog: $(obj)
	$(CC) -o $@ $^ -v $(LDFLAGS) $(ACCFLAGS)
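
For illustration only, here is a minimal sketch of the same situation in a single hypothetical file, scale.cpp (the file name and the functions scale and sum are assumptions, not part of the original project). Compiling it with -acc should make the object file depend on the OpenACC runtime, so the link step needs the same flags; without -acc the pragma is ignored and the loop runs on the host.

```cpp
// scale.cpp: hypothetical stand-alone example, not taken from the original code.
//
// Compile step (flags as in ACCFLAGS from the Makefile in this thread):
//   pgc++ -acc -ta=tesla -Minfo=accel -c scale.cpp -o scale.o
// Link step: repeat the same OpenACC flags so the runtime libraries are added.
// main.o is a placeholder for whichever object file defines main():
//   pgc++ -acc -ta=tesla -o myprog scale.o main.o

// Multiply every element of a[0:n] by factor.
// copy(a[0:n]) moves the array to the device before the loop and back after it.
void scale(double* a, int n, double factor) {
    #pragma acc parallel loop copy(a[0:n])
    for (int i = 0; i < n; ++i) {
        a[i] *= factor;
    }
}

// Host-side helper: add up the elements of a[0:n].
double sum(const double* a, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}
```

Calling scale(a, 8, 2.0) on an array of eight ones and then sum(a, 8) gives 16.0, whether or not the loop is offloaded.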

Hope this helps,
Mat

Hi Mat,

That did the trick! Thanks very much for the quick response.

To add another learning experience: overnight I had tried to manually add the accelerator libraries to the makefile and managed to get the program to build. However, it then couldn’t find the CUDA driver at runtime. The makefile update you provided fixed this issue as well.

I’m also keeping track of the challenges I face during my port and will share them here afterwards, as there aren’t many resources on OpenACC and object-oriented C++ ports.

Brendan