Issue linking BLAS

Using PGI 19.4 on Ubuntu 18.04.

When I try to link with -lblas, I get the following error:

error while loading shared libraries: libpgatm.so: cannot open shared object file: No such file or directory

I can fix this by adding the PGI library directory to my path:

export LD_LIBRARY_PATH=/opt/pgi/linux86-64-llvm/19.4/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

But if I want to use GPU counters (PGI_ACC_TIME=1) with driver 418.67 (CUDA 10.1.168), I need to run my application with sudo.

Since sudo doesn't preserve LD_LIBRARY_PATH, this brings back the original error.
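I assume this is because sudo resets the environment (env_reset in sudoers), dropping LD_LIBRARY_PATH. One workaround sketch, assuming the install path above, is to re-inject the variable through env:

```shell
# Assumption: PGI 19.4 installed under /opt/pgi/linux86-64-llvm/19.4.
PGI_LIBS=/opt/pgi/linux86-64-llvm/19.4/lib

# sudo's env_reset drops LD_LIBRARY_PATH, so pass it explicitly to the child:
#   sudo env LD_LIBRARY_PATH="$PGI_LIBS" PGI_ACC_TIME=1 ./computeWorks_mm
# The same mechanism without sudo, showing the variable reaches the child:
env LD_LIBRARY_PATH="$PGI_LIBS" sh -c 'echo "$LD_LIBRARY_PATH"'
```

But I'd rather not depend on that on every run.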

Code is at https://github.com/mnicely/computeWorks_examples/tree/master/computeWorks_mm. You’ll need to change -lopenblas -> -lblas in the makefile.

Any ideas?

Hi Matt,

The problem is that “-Bstatic_pgi” only links the PGI runtime statically, not libblas. Since the shared BLAS library, “libblas.so”, was itself dynamically linked against PGI’s shared libraries, it brings in those references. Unfortunately, you can’t just use “-Bstatic”, since a few libraries, like libcuda.so, don’t have static versions. The solution is to add some link-time options so that the static version of libblas is used.
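As an aside, you can see the bracketing behavior with a tiny stand-alone demo (using gcc and GNU ld here purely for illustration; “libfoo” stands in for libblas):

```shell
# Build a static archive, libfoo.a, standing in for libblas.a:
cat > foo.c <<'EOF'
int foo(void) { return 42; }
EOF
cat > main.c <<'EOF'
int foo(void);
int main(void) { return foo() == 42 ? 0 : 1; }
EOF
gcc -c foo.c -o foo.o
ar rcs libfoo.a foo.o

# Everything between -Bstatic and -Bdynamic is resolved from static
# archives only; libraries linked after -Bdynamic (libc here) stay shared:
gcc main.c -L. -Wl,-Bstatic -lfoo -Wl,-Bdynamic -o demo
./demo && echo OK
```

The “-Xlinker” options below pass these same flags through nvcc to the underlying linker.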

It’s a bit tricky since you’re using nvcc, but I was able to get this to work by using the following options for “LIBS” in your makefile:

LIBS   := -lcublas -Xlinker "-Bstatic" -Xlinker "-lblas" -Xlinker "-Bdynamic"

% make
Building target: computeWorks_mm
nvcc -x cu -ccbin pgc++ -O2 -L/proj/pgi/linux86-64-llvm/19.4/lib/ -lcublas -Xlinker "-Bstatic" -Xlinker "-lblas" -Xlinker "-Bdynamic"  -Xcompiler "-V19.4 -Bstatic_pgi -acc -mp -ta=tesla:nordc -Mcuda -Minfo=accel" -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o "computeWorks_mm" "../src/computeWorks_mm.cu"
openACC(int, float, const float *, const float *, float, float *, const int &):
      1, include "computeWorks_mm.cu"
         163, Generating copyin(A[:n*n])
              Generating copyout(C[:n*n])
              Generating copyin(B[:n*n])
      1, include "computeWorks_mm.cu"
         167, Loop is parallelizable
      1, include "computeWorks_mm.cu"
         169, Loop is parallelizable
              Generating Tesla code
             167, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
             169, #pragma acc loop gang /* blockIdx.y */
             172, #pragma acc loop seq
      1, include "computeWorks_mm.cu"
         172, Loop is parallelizable
std::chrono::duration<double, std::ratio<(long)1, (long)1000>>::duration<long, std::ratio<(long)1, (long)1000000000>, void>(const std::chrono::duration<T1, T2> &):
      1, include "computeWorks_mm.cu"
      1, include "computeWorks_mm.cu"
Finished building target: computeWorks_mm

% ldd computeWorks_mm
        linux-vdso.so.1 =>  (0x00007ffe48dcc000)
        libcublas.so.10 => /usr/lib64/libcublas.so.10 (0x00002b15af68c000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x00002b15b3405000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00002b15b360d000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x00002b15b3829000)
        libatomic.so.1 => /home/sw/thirdparty/gcc/gcc-8.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/8.2.0/../../../../lib64/libatomic.so.1 (0x00002b15b3a2e000)
        libstdc++.so.6 => /home/sw/thirdparty/gcc/gcc-8.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/8.2.0/../../../../lib64/libstdc++.so.6 (0x00002b15b3c35000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00002b15b3fc1000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x00002b15b42c4000)
        /lib64/ld-linux-x86-64.so.2 (0x00005563a7073000)
        libgcc_s.so.1 => /home/sw/thirdparty/gcc/gcc-8.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/8.2.0/../../../../lib64/libgcc_s.so.1 (0x00002b15b4687000)
        libcublasLt.so.10 => /usr/lib64/libcublasLt.so.10 (0x00002b15b489e000)

Hope this helps,
Mat

Hey Mat,

Yes, I believe that fixed the issue!

I noticed you added the -Mcuda flag. Isn’t that just for Fortran?

Matt

No, not just for Fortran. It can also be used in C/C++ when linking OpenACC with CUDA code so that the CUDA libraries are brought into the link. We also change some of the setup code that’s linked in so the runtime will check whether the device data is coming from CUDA. Though since you’re using “nordc”, we’re not linking the device code, so adding “-Mcuda” probably doesn’t matter here. I just added it while trying to find the right combination of flags to get nvcc to statically link the BLAS library.