Hi Matt,
The problem is that “-Bstatic_pgi” only links the PGI runtime statically, not libblas. Since the shared BLAS library, “libblas.so”, was itself dynamically linked against PGI’s shared libraries, it brings in these references. Unfortunately, you can’t just use “-Bstatic”, since there are a few libraries, like libcuda.so, that don’t have static versions. The solution is to add some link-time options so that the static version of libblas is used.
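It’s a bit tricky since you’re using nvcc, but I was able to get this to work by using the following options for “LIBS” in your makefile. The “-Bstatic” ahead of “-lblas” tells the linker to pick up the static libblas, and the “-Bdynamic” after it switches back to shared libraries for everything that follows: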
LIBS := -lcublas -Xlinker "-Bstatic" -Xlinker "-lblas" -Xlinker "-Bdynamic"
% make
Building target: computeWorks_mm
nvcc -x cu -ccbin pgc++ -O2 -L/proj/pgi/linux86-64-llvm/19.4/lib/ -lcublas -Xlinker "-Bstatic" -Xlinker "-lblas" -Xlinker "-Bdynamic" -Xcompiler "-V19.4 -Bstatic_pgi -acc -mp -ta=tesla:nordc -Mcuda -Minfo=accel" -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o "computeWorks_mm" "../src/computeWorks_mm.cu"
openACC(int, float, const float *, const float *, float, float *, const int &):
1, include "computeWorks_mm.cu"
163, Generating copyin(A[:n*n])
Generating copyout(C[:n*n])
Generating copyin(B[:n*n])
1, include "computeWorks_mm.cu"
167, Loop is parallelizable
1, include "computeWorks_mm.cu"
169, Loop is parallelizable
Generating Tesla code
167, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
169, #pragma acc loop gang /* blockIdx.y */
172, #pragma acc loop seq
1, include "computeWorks_mm.cu"
172, Loop is parallelizable
std::chrono::duration<double, std::ratio<(long)1, (long)1000>>::duration<long, std::ratio<(long)1, (long)1000000000>, void>(const std::chrono::duration<T1, T2> &):
1, include "computeWorks_mm.cu"
1, include "computeWorks_mm.cu"
Finished building target: computeWorks_mm
% ldd computeWorks_mm
linux-vdso.so.1 => (0x00007ffe48dcc000)
libcublas.so.10 => /usr/lib64/libcublas.so.10 (0x00002b15af68c000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00002b15b3405000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00002b15b360d000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x00002b15b3829000)
libatomic.so.1 => /home/sw/thirdparty/gcc/gcc-8.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/8.2.0/../../../../lib64/libatomic.so.1 (0x00002b15b3a2e000)
libstdc++.so.6 => /home/sw/thirdparty/gcc/gcc-8.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/8.2.0/../../../../lib64/libstdc++.so.6 (0x00002b15b3c35000)
libm.so.6 => /usr/lib64/libm.so.6 (0x00002b15b3fc1000)
libc.so.6 => /usr/lib64/libc.so.6 (0x00002b15b42c4000)
/lib64/ld-linux-x86-64.so.2 (0x00005563a7073000)
libgcc_s.so.1 => /home/sw/thirdparty/gcc/gcc-8.2.0/linux86-64/lib/gcc/x86_64-pc-linux-gnu/8.2.0/../../../../lib64/libgcc_s.so.1 (0x00002b15b4687000)
libcublasLt.so.10 => /usr/lib64/libcublasLt.so.10 (0x00002b15b489e000)
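Note that the ldd output above no longer lists libblas.so or any PGI runtime libraries. If you want to script that check, here is a minimal sketch. The function name "check_static_link" is made up for illustration, and the "libpg" and "libacc" prefixes are my assumption about how the PGI shared runtime libraries are named on your system, so adjust the pattern to match your install:

```shell
# Hypothetical helper: reads `ldd` output on stdin and looks for libraries
# whose names start with one of the given prefixes (an extended regex).
# Prints any offending lines and returns 1; returns 0 if none are found.
check_static_link() {
    pattern=$1
    # Anchor at the start of the line so "libblas" does not match "libcublas".
    if grep -E "^[[:space:]]*(${pattern})"; then
        return 1
    fi
    return 0
}
```

You would use it as: ldd computeWorks_mm | check_static_link 'libblas|libpg|libacc'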
Hope this helps,
Mat