-Mcudalib=cublas static linking issue

Hi,

I use the cuBLAS library in my code, so I pass the -Mcudalib=cublas flag to link it. Linking against the shared version of the library works. However, if I link against the static version (-Bstatic --> libcublas_static.a), I get missing references, because libcublas also needs libcublasLt, which does not seem to be provided when linking statically. As far as I know, an additional -Mcudalib=cublasLt is not supported, so shouldn’t libcublasLt be provided automatically by the -Mcudalib=cublas flag?

Many thanks.
Regards,
Reto

Hi Reto,

We actually do implicitly include libcublasLt_static.a on the link line when linking statically. The problem here is that even the static versions of cuBLAS (as well as other CUDA libraries) perform some dynamic loading, which causes undefined reference errors (to things like dlopen, dlclose, dlsym, etc.) in a fully static link. Hence you can’t create a completely static executable.
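
To see whether a given static archive really refers to the dynamic loader, one option is to filter the output of `nm -u` (from binutils) for the dl* symbols. The sketch below is only an illustration: the function name `dl_refs` is made up here, and it assumes the usual `nm` layout where undefined symbols are printed as `U name`.

```shell
#!/bin/sh
# Hypothetical helper: read `nm -u` output on stdin and print the
# dynamic-loader symbols (dlopen, dlclose, dlsym, dlerror) that show up
# as undefined references, each one once.
dl_refs() {
    awk '$1 == "U" && $2 ~ /^dl(open|close|sym|error)(@.*)?$/ {
             sub(/@.*/, "", $2)   # drop a symbol-version suffix if present
             print $2
         }' | sort -u
}

# Example use (the archive location depends on your installation):
#   nm -u libcublas_static.a | dl_refs
```

If this prints anything for one of the CUDA static archives, that archive needs libdl at link time, which is consistent with the -ldl that appears on the link line in dynamic mode.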

Instead of “-Bstatic”, try using the “-Bstatic_pgi” flag. In this case, we’ll link the PGI runtime as well as the CUDA libraries statically, but link the system libraries dynamically.

For example (note that I’m using the verbose “-v” flag, to show what the actual link line looks like):

% pgfortran -Mcuda -Mcudalib=cublas -v test_cublas.o -Bstatic_pgi -V19.10
Export PGI_CURR_CUDA_HOME=/proj/pgi/linux86-64-llvm/2019/cuda/10.1
Export PGI=/proj/pgi

/proj/pgi/linux86-64-llvm/19.10/bin/pgacclnk -nvidia /proj/pgi/linux86-64-llvm/19.10/bin/pgnvd -cuda10010 -cudaroot /proj/pgi/linux86-64-llvm/2019/cuda/10.1 -cudalink -computecap=70 -v /usr/bin/ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /proj/pgi/linux86-64-llvm/19.10/lib/trace_init.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbegin.o /proj/pgi/linux86-64-llvm/19.10/lib/f90main.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /proj/pgi/linux86-64-llvm/19.10/lib/pgi.ld -L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -L/proj/pgi/linux86-64-llvm/19.10/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/7 test_cublas.o -rpath /proj/pgi/linux86-64-llvm/19.10/lib -rpath /proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -rpath /usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib64 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib64 -Bstatic -Bdynamic -Bstatic -lcudafor101 -lcudafor -lcudaforblas /proj/pgi/linux86-64-llvm/19.10/lib/cuda_init_register_end.o -Bdynamic -Bstatic -lcublas_static -lcublasLt_static -lculibos -lcudaforblas -Bdynamic -L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -Bstatic -lcudadevrt -lcudart_static -Bdynamic -ldl -Bstatic -lcudafor2 -Bdynamic -Bstatic -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgatm -lpgkomp -lomp -as-needed -lomptarget -no-as-needed -Bdynamic -Bstatic -Bdynamic -lpthread -Bstatic --start-group -lpgmath -lnspgc -lpgc --end-group -Bdynamic -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s -lstdc++ /usr/lib/gcc/x86_64-linux-gnu/7/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o
/proj/pgi/linux86-64-llvm/19.10/bin/pgnvd -dcuda /proj/pgi/linux86-64-llvm/2019/cuda/10.1 /proj/pgi/linux86-64-llvm/19.10/lib/trace_init.o /proj/pgi/linux86-64-llvm/19.10/lib/f90main.o -L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -L/proj/pgi/linux86-64-llvm/19.10/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/7 test_cublas.o -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib64 -lcudafor101 -lcudafor -lcudaforblas /proj/pgi/linux86-64-llvm/19.10/lib/cuda_init_register_end.o -lcublas_static -lcublasLt_static -lculibos -L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -lcudadevrt -lcudart_static -ldl -lcudafor2 -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgftnrtl -lpgatm -lpgkomp -lomp -lomptarget -lpthread -lpgmath -lnspgc -lpgc -lrt -lpthread -lm -lgcc -lc -lgcc_s -lstdc++ -dolink -cuda10010 -computecap 70 -o /tmp/pgcuda_Yec6HXaVL0G.cubin -regobj /tmp/pgcudaregoYecQ8Hv_1fl.o
/proj/pgi/linux86-64-llvm/19.10/bin/pgnvd -fatobj /tmp/pgcudafatUYeckX99qtzm.o -o /tmp/pgcudafatUYeckX99qtzm.o -cuda10010 -dcuda /proj/pgi/linux86-64-llvm/2019/cuda/10.1 -cudalink -sm 70 /tmp/pgcuda_Yec6HXaVL0G.cubin
/usr/bin/ld /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /tmp/pgcudafatUYeckX99qtzm.o /tmp/pgcudaregoYecQ8Hv_1fl.o /proj/pgi/linux86-64-llvm/19.10/lib/trace_init.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbegin.o /proj/pgi/linux86-64-llvm/19.10/lib/f90main.o --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /proj/pgi/linux86-64-llvm/19.10/lib/pgi.ld -L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -L/proj/pgi/linux86-64-llvm/19.10/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/7 test_cublas.o -rpath /proj/pgi/linux86-64-llvm/19.10/lib -rpath /proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -rpath /usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib64 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib64 -Bstatic -Bdynamic -Bstatic -lcudafor101 -lcudafor -lcudaforblas /proj/pgi/linux86-64-llvm/19.10/lib/cuda_init_register_end.o -Bdynamic -Bstatic -lcublas_static -lcublasLt_static -lculibos -lcudaforblas -Bdynamic -L/proj/pgi/linux86-64-llvm/2019/cuda/10.1/lib64 -Bstatic -lcudadevrt -lcudart_static -Bdynamic -ldl -Bstatic -lcudafor2 -Bdynamic -Bstatic -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lpgatm -lpgkomp -lomp -as-needed -lomptarget -no-as-needed -Bdynamic -Bstatic -Bdynamic -lpthread -Bstatic --start-group -lpgmath -lnspgc -lpgc --end-group -Bdynamic -lrt -lpthread -lm -lgcc -lc -lgcc -lgcc_s -lstdc++ /usr/lib/gcc/x86_64-linux-gnu/7/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o
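
Link lines like these are hard to read by eye. A small filter that prints only the -l flags makes it easier to check whether -lcublas_static and -lcublasLt_static are both present. This is a rough sketch, and `list_libs` is a name invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: read a linker command line on stdin and print each
# -l flag once, in order of first appearance.
list_libs() {
    tr -s ' \t' '\n\n' | awk '/^-l/ && !seen[$0]++'
}

# Example use on a fragment of a link line:
#   echo 'ld -Bstatic -lcublas_static -lcublasLt_static -lculibos -Bdynamic -ldl' | list_libs
```

Piping the compiler's verbose output through `list_libs` (redirecting stderr with `2>&1` in case the driver writes there) and looking for the cublas entries should show quickly whether both libraries made it onto the link.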

Hope this helps,
Mat

Hi Mat,

Thanks for your reply. Actually, I just realized that we already use -Bstatic_pgi with the PGI compiler, and I still get tons of undefined references. I’ve attached a small selection below. Maybe it already rings a bell? However, if I add libcublas_static and libcublasLt_static by hand, it seems to work fine.

Regards,
Reto

/remote/tcadprod/depot/linux/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a(cublas.o): In function `__static_initialization_and_destruction_0(int, int)':
cublas.compute_75.cudafe1.cpp:(.text+0x1ac): undefined reference to `CublasGPVar::GPVar::GPVar(char const*, int)'
/remote/tcadprod/depot/linux/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a(cublas.o): In function `cublasSetBackdoor':
cublas.compute_75.cudafe1.cpp:(.text+0x257): undefined reference to `CublasGPVar::GPVar::SetValue(char const*, char, void*)'
/remote/tcadprod/depot/linux/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a(cublas.o): In function `cublasGetBackdoor':
cublas.compute_75.cudafe1.cpp:(.text+0x289): undefined reference to `CublasGPVar::GPVar::GetValue(char const*, char, void*)'
/remote/tcadprod/depot/linux/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a(cublas.o): In function `cublasCtxInit(cublasContext**)':
cublas.compute_75.cudafe1.cpp:(.text+0x33b): undefined reference to `cublasFixedSizePoolWithGraphSuppport::cublasFixedSizePoolWithGraphSuppport()'
cublas.compute_75.cudafe1.cpp:(.text+0x343): undefined reference to `cublasFixedSizePoolWithGraphSuppport::cublasFixedSizePoolWithGraphSuppport()'
cublas.compute_75.cudafe1.cpp:(.text+0x34b): undefined reference to `cublasLtCtxInit'
cublas.compute_75.cudafe1.cpp:(.text+0x39b): undefined reference to `cublasFixedSizePoolWithGraphSuppport::init(cublasContext*, int, int)'
cublas.compute_75.cudafe1.cpp:(.text+0x3bd): undefined reference to `cublasFixedSizePoolWithGraphSuppport::init(cublasContext*, int, int)'
cublas.compute_75.cudafe1.cpp:(.text+0x417): undefined reference to `init_gemm_select'
/remote/tcadprod/depot/linux/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a(cublas.o): In function `cublasGetProperty':
cublas.compute_75.cudafe1.cpp:(.text+0x2257): undefined reference to `cublasLtGetProperty'
/remote/tcadprod/depot/linux/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a(cublas.o): In function `cublasGetVersion_v2':
cublas.compute_75.cudafe1.cpp:(.text+0x484a): undefined reference to `cublasLtGetVersion'
/remote/tcadprod/depot/linux/pgi/linux86-64-llvm/2019/cuda/10.1/lib64/libcublas_static.a(cublas.o): In function `cublasDestroy_v2':
cublas.compute_75.cudafe1.cpp:(.text+0x4fac): undefined reference to `cublasFixedSizePoolWithGraphSuppport::tearDown()'
cublas.compute_75.cudafe1.cpp:(.text+0x4fb4): undefined reference to `cublasFixedSizePoolWithGraphSuppport::tearDown()'
cublas.compute_75.cudafe1.cpp:(.text+0x4fc9): undefined reference to `cublasLtShutdownCtx'
cublas.compute_75.cudafe1.cpp:(.text+0x4fd1): undefined reference to `cublasFixedSizePoolWithGraphSuppport::~cublasFixedSizePoolWithGraphSuppport()'
cublas.compute_75.cudafe1.cpp:(.text+0x4fd9): undefined reference to `cublasFixedSizePoolWithGraphSuppport::~cublasFixedSizePoolWithGraphSuppport()'
cublas.compute_75.cudafe1.cpp:(.text+0x5064): undefined reference to `free_gemm_select'
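
With this many messages, it can help to reduce the log to the distinct missing symbols. The sketch below assumes the GNU ld message format shown above (symbol between a backtick and a single quote); `undefined_syms` is a hypothetical name used only here.

```shell
#!/bin/sh
# Hypothetical helper: read linker error output on stdin and print each
# distinct symbol named in an "undefined reference to" message.
undefined_syms() {
    sed -n "s/.*undefined reference to \`\\(.*\\)'.*/\\1/p" | sort -u
}

# Example use (link.log is an assumed file holding the saved linker output):
#   undefined_syms < link.log | grep '^cublasLt'
```

In the selection above, several of the missing symbols start with cublasLt (cublasLtCtxInit, cublasLtGetProperty, cublasLtGetVersion, cublasLtShutdownCtx), which suggests that libcublasLt_static was either not on the link line or not picked up after libcublas_static.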

Hi Reto,

As you can see from the verbose output I posted above, “-lcublas_static -lcublasLt_static” are being included. What’s happening in your case is unclear.

Can you do something similar and post your link command with the verbose flag (-v), so we can see which libraries are being included?

Thanks,
Mat