Why is the cublas shared library statically linked with cuda and cudart?

I have noticed that when doing ldd on cublas.so I do not see any linking with cuda.so. Additionally, nm -gD cublas.so does not show any cuda symbols. However, doing nm cublas_static.a shows that the calls to cuda are undefined. This means that cublas.so is statically linked with cuda, whereas cublas_static.a is dynamically linked. My question is: why is this happening? Is there any performance issue I must consider when using static cublas? Is there a dynamic cublas library that is dynamically linked with cuda?

Having libcublas.so statically linked to the cudart provider means there is no possibility of a mismatch at runtime. Furthermore, it is possible to do sensible operations on the GPU using only the cublas library; no explicit cuda runtime API calls are necessarily needed. So having libcublas.so supply its own cudart provider seems sensible to me.
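As an illustration of that last point, here is a minimal sketch of my own (not from the thread) that does GPU work using only cuBLAS entry points, via the legacy cublas.h API; the particular calls are just one possible choice, the point is only that no cudaMalloc/cudaMemcpy appears in the application code:

// hypothetical sketch: GPU work with no explicit CUDA runtime API calls
#include <cublas.h>   // legacy cuBLAS API (cublasInit, cublasAlloc, ...)
#include <cstdio>

int main(){
  float h[4] = {1.f, 2.f, 3.f, 4.f};
  float *d = NULL;
  cublasInit();                                   // set up the legacy cuBLAS context
  cublasAlloc(4, sizeof(float), (void**)&d);      // device allocation via cuBLAS, not cudaMalloc
  cublasSetVector(4, sizeof(float), h, 1, d, 1);  // host-to-device copy via cuBLAS
  float s = cublasSasum(4, d, 1);                 // sum of |x[i]| computed on the GPU
  printf("sum = %f\n", s);
  cublasFree(d);
  cublasShutdown();
}

Built the same way as the demonstrator below (g++ … -lcublas), such a program never references the CUDA runtime API directly, yet it still needs a cudart provider, which libcublas.so brings along.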

Your conclusion about libcublas_static.a is not correct. The linking has not been fully resolved; the cudart provider could be either statically or dynamically linked, as determined by the link specification provided later.

Here is a demonstrator of these concepts:

$ cat t1.cpp
#include <cublas_v2.h>

int main(){
  float *h = NULL;   // intentionally NULL pointers: the call will report an error,
  float *d = NULL;   // but it is enough to exercise cublas and its cudart provider
  cublasSetVector(1, 4, h, 1, d, 1);
}
$ g++ t1.cpp -I/usr/local/cuda/include  -L/usr/local/cuda/lib64 -lcublas
$ ldd a.out
   linux-vdso.so.1 =>  (0x00007fffb076e000)
   libcublas.so.11 => /usr/local/cuda/lib64/libcublas.so.11 (0x00007fdf13a29000)
   libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fdf136f4000)
   libm.so.6 => /lib64/libm.so.6 (0x00007fdf133f2000)
   libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fdf131dc000)
   libc.so.6 => /lib64/libc.so.6 (0x00007fdf12e19000)
   libcublasLt.so.11 => /usr/local/cuda/lib64/libcublasLt.so.11 (0x00007fdf00d0a000)
   libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdf00aee000)
   librt.so.1 => /lib64/librt.so.1 (0x00007fdf008e5000)
   libdl.so.2 => /lib64/libdl.so.2 (0x00007fdf006e1000)
   /lib64/ld-linux-x86-64.so.2 (0x00007fdf1c963000)
$ g++ t1.cpp -I/usr/local/cuda/include  -L/usr/local/cuda/lib64 -lcublas_static -lcublasLt_static  -lcudart_static -lculibos -ldl -lpthread -lrt
$ ldd a.out
   linux-vdso.so.1 =>  (0x00007ffd489f5000)
   libdl.so.2 => /lib64/libdl.so.2 (0x00007f2f41686000)
   libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2f41469000)
   librt.so.1 => /lib64/librt.so.1 (0x00007f2f41261000)
   libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f2f40f5a000)
   libm.so.6 => /lib64/libm.so.6 (0x00007f2f40c57000)
   libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f2f40a41000)
   libc.so.6 => /lib64/libc.so.6 (0x00007f2f4067f000)
   /lib64/ld-linux-x86-64.so.2 (0x00007f2f418b8000)
$
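(Not part of the original demonstration, but to make the point about the link specification concrete: the same static cuBLAS archives could instead be paired with the dynamic CUDA runtime by swapping -lcudart_static for -lcudart, roughly as in the hypothetical link line below; ldd on the resulting binary would then be expected to list libcudart.so among the dependencies.)

$ g++ t1.cpp -I/usr/local/cuda/include  -L/usr/local/cuda/lib64 -lcublas_static -lcublasLt_static -lcudart -lculibos -ldl -lpthread -lrt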

Thank you very much for your response. Regarding libcublas_static.a, I disagree. If you run ldd on the executable of a program, the linking depends on the libraries you have used, as you mentioned. But if you look at the symbols of libcublas_static.a using nm, the CUDA calls are undefined, meaning that they will be resolved at runtime.
$ nm libcublas_static.a

0000000000000000 T clsLazyInitThreadContext
00000000000000b0 T clsShutdownThreadContext
U cudaGetExportTable
U _GLOBAL_OFFSET_TABLE_
0000000000000000 r _ZL42CU_ETID_ContextLocalStorageInterface_v0301

cublasMg.o:
U calloc
U cublasCreate_v2
U cublasDestroy_v2
0000000000001000 T cublasXtCreate
00000000000044d0 T cublasXtDestroy
000000000000be60 T cublasXtDeviceSelect
000000000000b3b0 T cublasXtGetBlockDim
000000000000b910 T cublasXtGetNumBoards
00000000000047a0 T cublasXtGetPinningMemMode
000000000000b660 T cublasXtMaxBoards
000000000000b100 T cublasXtSetBlockDim
000000000000e4e0 T cublasXtSetCpuRatio
000000000000c2f0 T cublasXtSetCpuRoutine
0000000000007c30 T cublasXtSetPinningMemMode
U cudaDeviceGetAttribute
U cudaGetDevice
U cudaGetLastError
U cudaHostGetFlags
U cudaHostRegister
U cudaPointerGetAttributes
U cudaSetDevice
U culibosInit
U __cxa_guard_acquire
U __cxa_guard_release
U dlclose
U dlopen

Attachments: nm -gD libcudart_so.txt (38.2 KB), nm libcublas_static.a.txt (2.4 MB)

$ nm -gD libcublas.so
U abort@GLIBC_2.2.5
U bind@GLIBC_2.2.5
U bindtextdomain@GLIBC_2.2.5
U bind_textdomain_codeset@GLIBC_2.2.5
U btowc@GLIBC_2.2.5
U calloc@GLIBC_2.2.5
U ceil@GLIBC_2.2.5
U chmod@GLIBC_2.2.5
U clock_gettime@GLIBC_2.2.5
U close@GLIBC_2.2.5
U closedir@GLIBC_2.2.5
U connect@GLIBC_2.2.5
U __ctype_get_mb_cur_max@GLIBC_2.2.5
0000000000198170 T cublasAlloc@@libcublas.so.11
0000000000247e50 T cublasAsumEx@@libcublas.so.11
000000000021e560 T cublasAxpyEx@@libcublas.so.11
0000000000199940 T cublasCaxpy@@libcublas.so.11
000000000021d540 T cublasCaxpy_v2@@libcublas.so.11

I don’t agree with you, so I will just say I agree to disagree.

Library entry-point resolution at runtime can happen via 2 paths that I know of:

  • ordinary dynamic linking to a shared object using the linux dynamic linker.
  • linking under program control via program manual load of a library and manual fixup of entry point table (dynamic loading).

We have established that dynamic linking is not occurring - ldd would indicate if it were.

So the only other possibility is that the cublas static library looks for and dynamically loads a cudart shared object, and then does manual fixup of the entry point table.
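To make that second path concrete, here is a generic dlopen/dlsym sketch (my illustration of the mechanism, not a claim about what cublas actually does; the library and symbol names are just examples):

// generic "dynamic loading" sketch: manually load a shared object and fix up
// one entry point; compile with -ldl
#include <dlfcn.h>
#include <cstdio>

int main(){
  void *lib = dlopen("libcudart.so", RTLD_NOW);   // the program itself locates and loads the SO
  if (!lib) { printf("load failed: %s\n", dlerror()); return 1; }
  typedef int (*getCount_t)(int *);
  getCount_t getCount = (getCount_t)dlsym(lib, "cudaGetDeviceCount");  // manual entry-point fixup
  int n = 0;
  if (getCount) getCount(&n);
  printf("devices: %d\n", n);
  dlclose(lib);
  return 0;
}

A dependency acquired this way does not show up in ldd output, which is why an experiment like the one below is needed to rule it out.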

If that is what you are claiming, then I haven’t disproved that yet. I believe it is unlikely. I guess the experiment to run would be as follows:

  • build an application like I have shown, fully statically linked.
  • on the machine in question, obfuscate the cudart SO library so that any dynamic load can’t find it
  • see if the application fails, or not.

So I ran that experiment on the application the way I built it above. I moved libcudart.so.11.4.108 from the normal place (/usr/local/cuda/lib64) to a hidden directory that won’t be found. At that point, libcudart.so and libcudart.so.11.0 both indicate as broken symlinks. I then ran the application. No errors occurred. In fact, if I run the exact statically linked app that I demonstrated above, under compute-sanitizer, I get an appropriate runtime API error message (“Program hit invalid argument (error 1) on CUDA API call to cudaMemcpy.”) indicating that the runtime API is fully functional.
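In terms of commands, that experiment amounted to roughly the following (the destination directory is just a placeholder):

$ mv /usr/local/cuda/lib64/libcudart.so.11.4.108 /path/to/hidden/dir/
$ ./a.out
$ compute-sanitizer ./a.out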

I don’t plan to argue this any further. I disagree. Good luck.

So why are there no symbols in libcudart_so.txt, while there are undefined symbols in libcublas_static.a.txt?

Sorry, I have no idea. I’m confident my test cases support my claims, however.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.