Dramatic size increase of "lightweight" cuBLASLt library from CUDA 10 to 11

I had read that CUDA 10 introduced a much more lightweight BLAS library, cuBLASLt, which suited me quite nicely, so I rewrote my code to use it instead of regular cuBLAS.
These are the sizes of the BLAS libraries shipped with CUDA 10.2 and 11.1 for Linux:

$ ls -lh /usr/local/cuda-*/**/libcublas*
...
-rwxr-xr-x 1 root root  29M  6. Nov 17:48 cuda-10.2/lib64/libcublasLt.so.10.2.3.254
-rwxr-xr-x 1 root root  65M  6. Nov 17:48 cuda-10.2/lib64/libcublas.so.10.2.3.254
-rwxr-xr-x 1 root root 215M 14. Okt 21:34 cuda-11.1/lib64/libcublasLt.so.11.3.0.106
-rwxr-xr-x 1 root root 131M 14. Okt 21:34 cuda-11.1/lib64/libcublas.so.11.3.0.106
...

“Lightweight” cuBLASLt was indeed quite a bit more compact before, but it has grown to almost 8x its previous size (29 MB to 215 MB) and is now considerably larger than cuBLAS itself. To say the least, I am very confused. All I currently use it for is GEMM, so it is hard to justify shipping 215 MB for a single function. Is there any chance this will be fixed in a future release, and if not, are there any (preferably open-source) recommendations for truly lightweight alternatives?

Update: I replaced cuBLASLt with a simple CUTLASS device-level GEMM. Surprisingly, its API turned out to be even simpler than either the cuBLAS or the cuBLASLt API. The increase in binary size is negligible, and performance wasn't critical for my use case anyway.
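For reference, a device-level GEMM with CUTLASS looks roughly like the sketch below. This assumes single-precision, column-major matrices and CUTLASS's default kernel configuration; the function name `run_gemm` and the exact template parameters are illustrative, not my actual code:

```cuda
#include <cutlass/gemm/device/gemm.h>

// Single-precision GEMM: C = alpha * A * B + beta * C,
// with all matrices column-major in device memory.
// (Assumption: default CUTLASS kernel configuration is sufficient.)
using Gemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::ColumnMajor,   // A
    float, cutlass::layout::ColumnMajor,   // B
    float, cutlass::layout::ColumnMajor>;  // C

cudaError_t run_gemm(int M, int N, int K,
                     float alpha, const float* A, int lda,
                     const float* B, int ldb,
                     float beta, float* C, int ldc) {
  Gemm gemm_op;
  // C is passed twice: once as the source accumulator, once as the output.
  Gemm::Arguments args({M, N, K},
                       {A, lda}, {B, ldb},
                       {C, ldc}, {C, ldc},
                       {alpha, beta});
  cutlass::Status status = gemm_op(args);
  return status == cutlass::Status::kSuccess ? cudaSuccess
                                             : cudaErrorUnknown;
}
```

Since CUTLASS is a header-only template library, only the kernel instantiations you actually use are compiled into your binary, which is why the size increase was negligible compared to linking a prebuilt 215 MB shared library.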