I am new to CUDA programming. I am writing a program that uses the dense getrf and getrs routines from the cuSOLVER library to solve a system of linear equations of the form Ax=B.
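For reference, getrf factors A in place as P*A = L*U (LU with partial pivoting) and returns the pivot indices. getrs then reuses those factors to solve for B. Below is a minimal CPU-only C++ sketch of that same two-step split. It is not the cuSOLVER calls themselves. The names lu_factor and lu_solve are hypothetical, and the sketch assumes column-major storage with 1-based pivots, which is the LAPACK convention that cusolverDnDgetrf/cusolverDnDgetrs also follow.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for the getrf step: in-place LU factorization
// with partial pivoting of an n x n column-major matrix A (leading dim n).
// On return A holds the unit-lower L below the diagonal and U on/above it,
// so that P*A = L*U. ipiv[k] is the 1-based row swapped with row k+1.
// Returns 0 on success, or k+1 if U(k,k) is exactly zero (A is singular).
int lu_factor(int n, std::vector<double>& A, std::vector<int>& ipiv) {
    assert(A.size() == static_cast<std::size_t>(n) * n);
    auto at = [&](int r, int c) -> double& {
        return A[static_cast<std::size_t>(c) * n + r];
    };
    ipiv.assign(n, 0);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        int p = k;  // row with the largest |A(r,k)| at or below the diagonal
        for (int r = k + 1; r < n; ++r)
            if (std::fabs(at(r, k)) > std::fabs(at(p, k))) p = r;
        ipiv[k] = p + 1;
        if (p != k)  // swap whole rows, as LAPACK does
            for (int c = 0; c < n; ++c) std::swap(at(k, c), at(p, c));
        if (at(k, k) == 0.0) {  // record the first zero pivot and keep going
            if (info == 0) info = k + 1;
            continue;
        }
        for (int r = k + 1; r < n; ++r) {
            at(r, k) /= at(k, k);  // multiplier, stored in L
            for (int c = k + 1; c < n; ++c) at(r, c) -= at(r, k) * at(k, c);
        }
    }
    return info;
}

// Hypothetical CPU stand-in for the getrs step (no transpose): solves
// A*X = B using the factors and pivots from lu_factor. B is n x nrhs,
// column-major, and is overwritten with X.
void lu_solve(int n, int nrhs, const std::vector<double>& A,
              const std::vector<int>& ipiv, std::vector<double>& B) {
    assert(ipiv.size() == static_cast<std::size_t>(n));
    assert(B.size() == static_cast<std::size_t>(n) * nrhs);
    auto a = [&](int r, int c) { return A[static_cast<std::size_t>(c) * n + r]; };
    for (int j = 0; j < nrhs; ++j) {
        double* b = B.data() + static_cast<std::size_t>(j) * n;
        for (int k = 0; k < n; ++k) std::swap(b[k], b[ipiv[k] - 1]);  // b := P*b
        for (int r = 1; r < n; ++r)                                    // L*y = P*b
            for (int c = 0; c < r; ++c) b[r] -= a(r, c) * b[c];
        for (int r = n - 1; r >= 0; --r) {                             // U*x = y
            for (int c = r + 1; c < n; ++c) b[r] -= a(r, c) * b[c];
            b[r] /= a(r, r);
        }
    }
}
```

Checking the return value of lu_factor before calling lu_solve plays the same role as checking devInfo after cusolverDnDgetrf. Keeping the factorization separate from the solve is what lets one factorization serve several right-hand sides.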
I am linking the required CUDA math libs statically, because the users of the final solution may not have these installed.
The resulting binary is ~150MB in size. I am trying to find a way to reduce this as much as possible.
I initially thought that the code coming from the static libs must be contributing to this size, so I checked the section sizes with "objdump -h". The .text section was only ~10MB, but the .nv_fatbin section was ~117MB.
I was already using -arch=sm_86 without the -code option, which according to the nvcc guide is equivalent to "-arch=compute_86 -code=compute_86,sm_86". I also tried -O5 and -lto, but neither reduced the size.
I used the -keep and -keep-dir options to check the sizes of the nvcc temporary files. The fatbins generated for my own code were small (1032 bytes for lineqcus2_dp_dlink.fatbin and 31064 bytes for lin_eq_cus2.fatbin). The sizes of all the temporary files were as follows (without -O5 and -lto):
$ ls -l | awk '{print $5 "\t" $9}'
1625492 lin_eq_cus2.cpp1.ii
1469701 lin_eq_cus2.cpp4.ii
21 lin_eq_cus2.cudafe1.c
1376525 lin_eq_cus2.cudafe1.cpp
45233 lin_eq_cus2.cudafe1.gpu
3109 lin_eq_cus2.cudafe1.stub.c
1032 lineqcus2_dp_dlink.fatbin
3408 lineqcus2_dp_dlink.fatbin.c
2992 lineqcus2_dp_dlink.o
32 lineqcus2_dp_dlink.reg.c
952 lineqcus2_dp_dlink.sm_86.cubin
31064 lin_eq_cus2.fatbin
84088 lin_eq_cus2.fatbin.c
17440 lin_eq_cus2.ltoir
28 lin_eq_cus2.module_id
49184 lin_eq_cus2.o
28779 lin_eq_cus2.ptx
22224 lin_eq_cus2.sm_86.cubin
$
I used the -Xcompiler -save-temps=cwd option to check the temporary files of gcc and g++. There were two files that were ~2MB in size but the rest were smaller.
I compiled with the -c option and the resulting .o file was only ~27KB, which means the extra size comes from the libraries linked in. I checked the libcusolver, libcublas and libcublasLt static libs with objdump and found that they all contain .nv_fatbin sections.
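To put a number on how much device code each library carries, the .nv_fatbin section sizes can be summed over every member of a static archive. objdump -h lists these member by member. The sketch below is a minimal C++ version of that tally. It assumes 64-bit ELF objects in the host's byte order on Linux (read through glibc's <elf.h>) and classic "!<arch>" archives. Thin archives and extended ELF section numbering are not handled, only sections named exactly ".nv_fatbin" are counted, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Sum of the sizes of all ".nv_fatbin" sections in one ELF64 image held in
// memory. Anything that is not a well-formed ELF64 image contributes 0.
// Assumes the image has the host's byte order (e.g. x86-64 Linux).
std::uint64_t fatbin_bytes_in_elf(const char* p, std::size_t len) {
    Elf64_Ehdr eh;
    if (len < sizeof eh) return 0;
    std::memcpy(&eh, p, sizeof eh);
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;
    // Extended section numbering (e_shnum == 0, SHN_XINDEX) is not handled.
    if (eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum || eh.e_shoff > len ||
        (len - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return 0;
    auto shdr = [&](std::size_t i) {
        Elf64_Shdr s;
        assert(eh.e_shoff + (i + 1) * sizeof s <= len);  // ensured by the check above
        std::memcpy(&s, p + eh.e_shoff + i * sizeof s, sizeof s);
        return s;
    };
    const Elf64_Shdr names = shdr(eh.e_shstrndx);  // section-name string table
    std::uint64_t total = 0;
    for (std::size_t i = 1; i < eh.e_shnum; ++i) {  // section 0 is the null entry
        const Elf64_Shdr s = shdr(i);
        const std::uint64_t off = names.sh_offset + s.sh_name;
        if (off >= len) continue;
        const char* name = p + off;
        if (std::string(name, std::find(name, p + len, '\0')) == ".nv_fatbin")
            total += s.sh_size;
    }
    return total;
}

// Total ".nv_fatbin" bytes in a file that is either one ELF64 object or
// executable, or a classic "!<arch>" static library such as
// libcublas_static.a. Thin archives are not supported.
std::uint64_t fatbin_bytes_in_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
    if (buf.size() < 8 || std::memcmp(buf.data(), "!<arch>\n", 8) != 0)
        return fatbin_bytes_in_elf(buf.data(), buf.size());

    std::uint64_t total = 0;
    std::size_t pos = 8;
    while (pos + 60 <= buf.size()) {  // every member starts with a 60-byte header
        // Header bytes 48..57 hold the member size as space-padded decimal.
        const std::size_t size = std::stoull(std::string(buf.data() + pos + 48, 10));
        pos += 60;
        if (size > buf.size() - pos) break;  // truncated archive
        // Symbol and long-name tables are not ELF, so they add 0 here.
        total += fatbin_bytes_in_elf(buf.data() + pos, size);
        pos += size + (size & 1);  // member data is padded to an even offset
    }
    return total;
}
```

For example, calling fatbin_bytes_in_file on libcusolver_static.a, libcublas_static.a and libcublasLt_static.a, and on lineqcus2_dp itself, shows how much each library could contribute. The linker only pulls in archive members that resolve undefined symbols. A library's total is therefore an upper bound on what it adds to the binary, not an exact attribution.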
Is the .nv_fatbin section in my binary so large because it contains some of the .nv_fatbin sections from the math libs? Is there a way to reduce the size of my binary?
My compilation command line looks something like this:
nvcc -m64 -arch sm_86 -ccbin <gnu_path> -DDP -I <cuda_includes> -I <math_libs_include> lin_eq_cus2.cu -o lineqcus2_dp -L <cuda_libs_path> -L <math_libs_path> -Xlinker -Bstatic -lcusolver_static -lcublas_static -lculibos -lcudart_static -lcublasLt_static -Xlinker -Bdynamic -ldl -lpthread -lrt
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0
$
The version of gcc and g++ is 9.3.1-2.
Thanks
Karthik