Let’s say I compile and link an application with device link-time optimization (available since CUDA 11.2) using the following options:
As expected, this will create a fatbinary containing PTX for sm_52, LTO intermediaries for sm_61, and link-time optimized SASS for
However, according to cuobjdump -all (output: cuobjdump.txt (4.7 KB)), the fatbinary also contains ELF code (SASS? LTO SASS?) for all GPU architectures supported by the current CUDA toolkit (e.g. sm_86 in the case of CUDA 11), as well as PTX for sm_86. This obviously increases the size of the resulting fatbinary considerably.
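For anyone who wants to check their own binary, here is a rough sketch of how the entries could be tallied per architecture. It assumes the listing was produced with something like `cuobjdump -all -lelf -lptx ./app > listing.txt` (where `./app` is a placeholder name) and that each entry is printed as `ELF file    N: name.sm_XX.cubin` or `PTX file    N: name.sm_XX.ptx`. The line layout, the helper name `summarize_listing`, and the sample text are assumptions for illustration, not taken from the attached output.

```python
import re
from collections import defaultdict

# Matches listing lines such as "ELF file    1: app.1.sm_61.cubin".
# ASSUMPTION: this layout mirrors typical `cuobjdump -lelf -lptx` output.
ENTRY_RE = re.compile(r"^(ELF|PTX) file\s+\d+:\s+\S*?sm_(\d+)\S*$")


def summarize_listing(text):
    """Return {kind: sorted list of sm numbers} found in a cuobjdump listing."""
    found = defaultdict(set)
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            kind, arch = match.groups()
            found[kind].add(int(arch))
    return {kind: sorted(archs) for kind, archs in found.items()}


# Hypothetical listing text, for illustration only (not the attached output).
sample = """\
ELF file    1: app.1.sm_61.cubin
ELF file    2: app.2.sm_86.cubin
PTX file    1: app.1.sm_52.ptx
"""

print(summarize_listing(sample))
```

Comparing the resulting architecture lists against the ones requested on the command line should make it easy to see which ELF and PTX entries are unexpected.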
It is unclear to me how and why these extra fatbin sections are generated. What purpose do they serve? Is there a compiler/linker flag to disable their generation?