Good question. We are working on JIT LTO support, but it is not supported in CUDA 11.2. So in the example you give, at JIT time each individual PTX is JIT-compiled to a cubin, and the resulting cubins are then linked. This is the same as we have always done for JIT linking, which means no link-time optimization happens across the PTX inputs. We should have more support for JIT LTO in future releases.