CUDA 12.0 Compiler Support for Runtime LTO Using nvJitLink Library

jwitsoe · January 17, 2023, 10:40pm

Originally published at: https://developer.nvidia.com/blog/cuda-12-0-compiler-support-for-runtime-lto-using-nvjitlink-library/

CUDA Toolkit 12.0 introduces a new nvJitLink library for Just-in-Time Link Time Optimization (JIT LTO) support.

920416151 · March 26, 2023, 12:53pm

Hi,
I am having trouble understanding the following statement:
"JIT LTO minimizes the impact on binary size by enabling the cuFFT library to build LTO optimized speed-of-light (SOL) kernels for any parameter combination, at runtime. This is achieved by shipping the building blocks of FFT kernels instead of specialized FFT kernels."
Can you explain what "the building blocks of FFT kernels" means?

Thanks

mmurphy1 · April 17, 2023, 6:22pm

Thanks for the question. I am not the FFT developer, but in general what they have done is decompose their algorithm into individual pieces. Previously the library was very large because they provided all permutations of an algorithm. Now they just have a handful of building blocks which they can combine into a specific permutation at runtime. They gave a GTC talk about the work they have done which has some more details.
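To see why this shrinks the library, here is a rough counting sketch. It assumes a simplified model in which each parameter axis contributes one building block per choice, and the axis sizes in the usage note are invented for illustration; none of this reflects how cuFFT is actually decomposed. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Kernels to ship if every combination of choices is precompiled:
// the product of the number of choices on each axis.
inline std::size_t specialized_kernel_count(const std::vector<std::size_t>& choices_per_axis) {
    return std::accumulate(choices_per_axis.begin(), choices_per_axis.end(),
                           std::size_t{1}, std::multiplies<std::size_t>());
}

// Pieces to ship under the simplified model (ASSUMPTION: one building
// block per choice on each axis, combined at runtime): the sum instead.
inline std::size_t building_block_count(const std::vector<std::size_t>& choices_per_axis) {
    return std::accumulate(choices_per_axis.begin(), choices_per_axis.end(),
                           std::size_t{0});
}
```

With four made-up axes of 8, 3, 2 and 4 choices, `specialized_kernel_count({8, 3, 2, 4})` gives 192 precompiled kernels, while `building_block_count({8, 3, 2, 4})` gives 17 pieces that can be linked into any of those 192 combinations on demand.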

lligowski · April 17, 2023, 7:40pm

Looking at the cuFFTDx library (C++, header-only) can give good insight into what can be considered FFT building blocks. For a more summarized view from another angle, see the SIAM PP22 presentation (slide 10).

qiji · August 22, 2024, 2:31am

Hi,

I am calling nvJitLinkAddData like this:

    nvJitLinkAddData(handle, NVJITLINK_INPUT_CUBIN, (void*)cubin_s.c_str(),
                     cubin_s.size(), "program_rtc.cubin");

but I get an error: ERROR 1: bad input: does not match type NVJITLINK_INPUT_CUBINprogram_rtc.cubin
Please help me, @lligowski.

mferreravila · August 22, 2024, 3:53pm

Hi qiji,

This error indicates that the type of input passed to nvJitLinkAddData (NVJITLINK_INPUT_*) does not match the format found in the data array ((void*)cubin_s.c_str() in your case).

If you compiled the device code with NVRTC, it’s likely you need to use the NVJITLINK_INPUT_LTOIR input type. There is also NVJITLINK_INPUT_ANY, which lets nvJitLink detect the type automatically. Please refer to the nvJitLink documentation for the full list of input types.
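One quick way to check for this kind of mismatch before calling nvJitLinkAddData is to look at the first bytes of the buffer. The sketch below relies only on the fact that a cubin is an ELF image and therefore starts with the ELF magic bytes. Both helpers are hypothetical and not part of nvJitLink, and the check is not how nvJitLink itself validates input; it only tells you whether NVJITLINK_INPUT_CUBIN is plausible for the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not an nvJitLink API): a cubin is an ELF image,
// so a plausible one begins with the 4-byte magic 0x7F 'E' 'L' 'F'.
inline bool looks_like_cubin(const void* data, std::size_t size) {
    static const unsigned char kElfMagic[4] = {0x7F, 'E', 'L', 'F'};
    return data != nullptr && size >= sizeof(kElfMagic) &&
           std::memcmp(data, kElfMagic, sizeof(kElfMagic)) == 0;
}

// Hypothetical helper: names the input type worth trying. If the buffer
// is not ELF, it cannot be a cubin, so fall back to automatic detection.
inline const char* suggested_input_type(const std::string& blob) {
    return looks_like_cubin(blob.data(), blob.size())
               ? "NVJITLINK_INPUT_CUBIN"
               : "NVJITLINK_INPUT_ANY";
}
```

For the case above, `suggested_input_type(cubin_s)` would return "NVJITLINK_INPUT_ANY" if `cubin_s` actually holds LTO-IR or PTX produced by NVRTC, which is a hint that the data was never a cubin in the first place.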

mferreravila · August 22, 2024, 6:58pm

As a follow-up note, we realized that the description of each input type is missing from the documentation; we will improve it to make clear when to use each one.

Thank you for bringing this to our attention.
