Originally published at: https://developer.nvidia.com/blog/cuda-12-0-compiler-support-for-runtime-lto-using-nvjitlink-library/
CUDA Toolkit 12.0 introduces a new nvJitLink library for Just-in-Time Link Time Optimization (JIT LTO) support.
Hi,
I'm having trouble understanding some parts of the post, such as the following statement:
"JIT LTO minimizes the impact on binary size by enabling the cuFFT library to build LTO optimized speed-of-light (SOL) kernels for any parameter combination, at runtime. This is achieved by shipping the building blocks of FFT kernels instead of specialized FFT kernels. "
Can you explain what "the building blocks of FFT kernels" means?
Thanks
Thanks for the question. I am not the FFT developer, but in general what they have done is decompose their algorithm into individual pieces. Previously the library was very large because they provided all permutations of an algorithm. Now they just have a handful of building blocks which they can combine into a specific permutation at runtime. They gave a GTC talk about the work they have done which has some more details.
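To make the "handful of building blocks combined at runtime" idea concrete, here is a toy C++ sketch. It is not cuFFT's implementation; the names `available_radices` and `plan_stages` are hypothetical, and the "blocks" are just radix numbers standing in for real butterfly kernels. It only illustrates how three shipped pieces can serve many transform sizes without shipping one specialized kernel per size.

```cpp
#include <optional>
#include <vector>

// Hypothetical building blocks: one butterfly routine per supported radix.
// A real library would hold device code here; this toy keeps only the radix.
inline const std::vector<int>& available_radices() {
    static const std::vector<int> radices = {2, 3, 5};
    return radices;
}

// Pick, at runtime, the sequence of blocks that together handle size n.
// Returns std::nullopt when n cannot be built from the available blocks.
inline std::optional<std::vector<int>> plan_stages(int n) {
    if (n < 1) return std::nullopt;
    std::vector<int> stages;
    for (int r : available_radices()) {
        while (n % r == 0) {  // peel off one stage of radix r
            stages.push_back(r);
            n /= r;
        }
    }
    if (n != 1) return std::nullopt;  // leftover factor has no matching block
    return stages;
}
```

With this sketch, `plan_stages(360)` yields the stages 2, 2, 2, 3, 3, 5, while `plan_stages(7)` yields no plan because no radix-7 block was shipped.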
Looking at the cuFFTDx library (C++ header only) can give good insight into what can be considered FFT building blocks. A more summarized view, from another angle, is the SIAM PP22 presentation (slide 10).
Links:
Hi,
nvJitLinkAddData(handle, NVJITLINK_INPUT_CUBIN, (void*)cubin_s.c_str(),
cubin_s.size(), "program_rtc.cubin"));
but I get an error: ERROR 1: bad input: does not match type NVJITLINK_INPUT_CUBINprogram_rtc.cubin
Please help me @lligowski
Hi qiji,
This error indicates that the type of input passed to nvJitLinkAddData (NVJITLINK_INPUT_*) does not match the format found in the data array ((void*)cubin_s.c_str() in your case).
If you compiled the device code with NVRTC, it's likely you need to use the NVJITLINK_INPUT_LTOIR input type. There is also NVJITLINK_INPUT_ANY, which lets nvJitLink detect the type automatically. Please refer here for the full list of input types.
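One quick way to check for this kind of mismatch before calling nvJitLinkAddData is to look at the first bytes of the buffer. The sketch below relies only on the fact that a cubin is an ELF image and therefore starts with the bytes 0x7F 'E' 'L' 'F'. It is not how nvJitLink validates input, and the helper names `looks_like_elf` and `describe_input` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// A cubin is an ELF image, so it begins with the bytes 0x7F 'E' 'L' 'F'.
// Passing this check does not prove the buffer is a cubin, only that it is ELF.
inline bool looks_like_elf(const void* data, std::size_t size) {
    static const unsigned char magic[4] = {0x7F, 'E', 'L', 'F'};
    return data != nullptr && size >= sizeof(magic) &&
           std::memcmp(data, magic, sizeof(magic)) == 0;
}

// Hypothetical helper: describe a buffer before handing it to nvJitLinkAddData.
inline std::string describe_input(const std::string& buf) {
    return looks_like_elf(buf.data(), buf.size())
               ? "ELF image (plausibly a cubin)"
               : "not ELF (PTX text, LTO IR, or something else)";
}
```

If `describe_input(cubin_s)` reports that the buffer is not ELF, then NVJITLINK_INPUT_CUBIN is the wrong type for it, and NVJITLINK_INPUT_LTOIR or NVJITLINK_INPUT_ANY is worth trying instead.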
As a follow-up note, we realize that the description of each input type is missing in the documentation; we will improve the description to make it clear when to use each one.
Thank you for bringing this to our attention.