Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization

Originally published at: Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization | NVIDIA Developer Blog

CUDA 11.2 features powerful link time optimization (LTO) for device code in GPU-accelerated applications. Device LTO brings the performance advantages of device code optimization that were previously only possible in the nvcc whole program compilation mode to the nvcc separate compilation mode, which was introduced in CUDA 5.0. Separate compilation mode allows CUDA device kernel…

Figure 2 seems to be wrong; it’s the same as Figure 1. Also, it would be nice to have the figures in a higher resolution.

@rkobus – Sorry about that! It’s fixed now. Hope the larger size helps as well. Thanks for the feedback!