Improving GPU Application Performance with NVIDIA CUDA 11.2 Device Link Time Optimization