Does the JIT compiler perform device link-time optimization?

I have now heard back from a member of the NVIDIA driver team:

Prior to the driver version released with CUDA Toolkit 12.0, the driver would JIT-compile the code for the highest architecture available in the binary, regardless of whether that code was PTX or LTO NVVM-IR. However, JIT compilation of NVVM-IR was not guaranteed to be forward compatible with later architectures, so applications could fail with a “device kernel image is invalid” CUDA error.

Therefore, starting with the CUDA 12.0 driver, the driver will only JIT the highest PTX available, i.e. it will not JIT NVVM code.
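If an application needs to know at run time which of the two behaviours applies, one option is to look at the installed driver version. The following is only a minimal sketch: it assumes the version integer uses the usual CUDA encoding (1000 * major + 10 * minor, as reported by `cudaDriverGetVersion`), and the helper names `decode_driver_version` and `driver_may_jit_lto_ir` are made up for illustration. It is plain host C++ so it compiles without the CUDA toolkit; in a real program the integer would come from `cudaDriverGetVersion(&v)`.

```cpp
// Sketch only: classifies a CUDA driver version according to the
// behaviour described in the answer above. Helper names are hypothetical.

struct DriverVersion {
    int major_version;
    int minor_version;
};

// CUDA encodes versions as 1000 * major + 10 * minor (12.0 -> 12000).
constexpr DriverVersion decode_driver_version(int encoded) {
    return DriverVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Per the answer: drivers older than the CUDA 12.0 driver could pick
// LTO NVVM-IR for JIT; from 12.0 onward only PTX is JIT-compiled.
constexpr bool driver_may_jit_lto_ir(int encoded) {
    return decode_driver_version(encoded).major_version < 12;
}

// Compile-time sanity checks of the encoding assumption.
static_assert(decode_driver_version(11080).major_version == 11, "11.8 major");
static_assert(decode_driver_version(11080).minor_version == 8, "11.8 minor");
static_assert(driver_may_jit_lto_ir(11080), "11.8 driver: LTO-IR may be JIT-ed");
static_assert(!driver_may_jit_lto_ir(12000), "12.0 driver: PTX only");
```

In practice this means that a binary meant to run on newer GPUs through JIT should carry PTX, since with a 12.0 or later driver the LTO intermediate code alone is not used for that.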