Reducing Application Build Times Using CUDA C++ Compilation Aids

Originally published at: https://developer.nvidia.com/blog/reducing-application-build-times-using-cuda-c-compilation-aids/

This technical walkthrough on the CUDA C++ compiler toolchain complements the programming guide and provides a broad overview of new features being introduced in the CUDA 11.5 toolkit release.

Hello authors,

First, thank you for recognizing the importance of compilation times, particularly dynamic compilation times. Dynamic compilation time can be quite critical in systems whose computational workloads change depending on user input.

Second, some general feedback:

  1. When you compile anything with NVRTC, you guys are pulling in this huge header, __nv_nvrtc_builtin_header.h. It has more than 140,000 lines! It must have an atrocious effect on compilation time. You really must make it optional, and probably break it up into multiple include files. Let us decide whether we want all of it, just parts of it, or none of it.

  2. I hope you’re working on this issue for OpenCL compilation as well as for CUDA.

  3. You guys should open-source libnvrtc, and probably the CUDA driver as well. The best way to allow customization and maximize performance is to make that code transparent and modifiable by us. NVIDIA is a hardware company; don’t distribute so much closed-source software.

Now for some section-specific feedback:

NVRTC concurrent compilation

You write:

"Some of these stages are not thread-safe"

Why? There’s no reason they shouldn’t be. Sure, it’s much better that you’ve broken the faux critical section into three critical sections, but why not just make those stages thread-safe?
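To be concrete about what I understand the change to be, here is a toy sketch in plain C++. It is not NVRTC code: StagedCompiler, compile_one and run_concurrent are hypothetical names, and the "stages" are just counters standing in for whatever shared state the real stages touch. Each stage has its own lock, so two threads can be in different stages at the same time, but any one stage is still serialized.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a compiler with three non-thread-safe stages.
struct StagedCompiler {
    std::array<std::mutex, 3> stage_locks;  // one lock per stage, not one global lock
    std::array<int, 3> stage_runs{};        // shared state that each stage mutates

    // One "compilation": pass through the three stages in order.
    void compile_one() {
        for (std::size_t s = 0; s < stage_locks.size(); ++s) {
            std::lock_guard<std::mutex> guard(stage_locks[s]);
            ++stage_runs[s];  // would be a data race without the stage lock
        }
    }
};

// Runs n_threads concurrent "compilations" and returns the total number
// of stage executions across all of them.
int run_concurrent(int n_threads) {
    StagedCompiler compiler;
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([&compiler] { compiler.compile_one(); });
    }
    for (auto& w : workers) {
        w.join();
    }
    return compiler.stage_runs[0] + compiler.stage_runs[1] + compiler.stage_runs[2];
}
```

With this layout, threads still queue up whenever they reach the same stage, which is why I am asking for the stages themselves to be made thread-safe rather than guarded.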

PTX concurrent compilation

You write:

"PTX compilation … proceeds through multiple internal phases. The previous implementation … the PTX compiler used a global lock to serialize concurrent compilations … In CUDA 11.5 and the R495 driver, the PTX compiler implementation now uses finer-grained local locks"

Again, why should it lock anything?

Eliminating unused kernels

Neat :-)

Pragma diagnostic control

Can you please put up, and maintain, a complete list of all of the warnings/errors supported by NVCC and by NVRTC, and their numbers?
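To show why the numbers matter, here is a minimal sketch of how I understand the new pragmas are meant to be used. The number 177 is my assumption for the "variable was declared but never referenced" diagnostic, which is exactly the kind of thing a published list would let us confirm. The function answer() is just a hypothetical example.

```cpp
// Save the current diagnostic state, then suppress one diagnostic by number.
// Assumption: 177 is "variable was declared but never referenced".
#pragma nv_diagnostic push
#pragma nv_diag_suppress 177

int answer() {
    int unused = 0;  // would normally be reported as unreferenced
    return 42;
}

// Restore the diagnostic state that was saved by the push above.
#pragma nv_diagnostic pop
```

A host compiler that does not recognize the nv_ pragmas would typically just ignore them, possibly with an unknown-pragma warning of its own. Without a list of numbers, the only way to find the right one is to trigger the warning first and read it off the compiler output.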