Development using only runtime CUDA compilation (nvrtc) vs. compile-time CUDA compilation (nvcc)

My organization produces engineering simulation software with GPU acceleration. We have recently decided to start using CUDA and are investigating the best way to integrate it into our build system. As I see it, we have two options:

  1. Compile all of our device code at runtime using the nvrtc library. I think this would involve very few modifications to our build and distribution systems other than adding the necessary new libraries. It would also allow optimisations involving dynamic code generation.
  2. Introduce nvcc into our build system so that CUDA code is compiled at build time. This would involve larger changes to our build system, but it might be worth it if it brings advantages for development.

My question is: what are the benefits and disadvantages of adding compile-time CUDA compilation (possibly as part of a mixed approach that still uses nvrtc for some things)? My preliminary thoughts, which might be misguided, are the following:

  1. Compile-time CUDA compilation needs less boilerplate code for setting up and running GPU kernels, and allows 'nice' features like mixing host and device code in the same file.
  2. Some template libraries like Thrust seem to be designed to work only with compile-time CUDA compilation.
  3. The nvrtc library was only introduced with CUDA Toolkit 7.0 in 2015, whereas CUDA itself has existed since 2007. Most of the examples in the CUDA samples do not use nvrtc. Does this suggest that the compile-time method is the 'classic' way to use CUDA and that nvrtc is only meant for add-on situations like dynamic code generation?
  4. As far as I can tell, all of the debugging/profiling tools that work with compile-time CUDA compilation should also work when using nvrtc, but I just want to be sure of this.

Any thoughts on these issues or other benefits/disadvantages of compile time CUDA compilation would be much appreciated.

You should probably try out both, and learn to use both, before making any decisions about development paths. I would say that nvrtc is noticeably harder to use for many practical examples.

This additional difficulty led to the creation of support systems like jitify:

To wit:

"Integrating NVRTC into existing and/or templated CUDA code can be tricky."

Of course, if you need to do runtime compilation, then nvrtc is a sensible choice.

If it were me, I wouldn’t use nvrtc unless I needed to.

Thanks for your reply. I should have said that we will almost certainly need to use nvrtc, because we want to support compilation of user-defined plugins. The question is more whether to also add compile-time CUDA compilation, and what benefits that would give us.