My organization produces engineering simulation software with GPU acceleration. We have recently decided to start using CUDA and are investigating the best way to integrate it into our build system. As I see it, we have two options:
- Compile all of our device code at runtime using the NVRTC library (a rough sketch of this workflow is shown after this list). I think this would involve very few modifications to our build and distribution systems beyond adding the necessary new libraries. It would also allow optimisations involving dynamic code generation.
- Introduce nvcc into our build system to allow compile-time compilation of CUDA code. This would involve larger changes to our build system, but it might be worth it if it brings advantages for development.
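To make the comparison concrete, here is a minimal sketch of the runtime path (option 1), loosely modelled on the NVRTC saxpy example from the CUDA samples. The kernel string, architecture flag, and launch parameters are purely illustrative, and error checking is omitted for brevity; it links against `-lnvrtc -lcuda`.

```cpp
// Option 1, roughly: compile a kernel string with NVRTC at runtime,
// then load and launch it through the CUDA driver API.
#include <nvrtc.h>
#include <cuda.h>
#include <cstdio>
#include <vector>

const char* kSource = R"(
extern "C" __global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
)";

int main() {
    // Compile the source string to PTX.
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, kSource, "saxpy.cu", 0, nullptr, nullptr);
    const char* opts[] = {"--gpu-architecture=compute_70"};  // example arch
    nvrtcCompileProgram(prog, 1, opts);
    size_t ptxSize;
    nvrtcGetPTXSize(prog, &ptxSize);
    std::vector<char> ptx(ptxSize);
    nvrtcGetPTX(prog, ptx.data());
    nvrtcDestroyProgram(&prog);

    // Load the PTX and look up the kernel via the driver API.
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);
    CUmodule mod;   cuModuleLoadData(&mod, ptx.data());
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "saxpy");

    // Allocate and fill device buffers.
    int n = 1 << 20;
    float a = 2.0f;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    CUdeviceptr dx, dy;
    cuMemAlloc(&dx, n * sizeof(float));
    cuMemAlloc(&dy, n * sizeof(float));
    cuMemcpyHtoD(dx, hx.data(), n * sizeof(float));
    cuMemcpyHtoD(dy, hy.data(), n * sizeof(float));

    // Launch: kernel arguments are passed as an array of pointers.
    void* args[] = {&a, &dx, &dy, &n};
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr);
    cuCtxSynchronize();

    cuMemcpyDtoH(hy.data(), dy, n * sizeof(float));
    std::printf("y[0] = %f (expected 4.0)\n", hy[0]);

    cuMemFree(dx); cuMemFree(dy);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```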
My question is: what benefits and disadvantages would there be to adding compile-time CUDA compilation (possibly in a mixed approach, keeping NVRTC for some things)? My preliminary thoughts are the following, which might be misguided:
- Compile-time CUDA compilation requires less boilerplate code for setting up and launching GPU kernels, and enables 'nice' features like mixing host and device code in the same file (see the compile-time sketch after this list).
- Some template libraries, like Thrust, appear to be designed to work only with compile-time CUDA compilation.
- NVRTC was only introduced with CUDA Toolkit 7.0 in 2015, whereas CUDA itself has existed since 2007, and most of the CUDA samples do not use it. Does this suggest that compile-time compilation is the 'classic' way to use CUDA, and that NVRTC is intended only for add-on situations like dynamic code generation?
- As far as I can tell, all of the debugging/profiling tools that work with compile-time CUDA compilation should also work when using NVRTC, but I just want to be sure of this.
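For comparison, here is roughly what the same saxpy kernel looks like as a single `.cu` file compiled offline with nvcc (option 2), with a Thrust call included to illustrate the template-library point above. The architecture flag in the comment is again just an example.

```cpp
// Option 2, roughly: host and device code in one translation unit,
// compiled offline, e.g. "nvcc -arch=sm_70 saxpy.cu".
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

// Device code sits next to the host code that launches it.
__global__ void saxpy(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    thrust::device_vector<float> x(n, 1.0f), y(n, 2.0f);

    // Triple-chevron launch: no PTX generation, module loading,
    // or driver-API function lookup needed.
    saxpy<<<(n + 255) / 256, 256>>>(2.0f,
                                    thrust::raw_pointer_cast(x.data()),
                                    thrust::raw_pointer_cast(y.data()),
                                    n);
    cudaDeviceSynchronize();

    // Thrust's templated algorithms are instantiated by nvcc at compile time.
    float sum = thrust::reduce(y.begin(), y.end());
    std::printf("sum = %f (expected %f)\n", sum, 4.0f * n);
    return 0;
}
```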
Any thoughts on these issues, or on other benefits/disadvantages of compile-time CUDA compilation, would be much appreciated.