Parallel compilation with NVRTC

I doubt you are doing anything wrong. I don’t know of anywhere that NVIDIA claims or documents that NVRTC will run compilations in parallel, and you can find other reports like this one on the forums. Furthermore, the general possibility that CUDA runtime and driver API calls may interlock or serialize is documented.
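
If you want to check the serialization for yourself, one rough approach is to time N back-to-back calls against N concurrent calls and compare. Below is a minimal sketch of such a harness in plain C++. The names `measure_parallel_speedup`, `independent_task` and `internally_locked_task` are hypothetical, made up for illustration, and the two tasks are stand-ins that only sleep. They are not NVRTC. In a real test you would pass a callable that builds its own program and calls `nvrtcCompileProgram`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Runs `task` n times back to back, then n times concurrently (one thread
// per call), and returns serial_time / concurrent_time.
// A ratio near n suggests the calls overlap; a ratio near 1 suggests
// something inside the call is serializing them.
double measure_parallel_speedup(const std::function<void()>& task, int n) {
    assert(n > 0);
    using clock = std::chrono::steady_clock;

    // Serial baseline.
    auto t0 = clock::now();
    for (int i = 0; i < n; ++i) task();
    auto t1 = clock::now();

    // Same number of calls, one per thread.
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (int i = 0; i < n; ++i) threads.emplace_back(task);
    for (auto& t : threads) t.join();
    auto t2 = clock::now();

    std::chrono::duration<double> serial = t1 - t0;
    std::chrono::duration<double> concurrent = t2 - t1;
    return serial.count() / concurrent.count();
}

// Hypothetical stand-in: work that does not share any lock.
void independent_task() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

// Hypothetical stand-in: work that takes a process-wide lock internally,
// which is the kind of behavior that would make threaded calls serialize.
void internally_locked_task() {
    static std::mutex m;
    std::lock_guard<std::mutex> lock(m);
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

Calling `measure_parallel_speedup(independent_task, 4)` should give a ratio well above 1, while `measure_parallel_speedup(internally_locked_task, 4)` should stay close to 1. If your NVRTC callable behaves like the second case, that is consistent with the calls serializing internally, although a timing test like this cannot tell you where the lock is.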

If you desire a particular capability in CUDA, one way to express that is by filing a bug.