Cuda on Linux. I have a Cuda file with a very large function. Compiling that Cuda code dominates my build time when I’m able to run “make -j 8” so make can run 8 g++ compiles at a time for the rest of my program. I set up nvcc with the standard flags
-gencode=arch=compute_20,code=sm_20
-gencode=arch=compute_30,code=sm_30
-gencode=arch=compute_35,code=sm_35
-gencode=arch=compute_35,code=compute_35
to support multiple GPU architectures. But this results in one nvcc compile, which then generates the code for these four architectures one at a time. Is there any way to tell nvcc to run these four code generations in parallel, given I have enough CPU cores available?