Can nvcc generate code for multiple architectures in parallel

REPoore · April 7, 2014, 10:32pm

CUDA on Linux. I have a CUDA file with a very large function. Compiling that CUDA code dominates my build time when I run “make -j 8”, since make can then run 8 g++ compiles at a time for the rest of my program. I set up nvcc with the standard flags

-gencode=arch=compute_20,code=sm_20
-gencode=arch=compute_30,code=sm_30
-gencode=arch=compute_35,code=sm_35
-gencode=arch=compute_35,code=compute_35

to support multiple GPU architectures. But this results in one nvcc compile, which then generates the code for these four architectures one at a time. Is there any way to tell nvcc to run these four code generations in parallel, given I have enough CPU cores available?

njuffa · April 8, 2014, 12:04am

As far as I know, there is no way to parallelize the building of fat binaries within a single nvcc invocation, i.e. to have nvcc assign the build for each target architecture to a different thread. It strikes me as an excellent suggestion.

I would suggest filing an enhancement request via the bug reporting form linked from the registered developer website. Please prefix the synopsis with “RFE:” so it is readily recognizable as a request for enhancement rather than a true bug. Thanks!

tera · April 19, 2014, 8:47pm

In the meantime you should be able to manually parallelize the build by looking at the output of nvcc --verbose or nvcc --dryrun and invoking those commands directly instead of nvcc.
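
A rough sketch of the first step of that approach, assuming a hypothetical source file kernel.cu: capture the commands nvcc would run without executing them. As far as I know, nvcc prints the dry-run listing on stderr with each line prefixed by "#$ ". The helper name strip_dryrun_prefix and the output file nvcc_steps.sh are made up for this example. Working out which of the listed commands are independent of each other (most likely the per-architecture steps) and can therefore run concurrently still has to be done by hand.

```shell
#!/bin/sh
# Strip the "#$ " prefix that nvcc puts in front of each dry-run line.
# strip_dryrun_prefix is a hypothetical helper name, not part of nvcc.
strip_dryrun_prefix() {
    sed -n 's/^#\$ //p'
}

# kernel.cu / kernel.o are placeholders; the -gencode flags are the ones
# from the original post. Nothing is compiled here: --dryrun only lists
# the sub-commands nvcc would have executed.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --dryrun \
        -gencode=arch=compute_20,code=sm_20 \
        -gencode=arch=compute_30,code=sm_30 \
        -gencode=arch=compute_35,code=sm_35 \
        -gencode=arch=compute_35,code=compute_35 \
        -c kernel.cu -o kernel.o 2>&1 | strip_dryrun_prefix > nvcc_steps.sh
fi
```

The resulting nvcc_steps.sh is only a starting point for a hand-written parallel build, which is the maintenance cost of this workaround.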

REPoore · April 22, 2014, 8:27pm

Njuffa: Bug 1504822 submitted as an enhancement request.

Tera: Thanks for the idea, but too ugly for my production environment.

AaronMS · February 4, 2025, 11:48pm

Just in case anyone else comes across this: nvcc now has a --threads option, and nvcc --threads 0 uses as many threads as there are available CPUs.
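
A minimal sketch of what that might look like in a build script. The architecture list (70, 80) and the file names kernel.cu and kernel.o are assumptions for illustration, not taken from the thread. The compute_20/sm_20 targets from the original post are not accepted by toolkits recent enough to offer --threads, so substitute whatever your toolkit and GPUs support.

```shell
#!/bin/sh
# Placeholder values: adjust the architecture list and file names for your
# own toolkit and GPUs. kernel.cu and kernel.o are hypothetical names.
ARCHS="70 80"
SRC="kernel.cu"
OBJ="kernel.o"

# Build one -gencode flag per target architecture, as in the original post.
GENCODE=""
for arch in $ARCHS; do
    GENCODE="$GENCODE -gencode=arch=compute_${arch},code=sm_${arch}"
done

# --threads 0 asks nvcc to use as many threads as there are CPUs.
CMD="nvcc --threads 0$GENCODE -c $SRC -o $OBJ"
echo "$CMD"

# Run the command only if nvcc and the source file are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD
fi
```

With a single target architecture there is little for the extra threads to do, so the benefit should show up mainly in multi-architecture builds like the one described at the top of the thread.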
