Nvcc --device-link for multiple architectures

catcondo · October 31, 2022, 6:18pm

I have a question about the device-side linker and how multiple architectures are handled. Currently I am compiling code for three architectures using the following flags:

nvcc -gencode 'arch=compute_75,code="compute_75,sm_75"' -gencode 'arch=compute_70,code="compute_70,sm_70"' -gencode 'arch=compute_60,code="compute_60,sm_60"' . . .

As for linking, I am confused by the following passage, which is bolded in the CUDA documentation (see chapter 6, Using Separate Compilation in CUDA, section 6.4):

Note that all desired target architectures must be passed to the device linker, as that specifies what will be in the final executable (some objects or libraries may contain device code for multiple architectures, and the link step can then choose what to put in the final executable).

When I specify all desired architectures for the device link, several warnings appear stating that only the last specified architecture is used:

nvcc --gpu-architecture=compute_60 --gpu-architecture=compute_70 --gpu-architecture=compute_75 --device-link . . .
nvcc warning : incompatible redefinition for option 'gpu-architecture', the last value of this option was used
nvcc warning : incompatible redefinition for option 'gpu-architecture', the last value of this option was used

Is there a way to do all 3 architectures in a single link, or should I perform 3 separate links, one for each architecture? Thanks.

Robert_Crovella · October 31, 2022, 6:31pm

Why not just use the same arch specification format (-gencode) for both the compile step and the device-link step?

$ cat a.cu
__device__ int f(){ return 1;}
$ cat b.cu
#include <cstdio>

__device__ int f();

__global__ void k(){
  printf("val  = %d\n", f());
}

int main(){

  k<<<1,1>>>();
  cudaDeviceSynchronize();
}
$ nvcc -dc -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_60,code=sm_60 a.cu b.cu
$ nvcc --device-link -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_60,code=sm_60 a.o b.o -o ab.o
$
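A minimal sketch of the same idea, assuming a POSIX shell: build the -gencode list once so the -dc and --device-link steps cannot drift apart. The helper gencode_flags and the ARCH_FLAGS variable are hypothetical names, and the script only echoes the nvcc commands (a dry run) instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: for each compute capability given, emit one PTX
# target (code=compute_XX) and one SASS target (code=sm_XX), in the same
# -gencode form used in the commands above.
gencode_flags() {
  flags=""
  for cc in "$@"; do
    flags="$flags -gencode arch=compute_${cc},code=compute_${cc}"
    flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
  done
  # Drop the leading space left by the first append.
  printf '%s\n' "${flags# }"
}

# Build the list once and reuse it for both steps.
ARCH_FLAGS=$(gencode_flags 75 70 60)

# Dry run: remove "echo" to actually invoke nvcc.
# $ARCH_FLAGS is left unquoted on purpose so it splits into separate arguments.
echo nvcc -dc $ARCH_FLAGS a.cu b.cu
echo nvcc --device-link $ARCH_FLAGS a.o b.o -o ab.o
```

With the echo removed, adding or dropping an architecture becomes a one-line change to the gencode_flags arguments, and both steps pick it up.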

If it were me, I would not bother to generate 3 different versions of PTX, but to each their own.
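One way to act on that suggestion is to emit SASS (code=sm_XX) for every target but PTX only for the lowest one. This is a sketch with a hypothetical helper name. The reasoning assumed here is that the SASS already covers each listed GPU, while PTX for the lowest compute capability (compute_60) can be JIT-compiled by the driver on any newer GPU. The trade-off is that a future GPU would JIT from compute_60 PTX rather than from newer PTX.

```shell
#!/bin/sh
# Hypothetical helper: SASS (code=sm_XX) for every listed compute
# capability, plus PTX (code=compute_XX) only for the lowest one.
gencode_flags_min_ptx() {
  flags=""
  min=""
  for cc in "$@"; do
    flags="$flags -gencode arch=compute_${cc},code=sm_${cc}"
    # Track the numerically smallest compute capability seen so far.
    if [ -z "$min" ] || [ "$cc" -lt "$min" ]; then
      min=$cc
    fi
  done
  if [ -n "$min" ]; then
    flags="$flags -gencode arch=compute_${min},code=compute_${min}"
  fi
  # Drop the leading space left by the first append.
  printf '%s\n' "${flags# }"
}

# Use the same list for both the -dc and the --device-link steps.
ARCH_FLAGS=$(gencode_flags_min_ptx 75 70 60)
printf '%s\n' "$ARCH_FLAGS"
```

The argument order does not matter for picking the PTX target, since the helper compares the values numerically.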

catcondo · October 31, 2022, 7:14pm

Thanks for the fast and helpful response. I tried the above as you suggested, and it worked well. However, it did issue several warnings like the following:

nvlink warning : Stack size for entry function '_Z14EvaluateTokensiiPP5Token' cannot be statically determined (target: sm_75)
nvlink warning : Stack size for entry function '_Z14EvaluateTokensiiPP5Token' cannot be statically determined (target: sm_70)
nvlink warning : Stack size for entry function '_Z14EvaluateTokensiiPP5Token' cannot be statically determined (target: sm_60)

where EvaluateTokens is a __global__ function (a kernel) that is intended to make some recursive calls. I suppressed the warnings by adding this option to the device-link step:

-Xnvlink='-suppress-stack-size-warning'
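Because the warning in this thread is reported by nvlink, which runs at the device-link step, one way to keep the suppression scoped to that step is to hold it in its own variable. This is a dry-run sketch; the variable and function names are hypothetical, and the -Xnvlink option is the one quoted above. Silencing the warning does not size the stack: with recursion, the per-thread stack limit at run time may still need raising (for example with cudaDeviceSetLimit and cudaLimitStackSize).

```shell
#!/bin/sh
# Dry-run sketch: identical architecture flags for both steps, with the
# nvlink pass-through option applied only to the device-link command.
ARCH_FLAGS="-gencode arch=compute_75,code=sm_75 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_60,code=compute_60"
LINK_ONLY_FLAGS="-Xnvlink=-suppress-stack-size-warning"

# Hypothetical wrappers; remove "echo" to actually run nvcc.
# Variables are left unquoted on purpose so they split into separate arguments.
compile_cmd() { echo nvcc -dc $ARCH_FLAGS "$@"; }
link_cmd() { echo nvcc --device-link $ARCH_FLAGS $LINK_ONLY_FLAGS "$@"; }

compile_cmd a.cu b.cu
link_cmd a.o b.o -o ab.o
```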

Robert_Crovella · October 31, 2022, 7:23pm

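The warning doesn’t have anything to do with the compilation/linking method; it appears because the recursive calls prevent the kernel's stack size from being determined statically. This may be of interest.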

catcondo · October 31, 2022, 7:45pm

Understood. I just realized that my compilation step has always used --disable-warnings, so I was oblivious to this warning ever since I first developed that particular bit of code. Thanks again.

system · November 14, 2022, 7:45pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
