nvcc --device-link for multiple architectures

I have a question about the device-side linker and how multiple architectures are handled. Currently I am compiling code for three architectures using the following flags:

nvcc -gencode 'arch=compute_75,code="compute_75,sm_75"' -gencode 'arch=compute_70,code="compute_70,sm_70"' -gencode 'arch=compute_60,code="compute_60,sm_60"' . . .

As for linking, I am confused by the following note in the CUDA documentation (see chapter 6, "Using Separate Compilation in CUDA", section 6.4):

Note that all desired target architectures must be passed to the device linker, as that specifies what will be in the final executable (some objects or libraries may contain device code for multiple architectures, and the link step can then choose what to put in the final executable).

When I specify all desired architectures for the device-link step, several warning messages appear, stating that only the last specified architecture is being used:

nvcc --gpu-architecture=compute_60 --gpu-architecture=compute_70 --gpu-architecture=compute_75 --device-link . . .
nvcc warning : incompatible redefinition for option 'gpu-architecture', the last value of this option was used
nvcc warning : incompatible redefinition for option 'gpu-architecture', the last value of this option was used

Is there a way to do all 3 architectures in a single link, or should I perform 3 separate links, one for each architecture? Thanks.

Why not just use the same -gencode arch specification format for both the compile step and the device-link step?

$ cat a.cu
__device__ int f(){ return 1;}
$ cat b.cu
#include <cstdio>

__device__ int f();

__global__ void k(){
  printf("val  = %d\n", f());
}

int main(){

  k<<<1,1>>>();
  cudaDeviceSynchronize();
}
$ nvcc -dc -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_60,code=sm_60 a.cu b.cu
$ nvcc --device-link -gencode arch=compute_75,code=compute_75 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_60,code=sm_60 a.o b.o -o ab.o
$

If it were me, I would not bother to generate 3 different versions of PTX, but to each their own.
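One way to read that suggestion is to keep one SASS image (sm_XX) per architecture and a single PTX image for the newest one only. The sketch below builds such a flag list once and reuses it for both steps, so the compile and device-link specifications cannot drift apart. The variable names (ARCHS, GENCODE_FLAGS) are made up for illustration, the file names are the ones from the example above, and the commands are only printed so the snippet runs without the toolkit; remove the echo to actually invoke nvcc.

```shell
#!/bin/sh
# Target architectures, lowest to highest (the three from the question).
ARCHS="60 70 75"

GENCODE_FLAGS=""
for arch in $ARCHS; do
  # One SASS image per architecture.
  GENCODE_FLAGS="$GENCODE_FLAGS -gencode arch=compute_${arch},code=sm_${arch}"
  newest=$arch
done

# A single PTX image, for the newest architecture only (an assumption about
# which PTX is worth keeping; adjust to taste).
GENCODE_FLAGS="$GENCODE_FLAGS -gencode arch=compute_${newest},code=compute_${newest}"
GENCODE_FLAGS="${GENCODE_FLAGS# }"   # drop the leading space

# Same flag list for the compile step and the device-link step.
echo nvcc -dc $GENCODE_FLAGS a.cu b.cu
echo nvcc --device-link $GENCODE_FLAGS a.o b.o -o ab.o
```

Keeping the list in one variable is a design choice rather than a requirement; the point is only that both steps see the same set of targets.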

Thanks for the fast and helpful response. I tried the above as you suggested, and it worked well. However, it did issue several warnings like the following:

nvlink warning : Stack size for entry function '_Z14EvaluateTokensiiPP5Token' cannot be statically determined (target: sm_75)
nvlink warning : Stack size for entry function '_Z14EvaluateTokensiiPP5Token' cannot be statically determined (target: sm_70)
nvlink warning : Stack size for entry function '_Z14EvaluateTokensiiPP5Token' cannot be statically determined (target: sm_60)

where EvaluateTokens is a __global__ function that is intended to make some recursive calls. I suppressed the warnings by adding this option to the device-link step:

-Xnvlink='-suppress-stack-size-warning'

The warning doesn’t have anything to do with the compilation/linking method. This may be of interest.
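Since the stack depth of a recursive call is only known at run time, the per-thread stack size is governed by the cudaLimitStackSize device limit, which can be queried with cudaDeviceGetLimit and changed with cudaDeviceSetLimit. A minimal sketch follows, written in the same "cat a file, then build" style as above. The file name recurse.cu, the function depth_sum, and the 8 KiB value are all hypothetical and chosen only for illustration; the right size depends on the actual recursion depth of your code. The build command is printed rather than run.

```shell
#!/bin/sh
# Write a small, hypothetical recursive example to recurse.cu.
cat > recurse.cu <<'EOF'
#include <cstdio>

// Recursive device function: its stack use depends on n, known only at run time.
__device__ int depth_sum(int n) {
  return n <= 0 ? 0 : n + depth_sum(n - 1);
}

__global__ void k(int n) {
  printf("sum = %d\n", depth_sum(n));
}

int main() {
  size_t stack_bytes = 0;
  cudaDeviceGetLimit(&stack_bytes, cudaLimitStackSize);
  printf("default per-thread stack: %zu bytes\n", stack_bytes);

  // 8 KiB is an arbitrary illustrative value, not a recommendation.
  cudaDeviceSetLimit(cudaLimitStackSize, 8 * 1024);

  k<<<1, 1>>>(100);
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    printf("kernel failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  return 0;
}
EOF

# Printed only; remove the echo to build with the toolkit installed.
echo nvcc recurse.cu -o recurse
```

Suppressing the warning only hides the message; if the recursion can go deep, raising the limit before the launch is what actually protects against a stack overflow on the device.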

Understood. I just realized that my compilation step has always had --disable-warnings, so I have been oblivious to this warning ever since I first developed that particular bit of code. Thanks again.
