What happens when no arch flags are passed by CMake?

Normally I use the following to pass -gencode flags to the compiler:

cuda_select_nvcc_arch_flags(ARCH_FLAGS Auto) 
list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})

However, when I want to activate link-time optimization, this throws:

nvcc fatal   : '-dlto' conflicts with '-gencode' to control what is generated; use 'code=lto_<arch>' with '-gencode' instead of '-dlto' to request lto intermediate
nvcc fatal   : '-dlto' conflicts with '-gencode' to control what is generated; use 'code=lto_<arch>' with '-gencode' instead of '-dlto' to request lto intermediate
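
If I read the message correctly, the per-architecture LTO request has to go into -gencode itself, with -dlto reserved for the (device) link step. Hand-written for my GPU (compute capability 8.6; the file names are just placeholders), that would presumably look something like this:

nvcc -dc -gencode arch=compute_86,code=lto_86 kernel.cu -o kernel.o
nvcc -dc -gencode arch=compute_86,code=lto_86 main.cu -o main.o
nvcc -dlto -arch=sm_86 kernel.o main.o -o app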

Now I’m wondering about two things.

  1. What happens if I don't set any kind of arch info manually anywhere? Does CMake or nvcc still automatically detect my GPU's arch? I ask because I still get working executables and libs.
  2. How can I activate LTO using cuda_select_nvcc_arch_flags(ARCH_FLAGS Auto) or any other concise and generic method?

Every version of nvcc has a built-in default target architecture. One way to find out (other than reading the documentation) is to inspect the output of a verbose build, i.e. build with nvcc -v. For example, for the CUDA 12.3 toolchain this shows -D__CUDA_ARCH__=520, -arch compute_52, and -arch=sm_52 being passed to various components of the toolchain (remember that nvcc is just a driver program which invokes these components under the hood), so we can conclude with confidence that the default architecture target for CUDA 12.3 is compute capability 5.2.
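
Concretely, any trivial source file will do for the check (kernel.cu here is just a placeholder; the 2>&1 is there because the verbose command lines may be printed to stderr):

nvcc -v -c kernel.cu 2>&1 | grep -E "__CUDA_ARCH__|arch"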

If you specify -arch=native on the nvcc commandline, it will iterate over all visible (see CUDA_VISIBLE_DEVICES) GPUs in your system and add code for each architecture found to the fat binary.
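
For instance (placeholder file names), a one-shot build that lets nvcc query the GPUs present at build time and emit code for exactly those architectures:

nvcc -arch=native -O3 main.cu -o app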

I consider questions about CMake off-topic here, as it is not a tool shipped or created by NVIDIA, and I neither use nor endorse it. CMake users among forum participants may be able to provide those details.


Yes, I can see that this is also the case for me. But even though my arch isn't 5.2 (it is 8.6), the program runs smoothly. So I wonder what the exact effects of this architecture specification are.

If the CUDA runtime cannot find a binary image that matches the compute capability of the GPU present, it will look for suitable PTX and JIT-compile it. Depending on the amount of device code that needs to be translated, that could cause a noticeable delay. If neither a suitable binary image nor suitable PTX is available, kernel execution fails. This is described in the CUDA documentation.
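
To check what an existing executable actually carries in terms of SASS and PTX, one option is cuobjdump from the CUDA binary utilities (app is a placeholder name):

cuobjdump --list-elf app     # binary (SASS) images embedded, one per architecture
cuobjdump --list-ptx app     # PTX available for JIT compilation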

It is a best practice with CUDA to build a fat binary that contains SASS (machine code) for all GPU architectures that need to be supported, plus PTX for the latest GPU architecture for forward compatibility (this code can be JIT compiled). If you do that manually, it might look like this:

-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_89,code=sm_89 \
-gencode=arch=compute_90,code=sm_90 \
-gencode=arch=compute_90,code=compute_90

nvcc also offers shortcuts in the form of command-line switches -arch=all and -arch=all-major. See compiler documentation.
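
For example (placeholder file name), either of these builds a fat binary covering the supported architectures, all of them or one per major generation, without spelling out the -gencode lines by hand:

nvcc -arch=all-major main.cu -o app
nvcc -arch=all main.cu -o app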