How can I check the version of the GPU to dynamically set '-gencode=arch=compute_??'?
Is it possible not to hard-code the flags, but to dynamically find out the GPU version on different PCs?
Thanks
The standard approach with CUDA is to build a fat binary that contains SASS (machine code) for all GPU architectures that need to be supported, plus PTX for the latest GPU architecture for forward compatibility (this code can be JIT compiled).
This approach requires no prior knowledge of the GPUs in the system(s) the software will run on, nor does it require the build system to contain a GPU.
Do you mean setting "-gencode=arch=compute_??" to the lowest version among the possible GPUs?
Will that give the best performance on the newest GPUs?
At present, a maximally fat binary might be generated along these lines:
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_89,code=sm_89 \
-gencode=arch=compute_90,code=sm_90 \
-gencode=arch=compute_90,code=compute_90
So all architectures from Maxwell to Hopper are covered by SASS and we include PTX for Hopper. Most applications probably do not require architecture coverage this extensive.
For recent CUDA releases, you can use -arch=native to compile for all visible devices in the machine (all devices by default; a subset can be selected with the standard CUDA_VISIBLE_DEVICES environment variable).
For older CUDA versions, you could write a helper program that detects the architecture of all visible devices and outputs the corresponding nvcc flag to use.
thanks!
Starting with which version counts as 'recent'?
For older versions, is there a demo or tutorial for this, e.g. for CUDA 11.1?
This was introduced with CUDA 11.5 update 1, per official documentation:
1.1.4. New -arch=native option
In addition to the -arch=all and -arch=all-major options added in CUDA 11.5, NVCC introduced -arch=native in CUDA 11.5 update 1. This -arch=native option is a convenient way for users to let NVCC determine the right target architecture to compile the CUDA device code to based on the GPU installed on the system. This can be particularly helpful for testing when applications are run on the same system they are compiled in.
To determine the architectures manually, enumerate the devices and query their respective architecture major version and minor version.
Is it possible to do this in CMake? If so, the fat binary mentioned above can be avoided.
I am not familiar with CMake. You can easily write your own program that does this with only a few lines of code.
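A minimal sketch of such a helper, using the CUDA runtime API to enumerate visible devices and emit one -gencode flag per device (the file name and output format are illustrative, not from this thread):

```cuda
// arch_flags.cu -- print an nvcc -gencode flag for every visible device.
// Build with: nvcc arch_flags.cu -o arch_flags
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        // prop.major / prop.minor hold the compute capability, e.g. 8 and 6
        printf("-gencode=arch=compute_%d%d,code=sm_%d%d\n",
               prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}
```

The output can then be captured by the build system, e.g. in a makefile via something like NVCCFLAGS += $(shell ./arch_flags); with multiple identical GPUs you would want to deduplicate the lines first.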
Is this a typo?
No. As I stated, for a fat binary the standard approach is to deposit, in addition to SASS for all supported architectures, PTX intermediate code for the latest architecture, in this case compute capability 9.0. In this way the code will continue to work when GPUs of a future architecture appear.
Compare the official documentation. I think it refers to sm_XX as a real architecture and compute_XX as a virtual architecture.
If you want to double-check what winds up in the fat binary, you can use the cuobjdump switches --dump-ptx and --dump-sass.
thanks!
Isn't 'cuda_select_nvcc_arch_flags' more standard and official?
(0) What is cuda_select_nvcc_arch_flags?
(1) More standard than what?
(2) Officially recommended by whom?
CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable [target_CUDA_architectures])
– Selects GPU arch flags for nvcc based on target_CUDA_architectures
target_CUDA_architectures : Auto | Common | All | LIST(ARCH_AND_PTX …)
- “Auto” detects local machine GPU compute arch at runtime.
- “Common” and “All” cover common and entire subsets of architectures
ARCH_AND_PTX : NAME | NUM.NUM | NUM.NUM(NUM.NUM) | NUM.NUM+PTX
NAME: Fermi Kepler Maxwell Kepler+Tegra Kepler+Tesla Maxwell+Tegra Pascal
NUM: Any number. Only those pairs are currently accepted by NVCC though:
2.0 2.1 3.0 3.2 3.5 3.7 5.0 5.2 5.3 6.0 6.2
Returns LIST of flags to be added to CUDA_NVCC_FLAGS in ${out_variable}
Additionally, sets ${out_variable}_readable to the resulting numeric list
Example:
CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0) Maxwell)
LIST(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
That link seems to be to some third-party tool I do not know about, that is, not something provided and/or maintained by NVIDIA. Above I provided advice on what to add to the nvcc command line to build a fat binary. How you generate that command line is entirely up to you, but my general advice is to use a single makefile for simple projects.
If you run into issues with a third-party tool, I would strongly suggest reading the documentation for that tool and availing yourself of the support infrastructure for that tool. This could be a mailing list, online forum, chat group, etc.
CMake isn't a product produced by NVIDIA.
target_compile_options(target PRIVATE
    $<$<COMPILE_LANGUAGE:CUDA>:-arch=native>  # still need to set -code??
)
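No separate -code option is needed: -arch=native by itself directs nvcc to generate SASS for the GPUs it detects. In CMake this is usually expressed through the CMAKE_CUDA_ARCHITECTURES variable rather than raw compile options; a minimal sketch, assuming CMake 3.24 or newer (which first accepts the "native" value) and a placeholder source file kernel.cu:

```cmake
cmake_minimum_required(VERSION 3.24)
project(demo LANGUAGES CXX CUDA)

# "native" makes CMake pass the equivalent of -arch=native to nvcc,
# targeting whatever GPUs are present on the build machine.
set(CMAKE_CUDA_ARCHITECTURES native)

add_executable(target kernel.cu)  # kernel.cu is a hypothetical source file
```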