How can I check the version of the GPU to dynamically set '-gencode=arch=compute_??'?
Is it possible not to hard-code the flags, but to dynamically find out the GPU version on different PCs?
Thanks
The standard approach with CUDA is to build a fat binary that contains SASS (machine code) for all GPU architectures that need to be supported, plus PTX for the latest GPU architecture for forward compatibility (this code can be JIT compiled).
This approach requires no prior knowledge of the GPUs in the system(s) the software will run on, nor does it require the build system to contain a GPU.
Do you mean setting "-gencode=arch=compute_??" to the lowest version among the possible GPUs?
Will that give the best performance on the newest GPUs?
At present, a maximally fat binary might be generated along these lines:
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_89,code=sm_89 \
-gencode=arch=compute_90,code=sm_90 \
-gencode=arch=compute_90,code=compute_90
So all architectures from Maxwell to Hopper are covered by SASS and we include PTX for Hopper. Most applications probably do not require architecture coverage this extensive.
For recent CUDA releases, you can use -arch=native to compile for all visible devices in the machine (all devices by default; a subset can be selected with the standard CUDA_VISIBLE_DEVICES environment variable).
For older CUDA versions, you could write a helper program that detects the architecture of all visible devices and outputs the corresponding nvcc flag to use.
thanks!
Starting with which version counts as 'recent'?
For older versions, is there a demo or tutorial for this, e.g. for CUDA 11.1?
This was introduced with CUDA 11.5 update 1, per official documentation:
1.1.4. New -arch=native option
In addition to the -arch=all and -arch=all-major options added in CUDA 11.5, NVCC introduced -arch=native in CUDA 11.5 update 1. This -arch=native option is a convenient way for users to let NVCC determine the right target architecture to compile the CUDA device code to based on the GPU installed on the system. This can be particularly helpful for testing when applications are run on the same system they are compiled in.
To determine the architectures manually, enumerate the devices and query their respective architecture major version and minor version.
Is it possible to do this in CMake? If so, the fat binary mentioned above can be avoided.
I am not familiar with CMake. You can easily write your own program that does this with only a few lines of code.
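A minimal sketch of such a helper, using the CUDA runtime API to enumerate visible devices and emit one -gencode flag per device (the file name and output format are illustrative, not from this thread):

```cuda
// arch_flags.cu -- print an nvcc -gencode flag for every visible device.
// Build with: nvcc arch_flags.cu -o arch_flags
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        // prop.major / prop.minor hold the compute capability, e.g. 8 and 6
        printf("-gencode=arch=compute_%d%d,code=sm_%d%d\n",
               prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}
```

The output can then be captured by the build system, e.g. in a makefile via something like NVCCFLAGS += $(shell ./arch_flags); with multiple identical GPUs you would want to deduplicate the lines first.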
Is this a typo?
No. As I stated, for a fat binary the standard approach is to deposit, in addition to SASS for all supported architectures, PTX intermediate code for the latest architecture, in this case compute capability 9.0. In this way the code will continue to work when GPUs of a future architecture appear.
Compare the official documentation. I think it refers to sm_XX as a real architecture and compute_XX as a virtual architecture.
If you want to double-check what winds up in the fat binary, you can use the cuobjdump switches --dump-ptx and --dump-sass.
thanks!
Isn't 'cuda_select_nvcc_arch_flags' more standard and official?
(0) What is cuda_select_nvcc_arch_flags?
(1) More standard than what?
(2) Officially recommended by whom?
CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable [target_CUDA_architectures])
– Selects GPU arch flags for nvcc based on target_CUDA_architectures
target_CUDA_architectures : Auto | Common | All | LIST(ARCH_AND_PTX …)
- “Auto” detects local machine GPU compute arch at runtime.
- “Common” and “All” cover common and entire subsets of architectures
ARCH_AND_PTX : NAME | NUM.NUM | NUM.NUM(NUM.NUM) | NUM.NUM+PTX
NAME: Fermi Kepler Maxwell Kepler+Tegra Kepler+Tesla Maxwell+Tegra Pascal
NUM: Any number. Only those pairs are currently accepted by NVCC though:
2.0 2.1 3.0 3.2 3.5 3.7 5.0 5.2 5.3 6.0 6.2
Returns LIST of flags to be added to CUDA_NVCC_FLAGS in ${out_variable}
Additionally, sets ${out_variable}_readable to the resulting numeric list
Example:
CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0) Maxwell)
LIST(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
That link seems to be to some third-party tool I do not know about, that is, not something provided and/or maintained by NVIDIA. Above I provided advice on what to add to the nvcc command line to build a fat binary. How you generate that command line is entirely up to you, but my general advice is to use a single makefile for simple projects.
If you run into issues with a third-party tool, I would strongly suggest reading the documentation for that tool and availing yourself of the support infrastructure for that tool. This could be a mailing list, online forum, chat group, etc.
CMake isn't a product produced by NVIDIA.
target_compile_options(target PRIVATE
    $<$<COMPILE_LANGUAGE:CUDA>:-arch=native>  # still need to set -code??
)
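No separate -code option is needed: -arch=native by itself directs nvcc to generate SASS for the GPUs it detects. In CMake this is usually expressed through the CMAKE_CUDA_ARCHITECTURES variable rather than raw compile options; a minimal sketch, assuming CMake 3.24 or newer (which first accepts the "native" value) and a placeholder source file kernel.cu:

```cmake
cmake_minimum_required(VERSION 3.24)
project(demo LANGUAGES CXX CUDA)

# "native" makes CMake pass the equivalent of -arch=native to nvcc,
# targeting whatever GPUs are present on the build machine.
set(CMAKE_CUDA_ARCHITECTURES native)

add_executable(target kernel.cu)  # kernel.cu is a hypothetical source file
```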