I am trying to include some CUDA code in an existing project. It seems that the CMake command enable_language(CUDA) causes trouble when setting NVCC compiler flags.
If I enable the language support, I get the error "calling a global function from a global function is only allowed on the compute_35 architecture or above".
The project is built for Visual Studio 14 2015 Win64.
The target GPU is an Nvidia Quadro M2000.
My CMake version is 3.10.2.
I am using Nvidia GPU Computing Toolkit V8.0.
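A minimal sketch of one way to hand the target architecture to nvcc once CUDA is enabled as a CMake language. It assumes the flags were previously set through FindCUDA's CUDA_NVCC_FLAGS (which, as far as I know, plain add_executable/add_library targets do not read) and that the Quadro M2000 is a compute capability 5.2 device. The project name, target name, and source file are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical project name; LANGUAGES CUDA has the same effect as enable_language(CUDA).
project(cuda_example LANGUAGES CXX CUDA)

# Targets built through the CUDA language take their flags from CMAKE_CUDA_FLAGS.
# Assumption: compute capability 5.2 for the Quadro M2000.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode arch=compute_52,code=sm_52")

# Hypothetical target and source file.
add_executable(my_kernels kernels.cu)

# Launching a kernel from a kernel also needs relocatable device code;
# this property enables separable compilation for the target.
set_target_properties(my_kernels PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
```

If the architecture flag does not reach nvcc, it compiles for its default target, which would explain an error that mentions compute_35 on a newer GPU. Whether the device runtime library (cudadevrt) has to be linked explicitly may depend on the setup.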
-O3 stands for “optimization level 3”, which in common usage over the past twenty years or so denotes the highest optimization level offered by many tool chains.
So regardless of whether I build with make or CMake on Ubuntu, or with VS2019 on Windows, setting -O3 for both debug and release builds will give the best performance and should generally be set. Right?
Thanks!
With modern compilers, using full optimization for debug builds makes for a very frustrating experience, as setting breakpoints and watchpoints as well as single-stepping becomes close to meaningless.
For this reason, nvcc turns off all optimizations when a debug build is specified with -G. If you look at the generated machine code (SASS), you may even see stretches of code that appear to use “pessimizations”, presumably to allow reliable inspection of all variables at all times. MSVC provides a similar switch, /Od; see “/Od (Disable (Debug))” on Microsoft Learn.
For the optimization flags, you would want to consult the nvcc manual. As I recall, -O3 acts only on the host compiler, not on the device compiler. The device compiler is actually a tandem of two optimizing compilers: the LLVM-based frontend produces PTX, and the ptxas backend compiles PTX to SASS. The default optimization level for the ptxas component is -O3 for release builds, but programmers can specify something else with -Xptxas -O{0|1|2}.
To control the host compiler, I usually use the -Xcompiler flag of nvcc to specify exactly what I want, which in the case of MSVC is typically something like -Xcompiler "/W3 /O2 /arch:{AVX|AVX2|AVX512} /favor:INTEL64 /fp:precise"
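For a CMake-driven build, the split between debug and release flags described above might be sketched as follows. This is only an illustration under assumptions: it presumes the CUDA and CXX languages are already enabled in the project, it appends to whatever defaults CMake has chosen rather than replacing them, and the single host flag shown is just an example. Quoting of multi-flag -Xcompiler strings can differ between generators, so check the generated command line.

```cmake
# Debug: -G turns off device-code optimization and emits device debug info.
string(APPEND CMAKE_CUDA_FLAGS_DEBUG " -G")

# Release: make the ptxas optimization level explicit.
# -O3 is already the ptxas default, so this line only documents the choice.
string(APPEND CMAKE_CUDA_FLAGS_RELEASE " -Xptxas -O3")

# Host-compiler options are forwarded with -Xcompiler; MSVC-style flags
# only make sense when MSVC is the host compiler.
if(MSVC)
  string(APPEND CMAKE_CUDA_FLAGS_RELEASE " -Xcompiler /fp:precise")
endif()
```

Keeping -G confined to the Debug configuration preserves a usable debugging experience while leaving release builds at the compiler's default optimization.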