enable_language(CUDA) ignores NVCC Compiler flags

Hello everybody,

I am trying to include some CUDA code in an existing project. It seems that the CMake command enable_language(CUDA) causes trouble when setting NVCC compiler flags.

With this CMake script everything compiles fine:

project(projectName)
set(CUDA_SEPARABLE_COMPILATION ON)
#enable_language(CUDA)
get_filename_component(CUDA_LIB_PATH ${CUDA_CUDART_LIBRARY} DIRECTORY)
find_library(CUDA_cudadevrt_LIBRARY cudadevrt PATHS ${CUDA_LIB_PATH})

include_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include")




cuda_add_executable(
    ${PROJECT_NAME}
    main.cpp
    cuda_functions.cu
    cuda_functions.hu
    helpers.hpp
    cHeader.hpp
    foobar.cu
    structs.hpp
)

target_link_libraries(${PROJECT_NAME} ${CUDA_cudadevrt_LIBRARY})

But if I enable the language support, I get the error "calling a global function from a global function is only allowed on the compute_35 architecture or above":

project(projectName)
set(CUDA_SEPARABLE_COMPILATION ON)
enable_language(CUDA)
get_filename_component(CUDA_LIB_PATH ${CUDA_CUDART_LIBRARY} DIRECTORY)
find_library(CUDA_cudadevrt_LIBRARY cudadevrt PATHS ${CUDA_LIB_PATH})

include_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include")




add_executable(
    ${PROJECT_NAME}
    main.cpp
    cuda_functions.cu
    cuda_functions.hu
    helpers.hpp
    cHeader.hpp
    foobar.cu
    structs.hpp
)

target_link_libraries(${PROJECT_NAME} ${CUDA_cudadevrt_LIBRARY})

The NVCC flags are set in an extra file:

set(CUDA_NVCC_FLAGS "-o3")
set(CUDA_NVCC_FLAGS "-arch=sm_52")

The project is built for Visual Studio 14 2015 Win64.
The project is built for an Nvidia Quadro M2000.
My CMake Version is 3.10.2
I am using Nvidia GPU Computing Toolkit V8.0.

I think to extend CUDA_NVCC_FLAGS you have to do something like

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -O3")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -arch=sm_52")

In your example the -arch=sm_52 option would overwrite (erase) the previous -o3 flag.

Also note that the -o3 from your posting is not a valid optimization option; it has to be -O3 with an uppercase O.

And what’s that “extra file” and how do you make sure it’s being considered?

Christian
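
As a rough sketch of the point about appending flags: CUDA_NVCC_FLAGS belongs to the FindCUDA module and is read by cuda_add_executable, whereas targets built through enable_language(CUDA) and plain add_executable take their flags from CMAKE_CUDA_FLAGS. If no -arch flag reaches nvcc, the CUDA 8.0 default architecture is below compute_35, which would match the error in the question. The fragment below assumes the target and source names from the question and has not been tested with CMake 3.10.2 and the Visual Studio generator.

```cmake
cmake_minimum_required(VERSION 3.8)
# Listing CUDA here has the same effect as calling enable_language(CUDA).
project(projectName LANGUAGES CXX CUDA)

# First-class CUDA support reads CMAKE_CUDA_FLAGS, not CUDA_NVCC_FLAGS.
# Append to the existing value so that earlier flags are not erased.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -arch=sm_52")

add_executable(${PROJECT_NAME} main.cpp cuda_functions.cu foobar.cu)

# In this mode separable compilation is a target property; the plain
# FindCUDA variable CUDA_SEPARABLE_COMPILATION is not consulted.
set_target_properties(${PROJECT_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
```

For the FindCUDA route, the module documentation describes CUDA_NVCC_FLAGS as a semicolon-delimited list, so list(APPEND CUDA_NVCC_FLAGS -arch=sm_52) is the equivalent way to append there.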

What does “-O3” stand for?

The correct flag is -O3, as indicated in the post by cbuchner.

-O3 stands for “optimization level 3”, which in common usage over the past twenty years or so denotes the highest optimization level offered by many tool chains.

So regardless of whether I use make on Ubuntu, CMake, or VS2019 on Windows, setting -O3 for both debug and release builds will give the best performance and should generally be set. Right?
Thanks!

With modern compilers, using full optimization for debug builds makes for a very frustrating experience, as setting breakpoints and watchpoints as well as single-stepping becomes close to meaningless.

For this reason, nvcc turns off all optimizations when a debug build is specified with -G. If you look at the generated machine code (SASS), you may even see stretches of code that appear to use “pessimizations” that are presumably used to provide for the reliable inspection of all variables at all times. MSVC provides a similar switch: /Od (see “/Od (Disable (Debug))” on Microsoft Learn).
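
One way to keep device debugging and optimization apart in a CMake project is to use the per-configuration flag variables. This is only a sketch: it assumes either enable_language(CUDA) or the FindCUDA module is in use, and how the Visual Studio generator maps -G into the project file has not been checked here.

```cmake
# With enable_language(CUDA): add -G to the Debug configuration only, so
# device-code optimization is turned off just for debug builds.
set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG} -G")

# With FindCUDA (cuda_add_executable): the module provides matching
# per-configuration lists named CUDA_NVCC_FLAGS_<CONFIG>.
list(APPEND CUDA_NVCC_FLAGS_DEBUG -G)

# Release configurations get no -G, so ptxas keeps its default level.
```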

For the optimization flags, you would want to consult the nvcc manual. As I recall, -O3 acts only on the host compiler, not on the device compiler. The device compiler is actually a tandem of two optimizing compilers: the LLVM-based frontend produces PTX, and the ptxas backend compiles PTX to SASS. The default optimization level for the ptxas component is -O3 for release builds, but programmers can specify something else with -Xptxas -O{0|1|2}.

To control the host compiler, I usually use the -Xcompiler flag of nvcc to specify exactly what I want, which in the case of MSVC is typically something like -Xcompiler "/W3 /O2 /arch:{AVX|AVX2|AVX512} /favor:INTEL64 /fp:precise"
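
In a CMake project the same nvcc options could be forwarded roughly as follows. This is a sketch, assuming enable_language(CUDA) and an MSVC host compiler; the particular host flags are only examples, and the = and comma-separated spellings are chosen here to avoid quoting problems.

```cmake
# Device back end: ask ptxas for a lower optimization level than its default.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xptxas=-O2")

# Host compiler: pass MSVC options through nvcc. -Xcompiler accepts a
# comma-separated list, so no embedded quotes are needed.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler=/W3,/fp:precise")

# Host optimization switches such as /O2 are left to the per-configuration
# variables, because MSVC rejects /O2 combined with the /RTC1 runtime checks
# that debug builds commonly enable.
```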