enable_language(CUDA) ignores NVCC Compiler flags

Peter_Mayr · March 28, 2018, 6:50am

Hello everybody,

I am trying to include some CUDA code in an existing project. It seems that the CMake command enable_language(CUDA) causes trouble when setting NVCC compiler flags.

With this CMake script everything compiles well:

project(projectName)
set(CUDA_SEPARABLE_COMPILATION ON)
#enable_language(CUDA)
get_filename_component(CUDA_LIB_PATH ${CUDA_CUDART_LIBRARY} DIRECTORY)
find_library(CUDA_cudadevrt_LIBRARY cudadevrt PATHS ${CUDA_LIB_PATH})

include_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include")




cuda_add_executable(
    ${PROJECT_NAME}
    main.cpp
    cuda_functions.cu
    cuda_functions.hu
    helpers.hpp
    cHeader.hpp
    foobar.cu
    structs.hpp
)

target_link_libraries(${PROJECT_NAME} ${CUDA_cudadevrt_LIBRARY})

But if I enable the language support, I get the error "calling a global function from a global function is only allowed on the compute_35 architecture or above":

project(projectName)
set(CUDA_SEPARABLE_COMPILATION ON)
enable_language(CUDA)
get_filename_component(CUDA_LIB_PATH ${CUDA_CUDART_LIBRARY} DIRECTORY)
find_library(CUDA_cudadevrt_LIBRARY cudadevrt PATHS ${CUDA_LIB_PATH})

include_directories("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include")




add_executable(
    ${PROJECT_NAME}
    main.cpp
    cuda_functions.cu
    cuda_functions.hu
    helpers.hpp
    cHeader.hpp
    foobar.cu
    structs.hpp
)

target_link_libraries(${PROJECT_NAME} ${CUDA_cudadevrt_LIBRARY})

The NVCC flags are set in an extra file:

set(CUDA_NVCC_FLAGS "-o3")
set(CUDA_NVCC_FLAGS "-arch=sm_52")

The project is built for Visual Studio 14 2015 Win64.
The project is built for an Nvidia Quadro M2000.
My CMake version is 3.10.2.
I am using Nvidia GPU Computing Toolkit V8.0.

cbuchner1 · March 28, 2018, 4:34pm

I think to extend CUDA_NVCC_FLAGS you have to do something like

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -O3")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -arch=sm_52")

In your example the arch=sm_52 option would overwrite (erase) the previous o3 flag.

Also note that the o3 from your posting is not a valid option, as it has to be an upper case O3.
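A minimal sketch of the difference between overwriting and appending, which can be run on its own with cmake -P. It assumes the FindCUDA convention that CUDA_NVCC_FLAGS is a semicolon-delimited list, so list(APPEND) is used here as an alternative to the string concatenation shown above.

```cmake
# Overwriting: the second set() replaces the value stored by the first.
set(CUDA_NVCC_FLAGS "-O3")
set(CUDA_NVCC_FLAGS "-arch=sm_52")
message(STATUS "overwritten: ${CUDA_NVCC_FLAGS}")   # only -arch=sm_52 remains

# Appending: both flags are kept as separate list elements.
set(CUDA_NVCC_FLAGS "")
list(APPEND CUDA_NVCC_FLAGS -O3 -arch=sm_52)
message(STATUS "appended: ${CUDA_NVCC_FLAGS}")      # both flags are present
```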

And what’s that “extra file” and how do you make sure it’s being considered?

Christian
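For readers hitting the same error: CUDA_NVCC_FLAGS and the CUDA_SEPARABLE_COMPILATION variable belong to the FindCUDA module and are read by its macros such as cuda_add_executable. With enable_language(CUDA) and a plain add_executable, the flags are taken from CMAKE_CUDA_FLAGS instead, and separable compilation is a target property. If the -arch flag never reaches nvcc, it presumably falls back to its default architecture, which would explain the compute_35 message. The following is only a sketch under those assumptions, with the source list shortened to the files from the question that need compiling:

```cmake
cmake_minimum_required(VERSION 3.10)
project(projectName LANGUAGES CXX CUDA)

# Assumption: with the native CUDA language, nvcc flags go in CMAKE_CUDA_FLAGS;
# CUDA_NVCC_FLAGS is only consulted by the FindCUDA macros.
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -arch=sm_52")

add_executable(${PROJECT_NAME} main.cpp cuda_functions.cu foobar.cu)

# Target property that takes the place of the FindCUDA variable
# CUDA_SEPARABLE_COMPILATION for native CUDA targets.
set_target_properties(${PROJECT_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
```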

opengpu · August 10, 2023, 3:18am

What does “-O3” stand for?

Robert_Crovella · August 10, 2023, 3:29am

The correct flag is -O3, as indicated in the post by cbuchner.

njuffa · August 10, 2023, 4:06am

-O3 stands for “optimization level 3”, which in common usage over the past twenty years or so denotes the highest optimization level offered by many tool chains.

opengpu · August 10, 2023, 4:34am

So no matter whether I use make or CMake on Ubuntu, or VS2019 on Windows, setting -O3 for both debug and release builds will give the best performance and should generally be set. Right?
Thanks!

njuffa · August 10, 2023, 4:46am

With modern compilers, using full optimization for debug builds makes for a very frustrating experience, as setting breakpoints and watchpoints as well as single-stepping becomes close to meaningless.

For this reason, nvcc turns off all optimizations when a debug build is specified with -G. If you look at the generated machine code (SASS), you may even see stretches of code that appear to use “pessimizations” that are presumably used to provide for the reliable inspection of all variables at all times. MSVC provides a similar switch: /Od (Disable (Debug)).

For the optimization flags, you would want to consult the nvcc manual. As I recall, -O3 acts only on the host compiler, not on the device compiler. The device compiler is actually a tandem of two optimizing compilers: the LLVM-based frontend produces PTX, and the ptxas backend compiles PTX to SASS. The default optimization level for the ptxas component is -O3 for release builds, but programmers can specify something else with -Xptxas -O{0|1|2}.

To control the host compiler, I usually use the -Xcompiler flag of nvcc to specify exactly what I want, which in the case of MSVC is typically something like -Xcompiler "/W3 /O2 /arch:{AVX|AVX2|AVX512} /favor:INTEL64 /fp:precise"
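In a CMake project these switches could be attached per build configuration. This is a sketch only, assuming the native CUDA language (enable_language(CUDA)) and an MSVC host compiler; the particular flag values are illustrative and not a recommendation.

```cmake
# Debug builds: -G turns off device-code optimization so that breakpoints
# and single-stepping stay meaningful.
set(CMAKE_CUDA_FLAGS_DEBUG "${CMAKE_CUDA_FLAGS_DEBUG} -G")

# Release builds: -Xptxas overrides the ptxas optimization level (here -O2),
# and -Xcompiler forwards a comma-separated list of options to the host compiler.
set(CMAKE_CUDA_FLAGS_RELEASE
    "${CMAKE_CUDA_FLAGS_RELEASE} -Xptxas -O2 -Xcompiler /W3,/fp:precise")
```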
