Is it possible for code compiled with the -G flag to be optimized out during device link time optimization?

Hello, I am trying to understand why I cannot hit breakpoints when setting up cuda-gdb for a fairly large project I am involved in, with many CUDA kernels and device functions. Unfortunately I am not sure how much of this project I am allowed to share, but I will try to describe the relevant parts.

Until recently, our environment used CUDA Toolkit 11.1 as well as CUDA Version 11.1. With this setup I was able to compile our CUDA source files with the device debug flag -G, but I was unable to hit breakpoints set in a kernel or device function; the variable window of my CLion debugger simply said “no variables available”. (Note: I am using CLion >= 2022.2, which should have cuda-gdb support.)

We have since upgraded both the CUDA Toolkit and the CUDA version to 11.6. With this version I can no longer link the device code when using the -G flag: the link step ends in a segmentation fault, although compilation works fine.

Here are the hopefully relevant settings and flags that I use during compilation/linking:

CMAKE_CUDA_SEPARABLE_COMPILATION ON
-gencode=arch=compute_61,code=lto_61
-g
-G
-gencode=arch=compute_61,code=sm_61
-dlto

So my question is essentially whether there is some clash between compiling the source files with -G and then using the -dlto flag, especially with the later version (11.6).

My suggestion: if you’re building debug code, don’t use LTO. If you are using LTO, don’t use -G.
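
A rough sketch of how that split could look in a CMake project, in case it helps. The target name `my_app` and the source file names are made up for illustration, and this assumes CMake 3.25 or newer, where (as far as I know) the `INTERPROCEDURAL_OPTIMIZATION` property enables nvcc device LTO for CUDA targets. The idea is only that -G is confined to the Debug configuration and LTO to the Release configuration, so the two never meet in one build:

```cmake
# Sketch only: `my_app`, `main.cu` and `kernels.cu` are hypothetical names.
cmake_minimum_required(VERSION 3.25)
project(debug_or_lto LANGUAGES CXX CUDA)

add_executable(my_app main.cu kernels.cu)

set_target_properties(my_app PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON            # relocatable device code, as in the original setup
    CUDA_ARCHITECTURES 61                    # same target architecture as the flags above
    INTERPROCEDURAL_OPTIMIZATION_RELEASE ON) # assumption: requests device LTO in Release only

# Device debug info (-G) only for CUDA sources in the Debug configuration.
target_compile_options(my_app PRIVATE
    "$<$<AND:$<COMPILE_LANGUAGE:CUDA>,$<CONFIG:Debug>>:-G>")
```

If the flags are passed by hand instead, the same rule applies: the debug build would keep -g, -G and code=sm_61, while the optimized build would keep code=lto_61 and -dlto without -G.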

Thank you! I finally tried this and it seems it works.