How to debug code that works with -G but not without it?

Hello, I have a recursive __device__ function which works when compiled in debug mode (-G) but not with the default nvcc settings:

template <class RetType, class Element, std::size_t N, class Functor, class... Is>
KOKKOS_FUNCTION void annotated_for_each_serial(
        std::array<Element, N> const& begin,
        std::array<Element, N> const& end,
        Functor const& f,
        Is const&... is) noexcept
{
    static constexpr std::size_t I = sizeof...(Is);
    if constexpr (I == N) {
        f(RetType(is...));
    } else {
        for (Element ii = begin[I]; ii < end[I]; ++ii) {
            annotated_for_each_serial<RetType>(begin, end, f, is..., ii);
        }
    }
}

With Element being std::size_t. Without -G, it goes through the for loop only once (it should not). A workaround is static_cast<int>(begin[I]), which makes no sense. This is very deep in my code, so I do not expect anyone to find the precise problem, but I would like to know what I should try in this situation. Is it possible to disable particular ptxas optimizations to identify the one that is problematic?

Neither compute-sanitizer nor cuda-gdb reveals anything (0 errors).

Thx

I’m not aware of any way to disable “particular” ptxas optimizations, although you can adjust the ptxas optimization level (how to do so is documented in the nvcc manual).

I guess if you’re convinced the issue is in the for-loop, then if I were working on it, I would start to instrument that either with carefully designed printf statements, or else via writing relevant variables (e.g. loop control variables) at each iteration into a global array for later inspection from the host.
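A minimal sketch of the second suggestion (recording the loop control variables at each iteration for later inspection), written as host-compilable C++ so it can be checked in isolation. The names Trace, TraceEntry, Index2, traced_for_each_serial, trace_example and the TRACE_FUNCTION macro are hypothetical and not from the original code; in device code the macro would presumably be KOKKOS_FUNCTION, and the Trace object would have to live in memory that the host can copy back.

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

// Hypothetical placeholder so the sketch builds on the host;
// in the real code this would be KOKKOS_FUNCTION.
#ifndef TRACE_FUNCTION
#define TRACE_FUNCTION
#endif

// One record per loop iteration (hypothetical layout).
struct TraceEntry {
    std::size_t depth;  // recursion level I
    std::size_t ii;     // loop variable at that level
    std::size_t end;    // loop bound end[I] as the code sees it
};

// Fixed-size log; on a GPU this would sit in global memory.
struct Trace {
    static constexpr std::size_t capacity = 1024;
    TraceEntry entries[capacity];
    std::size_t count = 0;

    TRACE_FUNCTION void record(std::size_t depth, std::size_t ii, std::size_t end) noexcept {
        if (count < capacity) {
            entries[count] = TraceEntry{depth, ii, end};
        }
        ++count;  // keeps counting past capacity so overflow is visible
    }
};

// Hypothetical RetType for a 2D iteration space.
struct Index2 {
    std::size_t i, j;
    TRACE_FUNCTION Index2(std::size_t i_, std::size_t j_) noexcept : i(i_), j(j_) {}
};

// Same recursion as in the question, plus one record() call per iteration.
template <class RetType, class Element, std::size_t N, class Functor, class... Is>
TRACE_FUNCTION void traced_for_each_serial(
        Trace& trace,
        std::array<Element, N> const& begin,
        std::array<Element, N> const& end,
        Functor const& f,
        Is const&... is) noexcept
{
    static constexpr std::size_t I = sizeof...(Is);
    if constexpr (I == N) {
        f(RetType(is...));
    } else {
        for (Element ii = begin[I]; ii < end[I]; ++ii) {
            trace.record(I, ii, end[I]);
            traced_for_each_serial<RetType>(trace, begin, end, f, is..., ii);
        }
    }
}

// Usage sketch: a 2 x 3 iteration space gives 2 outer + 6 inner records.
inline std::size_t trace_example() {
    Trace trace;
    std::array<std::size_t, 2> const begin{0, 0};
    std::array<std::size_t, 2> const end{2, 3};
    std::size_t calls = 0;
    traced_for_each_serial<Index2>(trace, begin, end, [&calls](Index2 const&) { ++calls; });
    for (std::size_t k = 0; k < trace.count && k < Trace::capacity; ++k) {
        std::printf("depth=%zu ii=%zu end=%zu\n",
                    trace.entries[k].depth, trace.entries[k].ii, trace.entries[k].end);
    }
    return trace.count;
}
```

If the device build logs fewer records than the host build for the same bounds, the recorded end values show whether the bound or the loop variable is what differs. The sketch assumes one Trace per thread; sharing a single log between threads would need an atomic counter.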

If your code is compiled with -rdc=true or equivalent, then the function in question should have an identifiable entry point in the SASS. You could also go directly to SASS inspection.

Another approach is to continue to simplify your code until the problem disappears. If you get to a trivially simple realization (e.g. 100 lines or less, self contained) then you could start to inspect the SASS or ask others for help. If you get to a point where reduction in complexity causes the problem to go away, then that may either provide clues or be the point at which you start work on an MRE.

Thanks for the reply. I realized the problem is also present with -Xptxas -O0, which suggests it is not due to an optimization but to something else that the -G flag avoids. Do we have the details of what the -G flag does?

(I already tried commenting out chunks of code and adding printf, but that did not reveal anything useful.)

No, we don’t.

It is not uncommon for programmers to misdiagnose where an issue originates. Therefore consideration should be given to the hypothesis that the primary issue is not the loop per se, but that the issue originates elsewhere.

You are using a revision control system and unit tests, correct? If so, one way of zeroing in on the issue is to find which code change first introduced the undesired behavior. This technique works particularly well if code changes are committed in small increments (a few dozen lines rather than several hundred), a practice I recommend. If you don’t run a “smoke” test on every code commit, I would highly recommend adopting this practice. A smoke test usually comprises a meaningful subset of the unit tests, selected so it can run “quickly”.

Another way of zeroing in on the root cause is to simplify your code in steps, with the goal of producing the smallest program that still reproduces the issue. This may take several days, depending on the complexity of the program. A minor additional simplification beyond this stage then makes the problem disappear, providing a bracket that usually gives a pretty good idea of what triggers the issue. In many cases, the ultimate culprit is then found to be an incorrect mental model of the code, for example caused by a misunderstanding of specific details of programming language semantics.
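As a starting point for such a reduction, here is a sketch of what a small self-contained test of the recursive function could look like. It is plain host C++ and only an assumption about how the reduction might begin: REPRO_FUNCTION, Index3, CountingFunctor, repro_for_each_serial, expected_calls and repro_matches_expectation are hypothetical names, and the real reproducer would have to run the same check inside a kernel, built with and without -G.

```cpp
#include <array>
#include <cstddef>

// Hypothetical placeholder; would be KOKKOS_FUNCTION in the real code.
#ifndef REPRO_FUNCTION
#define REPRO_FUNCTION
#endif

// Hypothetical RetType for a 3D iteration space.
struct Index3 {
    std::size_t i, j, k;
    REPRO_FUNCTION Index3(std::size_t i_, std::size_t j_, std::size_t k_) noexcept
        : i(i_), j(j_), k(k_) {}
};

// Copy of the function from the question, renamed for the sketch.
template <class RetType, class Element, std::size_t N, class Functor, class... Is>
REPRO_FUNCTION void repro_for_each_serial(
        std::array<Element, N> const& begin,
        std::array<Element, N> const& end,
        Functor const& f,
        Is const&... is) noexcept
{
    static constexpr std::size_t I = sizeof...(Is);
    if constexpr (I == N) {
        f(RetType(is...));
    } else {
        for (Element ii = begin[I]; ii < end[I]; ++ii) {
            repro_for_each_serial<RetType>(begin, end, f, is..., ii);
        }
    }
}

// Number of functor calls the loop nest should make: the product of extents.
template <class Element, std::size_t N>
REPRO_FUNCTION std::size_t expected_calls(
        std::array<Element, N> const& begin,
        std::array<Element, N> const& end) noexcept
{
    std::size_t total = 1;
    for (std::size_t n = 0; n < N; ++n) {
        total *= (end[n] > begin[n]) ? static_cast<std::size_t>(end[n] - begin[n]) : 0;
    }
    return total;
}

// Functor that only counts how often it is invoked.
struct CountingFunctor {
    std::size_t* calls;
    REPRO_FUNCTION void operator()(Index3 const&) const noexcept { ++*calls; }
};

// Returns true when the observed call count matches the expected one.
inline bool repro_matches_expectation() {
    std::array<std::size_t, 3> const begin{0, 1, 2};
    std::array<std::size_t, 3> const end{2, 4, 4};  // extents 2, 3, 2
    std::size_t calls = 0;
    repro_for_each_serial<Index3>(begin, end, CountingFunctor{&calls});
    return calls == expected_calls(begin, end);
}
```

A check of this kind gives a single pass/fail value to watch while pieces of the surrounding code are added back or removed.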

Instrumenting code with printf() can make an issue disappear just like lowering the compiler’s optimization level (i.e. we have a “Heisenbug”), but usually the two modify the generated code in different ways, so the issue may still be observable with one approach but disappear with the other. I have been able to successfully debug very hard to find problems which others had failed to resolve in several man days, just by judicious use of logged information from printf() calls in the code.