Broken backtraces

I’m attempting to profile a large mixed-language application that links to many library packages. The primary source code for the application and many of the library packages are compiled with nvfortran, nvcc, and nvc++ (all products from nvhpc@22.7), but some packages are required to be compiled with gcc@9.4.0.

The issue I’m seeing is that the Top-Down View in nsys-ui@2022.2.1 reports that 94% of the runtime is recorded under “Broken backtraces”. Are there debugging flags I can pass to the respective compilers that would produce better reporting?


There is a lot of information on troubleshooting backtraces in User Guide :: Nsight Systems Documentation (the link goes directly to the section on troubleshooting symbol resolution).

Are you running the CLI or the GUI, and what architecture are you on? I’m wondering which backtrace method is being used.

Thanks for the link. This is being collected with the CLI on an MPI-parallel job on a Cascade Lake + V100 Linux cluster. The code is mostly OpenACC-accelerated Fortran.

What is your command line?

mpirun -np 2 --map-by socket --bind-to core nsys profile -y 40 --trace=openacc,cuda,mpi -b fp -o report_221011_1148_%p myexe

The code is compiled with -O3 -fast -gopt -Melf. I tried some of the suggestions in the user manual, but the compiler flags it lists appear to be for GCC, not NVHPC, and ResolveSymbols didn’t change anything.

Okay, “-b fp” means that you are using frame pointers for your backtraces. I suspect that most of your libraries were not compiled with frame pointers.

Can you try switching to “-b dwarf” or “-b lbr” (DWARF unwinding or Intel Last Branch Record)? LBR is the fastest, but its depth is limited by the hardware.
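To illustrate why missing frame pointers lead to broken backtraces, here is a toy Python model of frame-pointer unwinding on x86-64, where each frame stores the caller’s saved frame pointer at [fp] and the return address at [fp + 8]. This is a minimal sketch, not how nsys is implemented: the `unwind_fp` helper, the addresses, and the simulated stack are all hypothetical. A routine built without frame pointers may use the frame-pointer register as scratch, so its callee saves whatever value the register happens to hold and the walk stops early.

```python
from typing import Dict, List, Tuple

def unwind_fp(mem: Dict[int, int], fp: int, stack_lo: int, stack_hi: int,
              max_frames: int = 64) -> Tuple[List[int], bool]:
    """Walk a frame-pointer chain in a simulated stack (hypothetical model).

    Returns the return addresses found and whether the walk ended cleanly
    (a saved frame pointer of 0 marks the outermost frame).
    """
    pcs: List[int] = []
    while len(pcs) < max_frames:
        if fp == 0:
            return pcs, True               # reached the outermost frame
        if not (stack_lo <= fp < stack_hi) or fp % 8 != 0:
            return pcs, False              # fp points outside the stack
        pcs.append(mem.get(fp + 8, 0))     # return address sits above saved fp
        next_fp = mem.get(fp, 0)
        if next_fp != 0 and next_fp <= fp:
            return pcs, False              # stack grows down: callers live higher
        fp = next_fp
    return pcs, False                      # gave up: chain deeper than max_frames

# Hypothetical stack: kernel <- solver <- main, innermost frame at 0x6E00.
good = {
    0x6E00: 0x6F00, 0x6E08: 0x401200,  # kernel's frame: return into solver
    0x6F00: 0x7000, 0x6F08: 0x401100,  # solver's frame: return into main
    0x7000: 0x0,    0x7008: 0x401000,  # main's frame: end of chain
}
# Same stack, but solver was built without frame pointers and left a scratch
# value (0x1234) in the frame-pointer register, which kernel then saved.
broken = dict(good)
broken[0x6E00] = 0x1234

if __name__ == "__main__":
    print([hex(pc) for pc in unwind_fp(good, 0x6E00, 0x6000, 0x8000)[0]])
    print(unwind_fp(broken, 0x6E00, 0x6000, 0x8000))
```

The first walk recovers all three return addresses and ends cleanly; the second stops after one frame because the saved value 0x1234 does not point to an older frame, which is the kind of failure that surfaces as a truncated or broken backtrace.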

Thanks for the suggestions. -b lbr disables the Top-Down View, so that’s not much help. -b dwarf creates much smaller traces (~2 MB rather than 14 MB with frame pointers), and most of the symbols are unresolved. ResolveSymbols -s myexe trace_file.nsys-rep wasn’t able to resolve any of them. Just for kicks, I tried running -b fp on a debug build: no improvement.
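One way to narrow down unresolved symbols and failed DWARF unwinding is to check which binaries actually carry the relevant ELF sections: `.eh_frame` or `.debug_frame` for DWARF unwind tables, and `.symtab` or `.dynsym` for symbol names. `readelf -S` shows this directly; the sketch below does the same for 64-bit little-endian ELF files using only the Python standard library. The `elf_section_names` and `report` helpers are hypothetical, the sketch does not handle extended section numbering, and it cannot tell whether code was built with frame pointers, since that is not recorded in the section table.

```python
import struct
import sys
from typing import List, Tuple

UNWIND = {".eh_frame", ".debug_frame"}  # tables used for DWARF unwinding
SYMBOLS = {".symtab", ".dynsym"}        # tables used for symbol resolution

def elf_section_names(path: str) -> List[str]:
    """Return the section names of a 64-bit little-endian ELF file."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError(f"{path}: not a 64-bit little-endian ELF file")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shoff == 0 or e_shnum == 0:
        return []  # no section header table (or extended numbering: not handled)

    def header(i: int) -> Tuple[int, int, int]:
        off = e_shoff + i * e_shentsize
        (sh_name,) = struct.unpack_from("<I", data, off)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, off + 0x18)
        return sh_name, sh_offset, sh_size

    # The section-name string table is itself a section, indexed by e_shstrndx.
    _, str_off, str_size = header(e_shstrndx)
    strtab = data[str_off:str_off + str_size]
    names = []
    for i in range(e_shnum):
        sh_name, _, _ = header(i)
        end = strtab.find(b"\0", sh_name)
        if end < 0:
            end = len(strtab)
        names.append(strtab[sh_name:end].decode("ascii", "replace"))
    return names

def report(path: str) -> str:
    """Summarize which unwind and symbol tables a binary carries."""
    names = set(elf_section_names(path))
    unwind = ", ".join(sorted(names & UNWIND)) or "none"
    syms = ", ".join(sorted(names & SYMBOLS)) or "none"
    return f"{path}: unwind tables: {unwind}; symbol tables: {syms}"

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(report(p))
```

Saved as, say, `elf_sections.py`, running `python3 elf_sections.py myexe /path/to/lib*.so` lists what each binary carries. A library with neither `.eh_frame` nor `.debug_frame` cannot be unwound with DWARF, and one with neither `.symtab` nor `.dynsym` leaves only raw addresses unless separate debug info is available.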

@rknight, Bob, do you have any other suggestions here?

You might try compiling the binaries/app with the -fno-omit-frame-pointer switch (or equivalent) and then try -b fp again.

I just recompiled with -Mframe added to the compiler flags and profiled again, with no change. Is that the correct NVHPC equivalent of -fno-omit-frame-pointer? Would -pg help?

I’ll note that the application I’m profiling has a very deep call chain; maybe that is causing problems. Also, the fragments listed under “Broken backtraces” are firmly within the application, not at the boundaries of calls into third-party libraries.

I’m checking on the switches.

Today, nsys CPU IP sampling is hardcoded to limit the backtrace stack size to 8k. Do you think your stacks are deeper than that?

I’m not really sure what an 8k size includes, but I’m running 30+ stack frames deep.

Thanks for your help!
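What the 8k limit counts is not spelled out here. If it is read as roughly 8 KiB (8192 bytes) of user stack captured per sample, which is an assumption, then a 30-frame call chain leaves about 273 bytes per frame on average, and a single routine with large automatic arrays or many locals can push the outer frames past the window. The hypothetical helper below counts how many of the innermost frames fit, given per-frame sizes you would have to measure or estimate yourself.

```python
from typing import Sequence

# Assumption: "8k" is read as 8 KiB of user stack captured per sample.
STACK_CAPTURE_BYTES = 8 * 1024

def frames_that_fit(frame_sizes: Sequence[int],
                    budget: int = STACK_CAPTURE_BYTES) -> int:
    """Count innermost frames whose cumulative size fits in the budget.

    frame_sizes is ordered innermost (sampled) frame first, in bytes.
    """
    used = 0
    for n, size in enumerate(frame_sizes):
        used += size
        if used > budget:
            return n  # frames 0..n-1 fit; frame n spills past the window
    return len(frame_sizes)

if __name__ == "__main__":
    print(f"average budget per frame for 30 frames: {STACK_CAPTURE_BYTES / 30:.0f} bytes")
    # Hypothetical sizes: 30 frames of 256 bytes, then the same chain with
    # one 4 KiB frame (e.g. a large automatic array) at depth 10.
    print(frames_that_fit([256] * 30))
    print(frames_that_fit([256] * 10 + [4096] + [256] * 19))
```

With these made-up sizes, all 30 uniform frames fit, but adding one 4 KiB frame leaves room for only the innermost 17.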

I’m having a hard time tracking down the answer to your compiler switch question. Can you point me to the documentation for the -Mframe and -pg switches?

As far as I’m aware, you have to interrogate the compilers with the -help flag:

% nvfortran -help -Mframe
-M[no]frame         Generate code to set up a stack frame

I did notice that -fast implies -Mnoframe, so I rebuilt with -O2 -gopt -Mframe -acc, but I still had no luck getting usable Top-Down View traces in nsys-ui with -b fp.

Would it be possible to get access to your workload or a simple reproducer workload that we could use to debug nsys?

Sorry, can’t do that in general, but we have some Nvidia folks onsite: maybe we can work together on it? @Scot_Halverson

Hey Paul, I would be happy to help on a reproducer. Alternatively, if you can share the code in question with me, I can work with @rknight to understand what’s going wrong on your systems.