How may I output source code information in the assembly output

Hi, I have an application that I am trying to profile. Profiling pointed to register pressure as the issue, and I wanted to look at the assembly code to better understand which lines were causing the register spillage. I compiled this for A100 GPUs with CUDA 12.0, adding the following to my CMAKE_CUDA_FLAGS: -g --save-temps -lineinfo. However, the PTX files contain no information about the source code lines. Instead there are a bunch of $L__info_string* references. Below is a snippet of what I am getting. What could I be doing wrong?

$L__BB1_5:
.loc 4 336 9, function_name $L__info_string5, inlined_at 4 353 25
cvt.u32.u64 %r50, %rd9;
shr.u64 %rd26, %rd143, %r50;
.loc 4 354 21, function_name $L__info_string4, inlined_at 4 363 20
.loc 4 346 9, function_name $L__info_string8, inlined_at 4 354 21
mul.lo.s64 %rd59, %rd26, %rd10;
sub.s64 %rd27, %rd23, %rd59;
.loc 3 1865 9, function_name $L__info_string2, inlined_at 2 759 203
cvt.u32.u64 %r51, %rd22;
.loc 3 1865 52, function_name $L__info_string2, inlined_at 2 759 203
add.s32 %r17, %r5, %r51;
setp.ge.s32 %p4, %r42, %r43;
.loc 2 759 219, function_name $L__info_string11, inlined_at 1 16 49
.loc 2 708 9, function_name $L__info_string12, inlined_at 2 759 219
.loc 5 304 5, function_name $L__info_string13, inlined_at 2 708 9
@%p4 bra $L__BB1_13;

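You are doing nothing wrong. What you show is the line information: each .loc directive gives a file index, a line number, and a column (for example, .loc 4 336 9 is line 336, column 9 of file 4), and the file index refers to a .file directive elsewhere in the PTX. The $L__info_string* labels refer to entries in the .debug_str section that name the inlined functions.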

Typically you would use a tool like cuobjdump or nvdisasm (see the CUDA Binary Utilities documentation) to get the annotated assembly from the compiled program or object file.
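
A minimal sketch of those two tools as shell helpers, assuming the code was built with -lineinfo and that a standalone .cubin is available (--save-temps should keep one next to the .ptx). The helper names dump_sass and annotate_sass and the file names in the usage comments are hypothetical.

```shell
# Hypothetical helper: print the SASS of every kernel embedded in an
# executable or object file ($1).
dump_sass() {
    cuobjdump --dump-sass "$1"
}

# Hypothetical helper: disassemble a standalone cubin ($1) and annotate the
# SASS with source line information from its .debug_line section, if present.
annotate_sass() {
    nvdisasm --print-line-info "$1"
}

# Usage (file names are placeholders):
#   dump_sass ./my_app
#   annotate_sass kernel.cubin
```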

The code shown above is PTX. This is a compiler intermediate format and virtual ISA. PTX code uses virtual registers, which are created in SSA (static single assignment) fashion; that is, a new register is used for each new instruction result.

Allocation of physical registers occurs as part of the PTX to SASS (machine code) translation as instruction selection and register allocation are GPU architecture dependent. This work is done by ptxas, which is an optimizing compiler. When looking at register pressure, what is relevant is therefore SASS (e.g. from cuobjdump --dump-sass). For examining the “fat parts” of SASS in terms of register usage, you may want to take a look at using nvdisasm --print-life-ranges.
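
As a rough sketch of the life-range view, assuming a cubin kept by --save-temps or produced with nvcc -cubin; the helper name and the file name are hypothetical:

```shell
# Hypothetical helper: disassemble a cubin ($1) with register life-range
# information printed next to each SASS instruction.
life_ranges() {
    nvdisasm --print-life-ranges "$1"
}

# Usage (placeholder file name); the output is wide, so a pager that does
# not wrap lines helps:
#   life_ranges kernel.cubin | less -S
```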

High register pressure != register spillage. High register usage may lead to register spillage, but often it does not. When you add -Xptxas -v to the nvcc command line, what do the resulting basic usage statistics look like?
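
For reference, a minimal sketch of collecting those statistics, assuming a single source file and an A100 target (sm_80). The helper name and file name are hypothetical. ptxas reports per-kernel register counts along with stack frame, spill store, and spill load byte counts; nonzero spill bytes are what indicate actual spilling.

```shell
# Hypothetical helper: compile one .cu file ($1) for the given architecture
# ($2, default sm_80 for A100) and have ptxas report per-kernel resource usage.
# The report goes to stderr, so it is merged into stdout to allow piping.
ptxas_stats() {
    nvcc -arch="${2:-sm_80}" -Xptxas -v -c "$1" -o "${1%.cu}.o" 2>&1
}

# Usage (placeholder file name):
#   ptxas_stats kernel.cu | grep -E 'registers|spill'
```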

Thanks. Am I misremembering this, or was annotating assembly code always an extra step with CUDA? I was assuming I could get an annotated version of the PTX code the same way I do for x86 upon compilation, e.g.

.Ltmp13:
%bb.10:
#DEBUG_VALUE: init:this ← $rbx
#DEBUG_VALUE: init:reactor_type ← [DW_OP_LLVM_entry_value 1] $esi
.loc 15 23 29 is_stmt 1 Submodules/PelePhysics/Source/Reactions/ReactorCvode.cpp:23:29
leaq 476(%rbx), %rdx
.Ltmp14:
leaq 64(%rsp), %rdi
.loc 15 23 6 is_stmt 0 Submodules/PelePhysics/Source/Reactions/ReactorCvode.cpp:23:6
movl $.L.str.7, %esi
xorl %ecx, %ecx
callq _ZNK5amrex9ParmParse5queryEPKcRii

In addition to the above advice, if you’re using Nsight Compute, you can correlate the source with either PTX or SASS (example here).

Thanks, I will try using both. I know there is register spillage because I already looked at the code a while back, but back then I was working on different hardware with a different set of profiling tools. But you are right, register pressure and register spillage shouldn’t be used interchangeably.

It is not clear from the information provided how it was determined that register spilling occurs. I’ll note that the use of local memory in SASS by itself is not a reliable indication of register spilling; local memory also holds, for example, thread-local arrays that are indexed with non-constant indices.

nvcc has a -src-in-ptx switch. To get the desired output you must also use either -G or -lineinfo on the compilation command line, along with -src-in-ptx.
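
A minimal sketch of that command line, assuming a single source file and an A100 target (sm_80); the helper name and file names are hypothetical:

```shell
# Hypothetical helper: emit PTX for one .cu file ($1) with the original
# source lines interleaved as comments. -src-in-ptx needs -lineinfo (or -G).
# $2 is the target architecture, default sm_80 for A100.
ptx_with_source() {
    nvcc -arch="${2:-sm_80}" -lineinfo -src-in-ptx -ptx "$1" -o "${1%.cu}.ptx"
}

# Usage (placeholder file name); writes kernel.ptx:
#   ptx_with_source kernel.cu
```

In a CMake build, the equivalent would presumably be adding -src-in-ptx to CMAKE_CUDA_FLAGS next to the existing -lineinfo --save-temps.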
