Hi, I have an application that I am trying to profile. After some profiling, I identified register pressure as the issue, but wanted to look at the assembly code to better understand which lines were causing register spillage. I compiled this for A100 GPUs with CUDA 12.0. I added the following to my CMAKE_CUDA_FLAGS: -g --save-temps -lineinfo. There is, however, no information about the source code lines in the PTX files. Instead there is a bunch of $L__info_string*. Below is a snippet of what I am getting. What could I be doing wrong?
You are doing nothing wrong. What you show is the line information.
Typically you would use a tool like cuobjdump or nvdisasm (see the CUDA Binary Utilities documentation) to get the annotated assembly from the compiled program or object file.
The code shown above is PTX. This is a compiler intermediate format and virtual ISA. PTX code uses virtual registers which are created in SSA (static single assignment) fashion, that is, a new register is used for each new instruction output created.
Allocation of physical registers occurs as part of the PTX to SASS (machine code) translation, as instruction selection and register allocation are GPU architecture dependent. This work is done by ptxas, which is an optimizing compiler. When looking at register pressure, what is relevant is therefore SASS (e.g. from cuobjdump --dump-sass). For examining the "fat parts" of SASS in terms of register usage, you may want to take a look at using nvdisasm --print-life-ranges.
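As a rough sketch of that workflow: the file names kernel.cu and kernel.cubin below are placeholders, not taken from the original post, and sm_80 is assumed as the target because the question mentions A100 GPUs. Adjust both for your project.

```shell
# Assumption: kernel.cu is a hypothetical source file; sm_80 targets the A100.
# Build a standalone cubin with source line information embedded.
nvcc -arch=sm_80 -lineinfo -cubin -o kernel.cubin kernel.cu

# Dump the SASS (machine code) for the compiled kernels.
cuobjdump --dump-sass kernel.cubin

# Disassemble with source line annotations taken from the line info.
nvdisasm --print-line-info kernel.cubin

# Disassemble with register live ranges shown in a trailing column,
# which helps locate the regions where many registers are live at once.
nvdisasm --print-life-ranges kernel.cubin
```

nvdisasm works on cubin files, which is why the sketch builds one directly with -cubin rather than starting from a host object file.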
High register pressure != register spillage. High register usage may lead to register spillage, but often it does not. When you add -Xptxas -v to the nvcc command line, what do the resulting basic usage statistics look like?
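For reference, a minimal sketch of that invocation. The source file name kernel.cu is a placeholder, and sm_80 is assumed because the target is an A100.

```shell
# Assumption: kernel.cu is a hypothetical source file; sm_80 targets the A100.
# -Xptxas -v passes the verbose flag through to ptxas, which then reports,
# per kernel, the number of registers used, the stack frame size, and the
# bytes of spill stores and spill loads.
nvcc -arch=sm_80 -Xptxas -v -c -o kernel.o kernel.cu
```

Non-zero spill store and spill load counts in that report are the direct indication of register spilling; a high register count on its own is not.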
Thanks, am I misremembering this, or was annotating assembly code always an extra step with CUDA? I was assuming I could get an annotated version of the PTX code the same way I get for x86 upon compilation, e.g.
Thanks, I will try using both. I know there is register spillage because I already looked at the code a while back, but back then I was working on different hardware with a different set of profiling tools. But you are right, the two terms shouldn't be used interchangeably.
It is not clear from the information provided how it was determined that register spilling occurs. I'll note that the use of local memory in SASS by itself is not a reliable indication of register spilling.
nvcc has a -src-in-ptx switch. To get the desired output you must also use either -G or -lineinfo on the compilation command line, along with -src-in-ptx.
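A minimal sketch of such a command line, assuming a hypothetical source file kernel.cu and an sm_80 (A100) target:

```shell
# Assumption: kernel.cu is a hypothetical source file; sm_80 targets the A100.
# -ptx stops after PTX generation; -lineinfo emits line information;
# -src-in-ptx interleaves the original source lines into the PTX as comments.
nvcc -arch=sm_80 -ptx -lineinfo -src-in-ptx -o kernel.ptx kernel.cu
```

In a CMake build, the same two switches could be appended to CMAKE_CUDA_FLAGS alongside --save-temps so that the kept .ptx files carry the source annotations.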