How to correlate SASS assembly instructions with the corresponding source code on a Linux machine?

I know we can do this on a Windows machine using Nsight Visual Studio Edition. How do we correlate source code with the corresponding assembly instructions on a Linux machine?

Thanks for your help.

Be advised that if you compile (as normal) with full optimizations, the correspondence between SASS and source can be a bit jumbled. One possible method:

Compile your code to a cubin, specifying -lineinfo and an arch specification of sm_30 or higher:

nvcc -arch=sm_35 -lineinfo -cubin

Then use the nvdisasm utility:

nvdisasm --print-line-info test.cubin

Unfortunately, this only produces output that looks like this:

        /*12f8*/                   ST.E [R16+0x7f8], R0;
        //## File "/home/user2/misc/", line 19
        /*1308*/                   JCAL `(free);
        /*1310*/                   MOV R5, R18;
        //## File "/home/user2/misc/", line 20
        /*1318*/                   MOV R4, R2;
        /*1320*/                   JCAL `(free);
        /*1328*/                   BRA `(.L_2);
        /*1330*/                   MOV R5, R17;
        //## File "/home/user2/misc/", line 13
        /*1338*/                   MOV R4, R16;
        /*1348*/                   JCAL `(free);
        /*1350*/                   MOV RZ, RZ;
        //## File "/home/user2/misc/", line 21
        /*1358*/                   EXIT;
        /*1360*/                   BRA `(.L_3);

So you’ll need to look up the source lines yourself.
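That lookup can be scripted. Below is a minimal Python sketch that reads nvdisasm output and appends the matching source text to each line-info marker. It assumes the marker format is exactly what the listing above shows (//## File "...", line N) and that the recorded paths are readable on the machine running the script. The names annotate and load_lines are hypothetical helpers, not part of any NVIDIA tool.

```python
import re

# Marker format taken from the nvdisasm --print-line-info listing above.
LINE_INFO = re.compile(r'//## File "(?P<path>[^"]*)", line (?P<line>\d+)')


def load_lines(path, cache):
    """Return the lines of a source file, or [] if it cannot be read."""
    if path not in cache:
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                cache[path] = fh.read().splitlines()
        except OSError:
            # Missing file, directory, or permission problem: no annotation.
            cache[path] = []
    return cache[path]


def annotate(sass_text, cache=None):
    """Append the source text to every line-info marker in nvdisasm output."""
    cache = {} if cache is None else cache
    out = []
    for row in sass_text.splitlines():
        m = LINE_INFO.search(row)
        if m:
            lines = load_lines(m.group("path"), cache)
            n = int(m.group("line"))  # markers use 1-based line numbers
            if 1 <= n <= len(lines):
                row = f"{row}    // {lines[n - 1].strip()}"
        out.append(row)
    return "\n".join(out)
```

To use it, capture the nvdisasm output in a string (for example by redirecting it to a file and reading that file) and print the result of annotate(). Markers whose file cannot be opened are left unchanged.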

It’s also possible to get a nicer view in the Visual Profiler, which is available on both Windows and Linux.

Depending on your code, the SASS may be more than “a bit jumbled” for optimized builds.

Some source code lines may have no equivalent, as they were optimized out due to, for example, common sub-expression elimination or strength reduction.

The machine instructions for a given source line may be sprinkled across a block of dozens of instructions for the highest-performance scheduling. Usually, the larger the basic blocks (straight-line pieces of code between branches), the more jumbled the machine code gets relative to the source code.

Also, standard math library functions will almost always be inlined into your code for performance, and you will find those inlined sequences interspersed with your own code.
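One rough way to see how scattered a build is: count, for each source line, how many instructions it owns and how many separate runs those instructions form in the nvdisasm --print-line-info listing. The Python sketch below does that under two assumptions: each //## File marker applies to the instructions that follow it, and every instruction row carries an address comment such as /*12f8*/. The function name scatter_report is hypothetical.

```python
import re
from collections import defaultdict

LINE_INFO = re.compile(r'//## File "([^"]*)", line (\d+)')
INSTRUCTION = re.compile(r'/\*[0-9a-fA-F]+\*/')  # address comment, e.g. /*12f8*/


def scatter_report(sass_text):
    """Map (file, line) -> [instruction count, number of separate runs]."""
    stats = defaultdict(lambda: [0, 0])
    current = None  # source location of the most recent marker
    for row in sass_text.splitlines():
        m = LINE_INFO.search(row)
        if m:
            loc = (m.group(1), int(m.group(2)))
            if loc != current:
                # A change of location starts a new run for that source line.
                stats[loc][1] += 1
                current = loc
        elif current is not None and INSTRUCTION.search(row):
            stats[current][0] += 1
    return dict(stats)
```

A source line whose run count is well above 1 has had its instructions interleaved with those of other lines, which is the jumbling described above.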

Overall, a detailed source-based back-annotation of a largish sequence of machine instructions can take several hours. I have done that work many times when tracking down compiler bugs, so it is doable but requires patience. An additional complication is that NVIDIA does not provide a detailed description of the SASS instruction set.

Things are more “orderly” for debug builds, for which basically all optimizations are turned off, i.e. the structure of the machine code matches the structure of the source code pretty well.