Hello everyone!
I started learning PTX and I am having a blast! I hand-coded a simple copy kernel in PTX. The handcrafted version and the one generated by nvcc produce somewhat different SASS; the handcrafted one actually seems to generate more instructions, although the runtime timings are within the margin of error. I did not expect to see any major difference. Now it would be super useful to disassemble the binary and see the corresponding PTX interleaved with the SASS, which might help me understand better what is going on there.
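For reference, a copy kernel of the kind described above might look like the following sketch in CUDA C++. The name copy_kernel, the float element type, the array size, and the launch configuration are all assumptions made for illustration; this is not the exact code being compared.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical copy kernel (illustrative only): each thread copies one element.
__global__ void copy_kernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i];
    }
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        h_in[i] = static_cast<float>(i);
    }

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;  // assumed block size
    const int blocks = (n + threads - 1) / threads;
    copy_kernel<<<blocks, threads>>>(d_in, d_out, n);

    // Report launch errors, then copy back (cudaMemcpy waits for the kernel).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Verify that the output matches the input element by element.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i]) {
            ok = false;
            break;
        }
    }
    std::printf(ok ? "copy ok\n" : "copy mismatch\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? 0 : 1;
}
```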
I found the following link:
http://docs.nvidia.com/nsight-visual-studio-edition/3.2/Content/PTX_SASS_Assembly_Debugging.htm
That is great, but I am currently working on CentOS with a CLion/CMake project. I could bring it over to Windows, but I wonder whether I can achieve the same thing with the provided command-line utilities such as cuobjdump.
So far I am able to obtain PTX with interleaved source code, and the SASS code separately. Any ideas?
M.