Hi all,
I’m trying to determine register usage by adding the -Xptxas -v flag to the compiler options. However, I get no additional information in the compiler output. The flag is simply ignored.
Does the usage of this flag vary between versions? Or is there some other prerequisite for it to work correctly that’s not mentioned in the documentation?
I use CUDA 4.0 on an x86_64 Linux system with the CUDA plugin in Eclipse.
Indeed. -arch=compute_20 generates PTX only. Add -code=compute_20,sm_20 to it to also run ptxas and thus see the register use.
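For illustration, here is a minimal sketch of the two invocations. The file name scale.cu and the kernel itself are made up for the example; only the flags come from this thread. As far as I know, -arch=sm_20 on its own is shorthand for the same PTX-plus-machine-code combination, so that should work too.

```cuda
// scale.cu (hypothetical file name, used only for this example)
//
// PTX only: ptxas is never run, so -Xptxas -v has nothing to print.
//   nvcc -arch=compute_20 -Xptxas -v -c scale.cu
//
// PTX plus sm_20 machine code: ptxas runs and reports resource usage
// in lines that start with "ptxas info".
//   nvcc -arch=compute_20 -code=compute_20,sm_20 -Xptxas -v -c scale.cu

// Trivial kernel: multiplies each of the n elements by factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}
```

In Eclipse the same flags go wherever the plugin lets you set additional nvcc options.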
Thanks for the advice, Tera.
Addition of the -code flag worked right away. A quick question, though: the compiler printed out information only for several functions. The kernel is fairly complex, and I have well over three dozen functions. Is there a way to force it to produce info for all of them? Or perhaps indicate which ones I’m interested in?
The functions that produce no register usage diagnostics probably get inlined. Add -Xopencc -noinline to the nvcc arguments to see an approximation of their register usage (and have their register usage removed from the calling function or kernel).
For production builds, however, leave function inlining enabled; it will probably generate faster kernels.
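If you only care about a few functions, one alternative to switching inlining off globally is the __noinline__ qualifier on just those functions. This is a sketch under assumptions: the helper and kernel below are hypothetical, and on compute capability 2.x __noinline__ is only a hint, so the compiler may still inline a function if it decides to.

```cuda
// apply.cu (hypothetical file name and functions, for illustration only)
//
// Compile with the same flags as before:
//   nvcc -arch=compute_20 -code=compute_20,sm_20 -Xptxas -v -c apply.cu

// Marked __noinline__ so that, if the hint is honored, the function stays
// a separate call and ptxas can report on it separately from the kernel.
__device__ __noinline__ float smooth(float x)
{
    return x * x * (3.0f - 2.0f * x);
}

// Kernel that calls the helper once per element.
__global__ void apply(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = smooth(data[i]);
}
```

As with -Xopencc -noinline, the numbers you get this way are only an approximation of what the inlined build uses, so remove the qualifier again before measuring performance.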