Compiler -Xptxas flag has no effect

Hi all,
I’m trying to determine register usage by adding the -Xptxas -v flag to the compiler options. However, I get no additional information in the compiler output. The flag is simply ignored.
Does the usage of this flag vary between versions? Or is there some other prerequisite, not mentioned in the documentation, for it to work correctly?
I use CUDA 4.0 on x86_64 Linux with the CUDA plugin in Eclipse.

Thanks for any suggestions.

Sasha

What is the full command line? You might only be generating PTX code and thus never invoke ptxas, so the option would simply get ignored.

Sounds like a possible reason… Here’s the full command line:
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/work/v/boost_1_41_0 -I"/home/sasha/workspace/V2/cuda" -I"/home/sasha/workspace/V2/lib" -I/usr/include -I/usr/lib/gcc/x86_64-redhat-linux/4.1.2/include -I/usr/include/c++/4.1.2/backward -I/usr/local/include -I/usr/include/c++/4.1.2/x86_64-redhat-linux -I/usr/include/c++/4.1.2 -O3 -g -c -Xcompiler -fmessage-length=0 -arch=compute_20 -Xptxas -v -o "cuda/cuda_main.o" "…/cuda/cuda_main.cu"

Thanks

Sasha

Indeed. -arch=compute_20 generates PTX only. Add -code=compute_20,sm_20 to it to also run ptxas and thus see the register use.
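
For reference, here is a minimal sketch of what the adjusted compile step might look like. The flags are the ones discussed above; the file names cuda_main.cu and cuda_main.o are placeholders, and the include paths from the original command line are left out for brevity. The script only invokes nvcc if it is on the PATH and the source file exists; otherwise it just prints the command it would run.

```shell
#!/bin/sh
# -arch selects the virtual architecture (PTX); adding sm_20 to -code
# requests real machine code as well, so ptxas runs and -Xptxas -v can report.
NVCC_FLAGS="-O3 -c -arch=compute_20 -code=compute_20,sm_20 -Xptxas -v"

# Placeholder file names (assumption); substitute your own paths.
SRC="cuda_main.cu"
OBJ="cuda_main.o"

# Compile only when nvcc and the source file are available;
# otherwise print the command line instead of running it.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    nvcc $NVCC_FLAGS -o "$OBJ" "$SRC"
else
    echo "nvcc $NVCC_FLAGS -o $OBJ $SRC"
fi
```

In an Eclipse project the same flags would go into the nvcc options of the build settings rather than a script.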

Thanks for the advice, Tera.
Adding the -code flag worked right away. A quick question, though: the compiler printed information for only a few of the functions. The kernel is fairly complex, and I have well over three dozen functions. Is there a way to force it to produce info for all of them? Or perhaps to indicate which ones I’m interested in?

Thanks again

Sasha

The functions that produce no register usage diagnostics probably get inlined. Add -Xopencc -noinline to the nvcc arguments to see an approximation of their register usage (and to have their register usage removed from the calling function or kernel).

For production builds, however, leave function inlining enabled; it will probably generate faster kernels.
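
One way to keep the two cases apart is to maintain separate flag sets for diagnostic and production builds. The sketch below assumes the CUDA 4.0 toolchain from this thread (the -Xopencc option is passed to the Open64-based front end of that era and may not be accepted by other toolkit versions); the variable names and the file names cuda_main.cu and cuda_main.o are placeholders.

```shell
#!/bin/sh
# Flags common to both builds (from the thread).
BASE_FLAGS="-O3 -c -arch=compute_20 -code=compute_20,sm_20"

# Diagnostic build: per-function register report, inlining disabled.
DIAG_FLAGS="$BASE_FLAGS -Xptxas -v -Xopencc -noinline"

# Production build: inlining left enabled, no verbose ptxas output.
PROD_FLAGS="$BASE_FLAGS"

# Placeholder file names (assumption); substitute your own paths.
SRC="cuda_main.cu"
OBJ="cuda_main.o"

# Print both command lines; run the one you need by hand or from your build system.
echo "diagnostic: nvcc $DIAG_FLAGS -o $OBJ $SRC"
echo "production: nvcc $PROD_FLAGS -o $OBJ $SRC"
```

The register counts from the diagnostic build are only a guide, since the production build inlines the functions and allocates registers for the combined code.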

The -noinline flag works great - all functions are listed individually.

Thanks for your help

Sasha