nvcc optimization flags

Dear all,
I would like to know all the optimization flags of the nvcc compiler. Is there a clear list or document that describes all of nvcc's optimization options and their flags?

Thank you in advance!

They are documented in the nvcc manual:


You can also get command line help with:

nvcc --help

Thank you very much for your reply!

I read the document you suggested and found that the flags that automatically optimize CUDA C code are the -O options, which, as in gcc, optimize the host code (correct me if I am wrong).

I would like to ask:
a) Can I set specific optimization flags with nvcc (-faggressive-loop-optimizations, -falign-functions, -falign-jumps, -falign-labels, -falign-loops, …) as I can with gcc?
b) Are there optimization flags that optimize the GPU kernels (the device code)?

Thank you in advance!

The only thing officially supported and documented is what is listed in the manual link I pointed out.

The -O flag:

  1. gets passed to the host compiler for its use
  2. may also impact what is used on the ptxas compilation command line. ptxas is the primary tool that generates optimized device code.

a) No, not that I am aware of.
b) The nvcc flags do affect the optimization of GPU kernels (device code) to the extent that they change what gets passed to ptxas. For example, the -G option will disable optimizations in ptxas.

You can use the --verbose flag on nvcc to see experimentally how adding various -O levels and/or -G to the nvcc command line changes the ptxas command line, and so learn how they affect ptxas operation.
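A minimal sketch of that experiment, for what it is worth. The file name dryrun_demo.cu and its kernel are made up for illustration. I use --dryrun here, which lists the same sub-commands as --verbose but does not execute them, and I capture both output streams because I am assuming (not certain for every version) that nvcc prints the listing on stderr.

```shell
#!/bin/sh
# Hypothetical one-kernel source file, only so nvcc has something to compile.
cat > dryrun_demo.cu <<'EOF'
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
EOF

if command -v nvcc >/dev/null 2>&1; then
    # List the sub-commands for a default build and for a -G (device debug) build.
    nvcc --dryrun -c dryrun_demo.cu -o dryrun_demo.o > plain.txt 2>&1
    nvcc --dryrun -G -c dryrun_demo.cu -o dryrun_demo.o > debug.txt 2>&1
    # Show only the ptxas invocations so the two can be compared by eye.
    echo "--- default ---"; grep ptxas plain.txt
    echo "--- with -G ---"; grep ptxas debug.txt
else
    echo "nvcc not found on PATH; nothing to compare"
fi
```

The same pattern works for comparing -O levels: change the flags on the two nvcc lines and look at what differs on the ptxas lines.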

I wouldn’t be able to answer detailed questions such as what the various -O1, -O2, and -O3 optimization levels change in ptxas, as that is not documented anywhere that I know of and is probably subject to change from one CUDA version to the next.

As far as I recall, -On affects only host code. To set the PTXAS optimization level, one would need to use -Xptxas -On; the default is -Xptxas -O3.
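As a rough sketch of what that looks like on the command line: -Xptxas forwards a comma-separated list of options to ptxas, and the ptxas -v option prints per-kernel register and memory usage. The file name ptxas_demo.cu and the kernel in it are hypothetical.

```shell
#!/bin/sh
# Hypothetical one-kernel source file, only so nvcc has something to compile.
cat > ptxas_demo.cu <<'EOF'
__global__ void axpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
EOF

if command -v nvcc >/dev/null 2>&1; then
    # The bare -O3 goes to the host compiler; -Xptxas -O3 spells out the ptxas default.
    nvcc -O3 -Xptxas -O3,-v -c ptxas_demo.cu -o ptxas_demo_O3.o
    # Same host settings, but ask ptxas for a lower optimization level.
    nvcc -O3 -Xptxas -O1,-v -c ptxas_demo.cu -o ptxas_demo_O1.o
else
    echo "nvcc not found on PATH; skipping the build"
fi
```

Comparing the -v reports from the two builds is a quick way to check whether the ptxas level makes any difference for a given kernel.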

The front portion of the CUDA compiler (where architecture-independent optimizations happen) is based on LLVM, and I think (not sure) the open-source LLVM distribution comes with a PTX code generator. So if you want to experiment with specific optimization strategies in the context of CUDA, that may be a way to explore the details of various optimizations.
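If you want to try the open-source route, something along these lines may work. This is only a sketch under several assumptions: that clang, opt, and llc are installed with NVPTX support, that clang can find a CUDA installation for its headers, and that sm_75 is a suitable target. The file name clang_demo.cu is made up, and this is not the pipeline nvcc itself runs.

```shell
#!/bin/sh
# Hypothetical one-kernel source file for the experiment.
cat > clang_demo.cu <<'EOF'
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
EOF

GPU_ARCH=sm_75   # assumption: change to match your GPU

if command -v clang++ >/dev/null 2>&1; then
    # Compile only the device side and stop at textual LLVM IR.
    clang++ -x cuda --cuda-gpu-arch=$GPU_ARCH --cuda-device-only \
        -O1 -S -emit-llvm clang_demo.cu -o clang_demo.ll
fi

if [ -f clang_demo.ll ] && command -v opt >/dev/null 2>&1; then
    # Run a pass pipeline of your choice over the device IR.
    opt -passes='default<O3>' -S clang_demo.ll -o clang_demo_opt.ll
fi

if [ -f clang_demo_opt.ll ] && command -v llc >/dev/null 2>&1; then
    # Lower the IR to PTX with the NVPTX backend.
    llc -march=nvptx64 -mcpu=$GPU_ARCH clang_demo_opt.ll -o clang_demo.ptx
fi
```

The opt step is where a custom pass or a hand-picked pass list would go in place of the stock pipeline.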

Note that in the CUDA toolchain, PTX code is compiled down to machine language by PTXAS, which despite its name is an optimizing compiler. This means that PTX serves a dual role as a virtual architecture and a compiler intermediate format.
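To see the two stages separately, you can stop nvcc at PTX and then run ptxas yourself, roughly as sketched below. The file name ptx_demo.cu, its kernel, and the default architecture number 75 are assumptions for illustration; pick the architecture that matches your GPU and toolkit.

```shell
#!/bin/sh
# Hypothetical one-kernel source file for the experiment.
cat > ptx_demo.cu <<'EOF'
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
EOF

ARCH=${ARCH:-75}   # assumption: override with e.g. ARCH=86 for your GPU

if command -v nvcc >/dev/null 2>&1 && command -v ptxas >/dev/null 2>&1; then
    # Stage 1: front portion of the compiler, CUDA C++ to PTX (virtual architecture).
    nvcc -arch=compute_$ARCH -ptx ptx_demo.cu -o ptx_demo.ptx &&
    # Stage 2: ptxas compiles and optimizes the PTX into machine code for a real GPU.
    ptxas -arch=sm_$ARCH -O3 ptx_demo.ptx -o ptx_demo.cubin &&
    # Optional: dump the resulting machine code (SASS) for inspection.
    cuobjdump -sass ptx_demo.cubin
else
    echo "nvcc/ptxas not found on PATH; skipping"
fi
```

Re-running stage 2 with a different -O level on the same .ptx file isolates what ptxas contributes, independent of the front portion.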

Hi all,

I have just started exploring optimization on the NVIDIA TX1. Can you please suggest some ways to go about it? What I want to do is this:
We optimize ordinary C/C++ code with LLVM through compiler flags such as O1, O2 … Ofast, or by writing our own optimization/analysis pass. Can we do the same with CUDA code? If yes, how?

Pravin Srivastav

The TX1 also uses nvcc, so all you have to do is read this thread. For TX1-specific questions, you may wish to post on the TX1 forum.