nvcc preprocessing

I could not determine from the documentation certain details of the nvcc preprocessing phase. My first question is: who does the C/C++ preprocessing when running nvcc? Is it the nvcc driver itself or the host compiler? Does the answer differ depending on whether a .cu file or a .c/.cpp file is being preprocessed?

The nvcc processing flow is covered in the nvcc manual.

Furthermore, you can inspect the exact sequence of steps by passing --verbose to nvcc.

“I could not determine from the documentation certain details of the nvcc preprocessing phase.” Obviously I read the manual, or I would not have asked the question. Please tell me where in the manual my question is answered, as I must have missed it.

The portion of the nvcc manual that I was referring to is here:


To see the specific commands issued, use the --verbose option on nvcc. This will identify the specific tool as well as the command line options passed to that tool.

I think if you run a simple nvcc compilation command with --verbose, and compare it to the diagram, you will be able to line up tools used with specific points in the processing flow.

Probably, if you have host code in mind when you use the word preprocessing, the best answer might be cudafe++.

You can also use the --keep option in nvcc to study the transformations applied by each tool.

My comments here mostly apply to .cu file processing or to the -x cu switch. For a .cpp file, nvcc essentially hands the entire file off intact to the host compiler. To be more specific, as far as I know there are no NVIDIA tools involved in the preprocessing or compilation, but there are NVIDIA tools involved in linking and binary packaging. Again, you can use --verbose to confirm the specifics.

(edited: njuffa gave a better description and reference below)

The C/C++ preprocessing must be handled by either an NVIDIA tool or the host compiler. I can hardly believe that both are handling it. My question remains: who is handling the C/C++ preprocessing? I am fully aware of the portion of the doc in your link.

Secondly, if an NVIDIA tool is handling the C/C++ preprocessing, what C/C++ standard level is it using? I am sure you are aware that there are different levels of C/C++ preprocessor conformance. For instance, Visual C++ is notorious for its non-standard preprocessor, but even among compilers that attempt to conform, some support variadic macros and some do not, depending on which C/C++ standard they implement. If an NVIDIA tool is handling the C/C++ preprocessing, rather than a host compiler, where is the documentation explaining its level of preprocessor conformance and which predefined macros it supports?

NVIDIA claims compliance with various language standards, subject to various limitations/exceptions:


I’m not aware of further documentation on the subject pertaining to your questions.

Are you saying that nvcc always does the C/C++ preprocessing rather than the host compiler, but since it knows the host compiler, it attempts to emulate the host compiler’s conformance to the C/C++ preprocessor?

That emulation seems like a lot of work compared to letting the host compiler handle the C/C++ preprocessing. On Windows, Visual C++ is the host compiler, and emulating its non-standard preprocessor is quite a task, one that has very little to do with the C/C++ preprocessor standard at any C++ level.

Does this emulation of the host compiler’s C/C++ preprocessing change at all between a .cu file, with device and host code, and a normal C/C++ file?

A useful one-page diagram of the compiler “trajectory” (not sure why it’s not called “flow”) can be found here:


From this presentation you can also see that the CUDA toolchain uses EDG (from Edison Design Group) for the frontend, followed by NVVM (derived from LLVM) for the translation of device code to PTX, followed by PTXAS (a proprietary NVIDIA compiler) to compile PTX into SASS (machine code).

CUDA allows host and device code mixed in the same source file, which means the code needs to be parsed completely for the purpose of splitting it into host and device portions. It seems reasonable to assume that correct parsing must include C++ preprocessing, and based on the documented trajectory, that would be done by EDG, not the host compiler. The specific executable is cudafe, e.g.:

cudafe: NVIDIA (R) Cuda Language Front End
Portions Copyright (c) 2005-2015 NVIDIA Corporation
Portions Copyright (c) 1988-2014 Edison Design Group Inc.
Based on Edison Design Group C/C++ Front End, version 4.10 (Jan  9 2017 17:32:40)
Cuda compilation tools, release 8.0, V8.0.60

The host compiler only gets to see the extracted host code. By observation of the intermediate files, the extracted host code is not always an exact copy of the original host portion of the CUDA code: it frequently seems to be slightly re-stated in a functionally equivalent way.

What are the “certain details of the nvcc preprocessing phase” that you need to determine? For what purpose do you need to determine these details? I sense an XY problem of some sort …

I am the maintainer of the Boost preprocessing library and the author of the Boost VMD library. The advanced preprocessing APIs of both libraries rely heavily on knowing how compliant various compilers are with standard C++ preprocessing.

The EDG compiler is highly compliant, as are gcc and clang, but the Visual C++ preprocessor has always been non-compliant with the C/C++ standards in numerous small ways that affect advanced preprocessing code. Therefore it is important to know what level of C/C++ standard compliance nvcc actually provides. Furthermore, a recent bug report in Boost shows that nvcc's level of preprocessor compliance when it processes C/C++ files differs from when it processes CUDA files. In the former case, the variadic macro support in the Boost preprocessing library and the Boost VMD library evidently works fine, but when processing CUDA files it does not. Originally, support for variadic macros was turned off in the Boost libraries when processing CUDA files, but a bug report insisted that it did work, so I turned it on. Now it looks like I was wrong to do so. I am just trying to understand how preprocessing in nvcc actually works. The easiest thing is probably just to turn support for variadic macros back off when processing CUDA files with nvcc.

Are you referring to host code or device code? It is not clear from your description. I am not familiar with the libraries you mention.

Host code from CUDA files eventually gets passed to the host compiler, and any restrictions the host compiler may impose apply at that point. You can look at the intermediate files produced by the CUDA tool chain to see what exactly gets passed to the host compiler. Historically, MSVC has often imposed more restrictions on host code than other host tool chains supported by CUDA. I know the pain.

As far as error messages during CUDA builds are concerned, you should be able to tell whether they are reported by EDG or by MSVC, as the latter uses a very characteristic style of error messages that is easy to recognize. Verbose builds can also help pinpoint at which stage the error is thrown.