I am using CUDA and C++ for image processing, and I rely heavily on template metaprogramming to optimize some kernels.
My problem is that nvcc takes too much time and memory (about 5 minutes and 1 GB of memory).
Since I often need to recompile the project, I lose a lot of time waiting for nvcc.
In addition, errors are sometimes only reported after several minutes of compilation.
gcc and cl (MSVC) provide a way to precompile headers to speed up compilation. My question: can nvcc precompile C++ headers? I did not find anything
about it in the documentation or on the web.
nvcc is only a wrapper that emits commands to other programs. It uses gcc to preprocess its files (at least on Linux and on the Mac; I don’t know about Windows). So even though nvcc does not seem to know about precompiled headers, you could watch the commands nvcc runs internally by running nvcc -dryrun -keep, and then script them yourself using precompiled headers.
It is also the case on Windows. Running nvcc with the --verbose option,
I can see that one step takes about 90% of the compilation time: cudafe.
Unfortunately, the documentation doesn’t explain how to make cudafe
precompile headers. Anyway, I am not sure that precompiling headers
would help.
I think template instantiation is the most time-consuming part of the
compilation, and I’m afraid there is nothing I can do about that.
Has anybody else been bothered by very long compile times when
using deep template instantiation? Is there any hack that can make it
faster (other than hand-made pre-instantiation)?
Oops, I thought from your previous comment that you are on Windows, but re-reading it I see that’s not the case. Forget about --no-cpp-cudafe then, as under Linux gcc is already doing the preprocessing anyway.
I guess it would then be up to NVIDIA engineers to find out why cudafe is so slow. I don’t see much that end users could do about it (other than trying to minimize preprocessor use).
Looking through the documentation (CUDA Toolkit Documentation), I cannot find anything that suggests support for precompiled header files.
I recommend filing feature requests as “requests for enhancement (RFE)” through the bug reporting form that is linked from the registered developer website. Adding a prefix "RFE: " to the synopsis (subject line) when filing would be helpful.