Program compiled with CUDA 5.5 is slower than with 5.0 (about 10% degradation)

Has anybody run into a similar issue? Can I avoid this by using some compiler options?

These are the commands for compiling and linking (building a shared library):
nvcc -Xcompiler -fPIC -m64 -gencode arch=compute_20,code=sm_20 …
gcc -fPIC -Wall -m64 -O3 … -Lxxx/cuda-5.0/lib64 -lcudart -lc -lm -shared

I am using a C2070; the driver version is 319.37.


It would be helpful if you could file a bug report via the registered developer site, so the compiler team can have a look. Please attach self-contained repro code that demonstrates the performance regression. Thank you.

Hi Njuffa,
I am afraid I can’t upload the code since it is part of a product. I will try to reproduce the problem in a test routine and upload that instead.

I know this is an old thread, but I would like to know if there is any news about the issue.
I have the same problem with a very large source code that takes more than 20 minutes to compile with CUDA 5.0 and
around 10 minutes with either CUDA 5.5 or CUDA 6.0, but with the newer versions the performance is 20% worse.
Unfortunately, I am afraid I have the same problem as dpig101 regarding uploading the code…
I have the feeling that the 5.0 compiler spends more time in the optimization phase.
Is there any way to force the more recent compilers to do the same?

The compiler defaults to maximum optimization (-O3). The shorter compilation time with newer compilers is probably unrelated to the performance regression. Because there have been reports of excessive compilation times, some improvements aimed at reducing compilation times have been made in recent versions.

Compilers include complex sequences of transformational phases, many of which are driven by tunable heuristics. Interactions with complex pieces of code are hard to predict, so regressions on some kernels invariably result when changes to the tool chain are made, with the overall distribution of speedups approximating something like a (possibly skewed) normal distribution.

I encourage programmers to file bugs for any significant regressions. To allow debugging, such bug reports must be accompanied by code that reproduces the issue. Data in bug reports is visible to the filer and NVIDIA engineers. You can try to simplify and/or obfuscate your code if necessary.
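
For anyone putting together a repro along those lines, here is a minimal sketch of what a self-contained timing harness might look like. It is only a starting point under stated assumptions: repro_kernel is a hypothetical stand-in (nothing from the products discussed above), and the names time_kernel_ms, the problem size, and the repetition count are made up for illustration. The idea is to drop a simplified or obfuscated version of the affected kernel into the stand-in and time it with CUDA events, so the same file can be built with two toolkit versions and the averages compared.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel. Replace the body with a simplified or
// obfuscated version of the code that shows the regression.
__global__ void repro_kernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = (float)i;
        for (int k = 0; k < 64; ++k) {
            x = x * 0.5f + 1.0f;   // iterates toward 2.0
        }
        out[i] = x;
    }
}

// Returns the average kernel time in milliseconds over `reps` launches,
// or -1.0f if a CUDA call fails or the sanity check on the result fails.
float time_kernel_ms(int n, int reps)
{
    float *d_out = NULL;
    float total_ms = 0.0f;
    float first = 0.0f;
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;

    if (cudaMalloc((void **)&d_out, n * sizeof(float)) != cudaSuccess) {
        return -1.0f;
    }
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not included in the timing.
    repro_kernel<<<blocks, threads>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r) {
        repro_kernel<<<blocks, threads>>>(d_out, n);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&total_ms, start, stop);

    // Copy one element back as a basic check that the kernel actually ran.
    cudaError_t err = cudaMemcpy(&first, d_out, sizeof(float),
                                 cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);

    if (err != cudaSuccess || fabsf(first - 2.0f) > 1e-3f) {
        return -1.0f;
    }
    return total_ms / reps;
}

int main()
{
    float ms = time_kernel_ms(1 << 22, 100);   // illustrative size and count
    if (ms < 0.0f) {
        printf("CUDA error or unexpected result\n");
        return 1;
    }
    printf("average kernel time: %f ms\n", ms);
    return 0;
}
```

Assuming the file is saved as repro.cu (a made-up name), it could be built once per toolkit with the same architecture flags as in the first post, for example nvcc -m64 -gencode arch=compute_20,code=sm_20 repro.cu -o repro, and the printed averages from the two builds attached to the bug report.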