Adding -lcudadevrt makes my program slower


I am trying out CUDA dynamic parallelism (CDP). However, I just realised that when I compile my program with -lcudadevrt, it runs slower than normal, even if I don’t use dynamic parallelism.

Does that make sense?
Thanks in advance

-rdc=true is also required for CDP, and this may be what is causing the slow-down.

Actually, I am also passing -rdc=true when linking with -lcudadevrt.
But can -rdc=true really cause a slow-down? If so, why? I couldn’t find any documentation on this.


Some code optimizations may not be possible with separate compilation (which is what -rdc=true enables), as the necessary information is only known for certain at the link stage, not at the compilation stage as in whole-program compilation. This can sometimes be counteracted by incorporating final code optimizations into the link stage, but that is not something the CUDA tool chain currently does (at least not through CUDA 6.5).
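
One way to check whether -rdc=true is responsible in a given case is to time the same kernel under both build modes. The following is only a minimal sketch: the file name bench.cu, the kernel saxpy_like and the helper scale_add are made up for illustration and do not come from the original program, and a kernel this small may well show no measurable difference at all.

```cuda
// bench.cu: hypothetical micro-benchmark for comparing build modes.
// Build it twice and compare the reported times (replace sm_35 with the
// compute capability of your GPU; CDP itself needs 3.5 or higher):
//   nvcc -O3 -arch=sm_35 -o bench_whole bench.cu
//   nvcc -O3 -arch=sm_35 -rdc=true -o bench_rdc bench.cu -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

// A non-static device function. With -rdc=true it has external linkage,
// so the compiler cannot assume it sees every caller at compile time.
__device__ float scale_add(float x, float a, float b)
{
    return a * x + b;
}

// Each thread applies scale_add repeatedly to one element.
__global__ void saxpy_like(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iters; ++k)
        v = scale_add(v, 1.0001f, 0.5f);
    data[i] = v;
}

int main()
{
    const int n = 1 << 20;
    const int iters = 1000;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not included in the timing.
    saxpy_like<<<blocks, threads>>>(d_data, n, iters);
    cudaDeviceSynchronize();

    // Timed launch.
    cudaEventRecord(start);
    saxpy_like<<<blocks, threads>>>(d_data, n, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

If the two binaries report clearly different times, the build mode rather than the library is the likely cause; if they do not, profiling the real application's kernels under both builds would be the next step.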