Performance issue with object linking.


I have a GPU ray tracer.

It runs slower when it’s compiled with -rdc on.

Why is that?
Aren’t they supposed to be the same?

It is not clear what “they” refers to, or why “they” are supposed to be the same. I assume you are referring to whole program compilation on the one hand and separate compilation with device code linking on the other. Not knowing anything specific about your code base, the following applies in general.

There are differences between whole program compilation and separate compilation because in the first case the compiler can see the entire code and optimize accordingly (e.g. inline extensively, which then opens up opportunities for further optimizations). With separate compilation, less information is available when compiling each compilation unit, and there are additional restrictions: e.g. each non-static function must be callable using ABI conventions (as the default linkage in C / C++ is external linkage). Some amount of slowdown is therefore not unexpected with separate compilation. How much difference are you seeing in your case? In essence, there is a tradeoff between the flexibility of separate compilation and the performance of whole program compilation.
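
As a rough sketch of the linkage point above: the file names (trace.cu, shade.cu) and the functions (shade, trace, run_trace) are made up for illustration and have nothing to do with your ray tracer. As written, everything sits in one file, so the compiler can see the body of shade() and is free to inline it into the kernel. If the definition at the bottom is moved into a second file, the kernel only sees the declaration, the build needs -rdc=true, and the call has to go through the device ABI.

```cuda
// trace.cu: hypothetical example, all names are illustrative assumptions.
#include <cstdio>
#include <cuda_runtime.h>

// Non-static device function, so it has external linkage by default.
// If its definition lives in another .cu file, the compiler cannot inline
// the call below while compiling this file.
__device__ float shade(float t);

__global__ void trace(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = shade(static_cast<float>(i));
    }
}

// Host helper: launches the kernel and copies n results into h_out.
// Returns false if any CUDA call fails.
bool run_trace(float *h_out, int n)
{
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        return false;
    }
    trace<<<(n + 127) / 128, 128>>>(d_out, n);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main()
{
    float result[8];
    if (!run_trace(result, 8)) {
        printf("CUDA error\n");
        return 1;
    }
    printf("result[3] = %f\n", result[3]);
    return 0;
}

// Definition of shade(). Left in this file, the program builds as a
// whole program:
//   nvcc -o trace_whole trace.cu
// Moved into its own file (say shade.cu), it needs separate compilation
// with device linking:
//   nvcc -rdc=true -o trace_rdc trace.cu shade.cu
// The two binaries can then be compared with:
//   cuobjdump --dump-sass trace_whole
//   cuobjdump --dump-sass trace_rdc
__device__ float shade(float t)
{
    return 0.5f * t + 1.0f;
}
```

In a real ray tracer the functions that get called across file boundaries (intersection tests, shading, BVH traversal helpers) tend to sit in the innermost loops, which is where the lost inlining would be expected to hurt most.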

To find out how the code differs between the two compilation modes, you can dump the disassembly of each binary with cuobjdump --dump-sass and compare the results.