Any performance penalty for unused printfs other than consuming icache?


As an alternative to exceptions, which aren’t really supported in CUDA AFAIK, I’ve taken to putting in printfs. Is there any real performance penalty to having printfs that are never executed (because they’re guarded by if tests that are intended never to be true) sprinkled throughout my CUDA code?

I’m guessing the printfs could muck with the icache, but I’d be a bit surprised if there was a larger impact than that.


PS – I could implement passing around a set of flags, and checking those flags after the kernel finishes. But, I’m trying to avoid maintaining a rather large set of error codes at this point (in fact, I’ve had such a set in the past, but it’s more work than just putting in a printf).
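For comparison, the flag-based alternative mentioned in the PS might look roughly like the sketch below. The kernel, the error codes, and the `errorFlag` parameter are all made up for illustration; the only point is that the kernel records a code in device memory and the host reads it back after the launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error codes; a real code base would likely need many more.
enum KernelError : int { kNoError = 0, kNegativeInput = 1 };

// Illustrative kernel: takes the square root of each element and records
// an error code instead of printing when it sees an unexpected value.
__global__ void sqrtAll(float *data, int n, int *errorFlag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (data[i] < 0.0f) {
        // Keep only the first error that any thread reports.
        atomicCAS(errorFlag, kNoError, kNegativeInput);
        return;
    }
    data[i] = sqrtf(data[i]);
}

int main()
{
    const int n = 256;
    float *d_data = nullptr;
    int *d_error = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_error, sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(float));   // all zeros: valid input
    cudaMemset(d_error, 0, sizeof(int));        // kNoError

    sqrtAll<<<(n + 127) / 128, 128>>>(d_data, n, d_error);

    // The blocking copy also waits for the kernel to finish.
    int h_error = kNoError;
    cudaMemcpy(&h_error, d_error, sizeof(int), cudaMemcpyDeviceToHost);
    if (h_error != kNoError)
        fprintf(stderr, "kernel reported error code %d\n", h_error);

    cudaFree(d_data);
    cudaFree(d_error);
    return h_error;
}
```

The maintenance cost is visible even here: every new failure condition needs its own enum entry and host-side handling, whereas a printf carries its own message.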

In-kernel printf is a mechanism that is not documented in much detail. Since the implementation can change, any suggestions here may change or become invalid as well.

In my experience, in-kernel printf is always implemented as a call. It is never inlined. Therefore I wouldn’t expect any icache impact for the call-not-taken case. However, it is going to have implications for register and stack usage, just as most actual function calls would.

Insertion of a printf could also affect compiler optimization behavior, because the printf may need data that your code does not need for any other purpose.

I’m sure you’re looking for a “don’t worry” kind of answer. But I doubt that can be stated categorically. I would suggest running test cases to judge impact in your specific case.
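One way to start such a test is to compare the compiled resource usage of a kernel with and without a guarded printf. The sketch below uses `cudaFuncGetAttributes` from the CUDA runtime API; the two kernels and the `report` helper are hypothetical, and the numbers will depend on the compiler version and target architecture. Compiling with `nvcc -Xptxas -v` prints similar information (registers and stack frame size) at build time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Baseline kernel without any printf.
__global__ void plain(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

// Same computation with a guarded printf that is not expected to fire.
__global__ void guarded(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i] * 2.0f + 1.0f;
        if (isnan(v)) printf("NaN at index %d\n", i);
        x[i] = v;
    }
}

// Hypothetical helper: print per-thread register and local memory usage.
template <typename Kernel>
void report(const char *name, Kernel k)
{
    cudaFuncAttributes attr;
    cudaError_t err =
        cudaFuncGetAttributes(&attr, reinterpret_cast<const void *>(k));
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", name, cudaGetErrorString(err));
        return;
    }
    printf("%s: %d registers/thread, %zu bytes local memory/thread\n",
           name, attr.numRegs, attr.localSizeBytes);
}

int main()
{
    report("plain", plain);
    report("guarded", guarded);
    return 0;
}
```

Resource usage is only a first indicator. Timing the real kernels on real inputs, with and without the printfs, is what actually answers the question.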

In addition to what Robert Crovella said, if-statements and calls that cannot be optimized away potentially interfere with other compiler optimizations, such as instruction scheduling. Since ABI-conformant calls require data in specific registers for argument passing, calls may also add constraints to register allocation in the immediate vicinity of calls, which may hurt performance indirectly through increased register pressure.

Note that compiler effects due to conditional printf generally apply to all computing platforms; they are not specific to CUDA. A call to printf() is a black box to a compiler, so it has to assume the call has side effects.

I would expect the effects of adding conditional printf() calls to be minor in many non-trivial use cases, but there are no guarantees that it cannot be otherwise. If you have specific performance needs, it would make sense to put the printfs inside macros that can be disabled at compile time, so you can easily gauge the performance impact.
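A minimal sketch of such a macro follows. `KERNEL_CHECK`, the `DEBUG` flag name, and the `gather` kernel are assumptions made for illustration, not anything prescribed by CUDA. Building with `nvcc -DDEBUG` enables the checks; building without it removes both the if test and the printf from the generated code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical macro: checks a condition and prints a message when it fails.
// Without -DDEBUG it expands to nothing, so the condition is not evaluated
// at all and must therefore be free of side effects.
#ifdef DEBUG
#define KERNEL_CHECK(cond, ...)               \
    do {                                      \
        if (!(cond)) { printf(__VA_ARGS__); } \
    } while (0)
#else
#define KERNEL_CHECK(cond, ...) do { } while (0)
#endif

// Illustrative kernel: out[i] = in[idx[i]], with an index sanity check.
__global__ void gather(const float *in, const int *idx, float *out,
                       int n, int inSize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int j = idx[i];
    KERNEL_CHECK(j >= 0 && j < inSize, "bad index %d at element %d\n", j, i);
    out[i] = in[j];
}

int main()
{
    const int n = 4;
    const float h_in[n] = {10.0f, 20.0f, 30.0f, 40.0f};
    const int h_idx[n] = {3, 2, 1, 0};
    float h_out[n] = {0.0f};

    float *d_in = nullptr, *d_out = nullptr;
    int *d_idx = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_idx, sizeof(h_idx));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx, sizeof(h_idx), cudaMemcpyHostToDevice);

    gather<<<1, n>>>(d_in, d_idx, d_out, n, n);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("out[%d] = %g\n", i, h_out[i]);

    cudaFree(d_in);
    cudaFree(d_idx);
    cudaFree(d_out);
    return 0;
}
```

Timing the two builds against each other gives a direct measure of what the checks cost in a particular kernel.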

Ah ha, these are great points – thanks! I’ve already got a DEBUG macro flag that I use, so I’ll just put my related if tests and printfs inside that. Good to know that it may actually be worth the trouble of putting them in those #if/ifdefs. Thanks!