So, anybody who’s ever used printf() to debug GPU kernels knows these frustrations:
If you print something, then print again, the two lines won’t appear together, since other threads’ printf() calls will likely land in between;
Which means that you must combine all of your printing into a single call (see the sketch after this list);
But you can’t do that for a variable-size structure;
… and you pine for having a sprintf() (or a C++-style stringstream).
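To make the first two items concrete, here’s a minimal sketch of the problem and the workaround (the kernel name and launch configuration are just made up for illustration):

```
#include <cstdio>

__global__ void interleaving_demo()
{
    // Two separate printf() calls per thread: output from other threads will
    // likely land between them, so the two lines rarely stay adjacent.
    printf("thread %d: first line\n",  threadIdx.x);
    printf("thread %d: second line\n", threadIdx.x);

    // The workaround: cram everything into a single call, whose output
    // is kept contiguous.
    printf("thread %d: first line\nthread %d: second line\n",
           threadIdx.x, threadIdx.x);
}

int main()
{
    interleaving_demo<<<1, 32>>>();
    cudaDeviceSynchronize(); // also flushes the device-side printf buffer
}
```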
And that’s not all: What if you want to write a printf wrapper, which, say, identifies the current thread? You can write a varargs function in CUDA… but unfortunately, there is no vprintf which you can call inside your wrapper. So, you’re stuck with writing a macro. Blech :-(
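Something like this, say (a sketch; the macro name is mine, and ##__VA_ARGS__ is a compiler extension, albeit one that nvcc’s usual host compilers accept):

```
#include <cstdio>

// No device-side vprintf means a wrapper that prefixes each line with the
// calling thread's coordinates can't be a varargs function; it has to be a macro.
#define TAGGED_PRINTF(fmt, ...) \
    printf("[block %3u, thread %3u] " fmt, \
           blockIdx.x, threadIdx.x, ##__VA_ARGS__)

__global__ void some_kernel(int x)
{
    TAGGED_PRINTF("x is %d\n", x);
}
```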
Finally, maybe you want to flex your printf muscles: printf("%.*s\n", length, my_string) for example, or printf("%zu\n", my_size). Tough cookies, that’s not supported. Not to mention extra features outside of ISO C, like super-useful support for printing in binary.
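For reference, here’s what those look like in host-side ISO C (both forms work on the host; as far as I can tell, device-side printf accepts neither the '*' precision nor the 'z' length modifier):

```
#include <cstdio>

int main()
{
    const char* my_string = "Hello, world";
    size_t      my_size   = 12345;

    printf("%.*s\n", 5, my_string); // precision taken from an argument: prints "Hello"
    printf("%zu\n",  my_size);      // size_t with the proper length modifier
}
```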
It’s weird that CUDA has been around for, what, 13 years now, and nobody’s offered this (AFAICT). Well, that period is now (almost) over. I’ve recently pushed an implementation of most of the printf() family of functions to the development branch of my cuda-kat library.
In a way, this is pretty mature code: It’s a port of this stand-alone printf library for embedded systems, so it has inherited a rather extensive set of unit tests. But even though these now pass when running in GPU kernels, that doesn’t exercise the behavior in a massively parallel environment.
So: I need some beta testers to try this out. If you’re doing some kernel development work, and occasionally debug-print stuff… please consider giving it a spin.
Bugs/suggestions can obviously be filed either on the cuda-kat issue page or here.
It’s an independently-developed library. I really think something like this should have been part of CUDA itself, and I would definitely be open to collaborating with NVIDIA on beefing up the built-in printf with the features of this library. Unfortunately, as I’ve found with my CUDA C++ API wrappers, NVIDIA is not too keen on such collaborations.
But who knows? Maybe they’ll change their mind. Hope springs eternal, etc.
For now, I depend on satisfied users spreading the word about these libraries.