An sprintf() which works in your kernel? It's almost here, help beta-test it

So, anybody who’s ever used printf() to debug GPU kernels must know these frustrations:

  • If you print something, then print again, the lines won’t appear together since other threads’ printf()'s will likely come in-between;
  • Which means that you must combine all of your printing into a single instruction;
  • But you can’t do that for a variable-size structure;
  • … and you pine for having a sprintf() (or a C++-style stringstream).
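
To make the first two points concrete, here is a minimal sketch using only the built-in device-side printf(). The kernel names are made up for illustration. The first kernel splits one logical line across two calls, so fragments from different threads can end up mixed together; the second builds the whole line in a single call, which is the usual workaround.

```cuda
#include <cstdio>

// Hypothetical kernel: one logical line split over two printf() calls.
// Other threads' output may land between the two fragments.
__global__ void two_calls()
{
    printf("thread %u: x = ", threadIdx.x);
    printf("%d\n", (int) threadIdx.x * 2);
}

// Hypothetical kernel: the same line produced by a single printf() call,
// so each thread's line stays in one piece.
__global__ void one_call()
{
    printf("thread %u: x = %d\n", threadIdx.x, (int) threadIdx.x * 2);
}

int main()
{
    two_calls<<<1, 64>>>();
    cudaDeviceSynchronize(); // device-side printf output is flushed at synchronization points

    one_call<<<1, 64>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

The single-call version only works when the number of values to print is known at compile time, which is exactly why it breaks down for variable-size structures.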

And that’s not all: What if you want to write a printf wrapper, which, say, identifies the current thread? You can write a varargs function in CUDA… but unfortunately, there is no vprintf which you can call inside your wrapper. So, you’re stuck with writing a macro. Blech :-(
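
For reference, the macro workaround looks roughly like the sketch below. The macro name thread_printf is my own invention for illustration, the format must be a string literal so that it can be concatenated with the prefix, and ##__VA_ARGS__ is a GNU extension (it works with nvcc on top of GCC or Clang, but is not standard C++).

```cuda
#include <cstdio>

// Hypothetical wrapper macro: prefixes every message with the block and
// thread index. `fmt` must be a string literal, because it is pasted onto
// the prefix by adjacent-literal concatenation.
// `##__VA_ARGS__` (GNU extension) drops the trailing comma when no extra
// arguments are passed.
#define thread_printf(fmt, ...) \
    printf("[block %u, thread %u] " fmt, blockIdx.x, threadIdx.x, ##__VA_ARGS__)

__global__ void kernel(int value)
{
    thread_printf("value = %d\n", value);
    thread_printf("done\n");
}

int main()
{
    kernel<<<2, 4>>>(42);
    cudaDeviceSynchronize(); // flush device-side printf output
    return 0;
}
```

It does the job, but it cannot be passed around like a function, cannot take a format string computed at run time, and pollutes the global namespace like any other macro.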

Finally, maybe you want to flex your printf muscles: printf("%.*s\n", my_length, my_string) for example, or printf("%zu\n", my_size);. Tough cookies, that’s not supported. Not to mention extra features outside of ISO C, like the super-useful support for printing in binary.

It’s weird that CUDA has been around for, what, 13 years now, and nobody’s offered this (AFAICT). So, that period is now - almost - over. I’ve recently pushed an implementation of most of the printf() family of functions to the development branch of my cuda-kat library.

In a way, this is pretty mature code: It’s a port of this stand-alone printf library for embedded systems, so it’s inherited a rather extensive set of unit tests. But even though these now pass when running in GPU kernels - that doesn’t test the behavior in a massively parallel environment.

So: I need some beta testers to try this out. If you’re doing some kernel development work, and occasionally debug-print stuff… please consider giving it a spin.

Bugs/suggestions can obviously be filed either on the cuda-kat issue page or here.