Make a wish: What would you like in a CUDA on-device formatted printing library?

As some of you may know, I experimented for a while with an ostream-like component in my cuda-kat library, but eventually disabled it: I felt it wasn’t robust enough, its API was a little esoteric, and it took a hell of a lot of time to compile.

I’m now working on something else, less ambitious to begin with, based on a stand-alone implementation of the printf() family of functions, including sprintf(). I have that working nicely and passing a bunch of tests; it is also quick to compile against, since the C-style API allows for using a device-side object instead of a bunch of template headers. Now I need to think about what exactly to offer.

So, I ask you:

  • What features would you like to see in a (debugging-oriented) library for formatted printing on the device side? You can mention specific functionality as well as ways to use it, customize it etc.
  • What interesting facilities have you implemented for yourself (in C-ish code, in C++-ish code, or even with macros) for your own debugging use, which you believe might be useful for the general public? Example: I have used a macro wrapping printf() which also identifies the printing thread, or the thread’s block, using a prefix to the format string and some extra arguments. Now I can implement something like this as a proper function, since vprintf() is available.
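
A minimal sketch of the kind of thread-identifying macro described in the example above. The name KAT_THREAD_PRINTF is hypothetical and not part of cuda-kat, and the macro relies on `##__VA_ARGS__`, which GCC and Clang accept as an extension but which is not standard C++.

```cpp
#include <cstdio>

// Hypothetical name, not part of cuda-kat. Prepends the block and thread
// indices to the caller's format string. `fmt` must be a string literal,
// so that it can be concatenated with the prefix at compile time.
// ##__VA_ARGS__ (a GCC/Clang extension) drops the trailing comma when no
// extra arguments are passed.
#define KAT_THREAD_PRINTF(fmt, ...)                         \
    printf("[block (%u,%u,%u) thread (%u,%u,%u)] " fmt,     \
           blockIdx.x, blockIdx.y, blockIdx.z,              \
           threadIdx.x, threadIdx.y, threadIdx.z,           \
           ##__VA_ARGS__)

#ifdef __CUDACC__
// Usage inside a kernel (this part is only compiled by nvcc).
__global__ void report(const int* data)
{
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    KAT_THREAD_PRINTF("data[%u] = %d\n", i, data[i]);
}
#endif
```

Because the prefix is glued onto the format string by literal concatenation, this only works with literal format strings; a proper function built on a vprintf()-style entry point would not have that restriction.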


  • Bear in mind that cuda-kat is a 3-clause-BSD-licensed library.
  • Some people are not aware of this, but CUDA’s internal printf() implementation actually lacks several important features of the standard C printf(). I plan to offer a proper printf(), albeit a less performant one than the internal one, since it has to allocate a buffer, sprintf() into it, then call printf("%s", the_buffer).
  • Links to ideas/code are welcome, but it’s best to describe them as well, rather than just linking.
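
The "allocate a buffer, sprintf() into it, then printf("%s", the_buffer)" flow from the notes above can be sketched on the host with the standard library. This is only an analogue under stated assumptions: `buffered_printf` is a hypothetical name, and a device-side version would have to call the library's own sprintf() in place of `std::vsnprintf`, which is not available in device code.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: a host-side analogue of the buffer-based flow.
// Returns the number of characters printed, or a negative value on error.
int buffered_printf(const char* format, ...)
{
    va_list args;
    va_start(args, format);

    // First pass: measure the formatted length without writing anything.
    va_list args_copy;
    va_copy(args_copy, args);
    int length = std::vsnprintf(nullptr, 0, format, args_copy);
    va_end(args_copy);
    if (length < 0) {
        va_end(args);
        return length;
    }

    // Allocate the buffer (+1 for the terminating NUL) and format into it.
    std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), format, args);
    va_end(args);

    // Hand the finished string to the underlying printf as a single "%s"
    // argument, so that printf itself only has to handle "%s".
    return std::printf("%s", buffer.data());
}
```

In this sketch the cost over a direct printf() call is the allocation plus the extra formatting pass, which is where the performance penalty mentioned above comes from.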