Announcing a "CUDA Kernel Author's Toolkit" library

I would like to announce an initial release of a new device-side C++11 library I’ve written:

The CUDA Kernel Author’s Toolkit

It’s a header-only library: a loosely-coupled collection of useful functions and classes for writing device-side CUDA code (kernels and non-kernel functions). It grew out of repeatedly finding myself rewriting the same small bits of code, or copy-pasting files and snippets from one project to another, even though they had nothing to do with the specific project or even with its application domain. So, I sat down to round them out into something more respectable and robust which others could also use.

The facilities in this library let us…

  • Make our device-side code less cryptic and idiosyncratic, with clearer naming and semantics.
  • Repeat ourselves less - the DRY principle.
  • Write templated device-side code without constantly coming up against not-trivially-templatable bits in CUDA.
  • Use standard-library(-like) containers in device-side code (without being obliged to use them).
  • Use fewer magic numbers.

… while not committing to any particular framework, paradigm or class hierarchy.
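
To give a rough flavor of the "clearer naming" and "fewer magic numbers" points, here is a minimal sketch. The names (`sketch`, `HOST_DEVICE`, `lane_index` and so on) are made up for this post and are not the library's actual API; the macro is only there so the same code builds under nvcc and as plain C++11:

```cpp
// Illustrative sketch only: hypothetical names, not the library's API.

// Assumption of this sketch: let the helpers compile both under nvcc
// (as host+device functions) and as plain C++11 for host-side testing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

namespace sketch {

// Named constants instead of a bare 32 or 0xFFFFFFFF scattered through kernels.
constexpr unsigned warp_size = 32;
constexpr unsigned full_warp_mask = 0xFFFFFFFFu; // e.g. for the *_sync warp intrinsics

// Lane of a thread within its warp, given its linear index within the block.
HOST_DEVICE constexpr unsigned lane_index(unsigned thread_index_in_block)
{
    return thread_index_in_block % warp_size;
}

// Index of the warp (within the block) to which a thread belongs.
HOST_DEVICE constexpr unsigned warp_index(unsigned thread_index_in_block)
{
    return thread_index_in_block / warp_size;
}

// Number of blocks of `block_size` threads needed to cover `length` elements
// (rounding up), instead of repeating the (n + b - 1) / b idiom everywhere.
template <typename Size>
HOST_DEVICE constexpr Size num_blocks_to_cover(Size length, Size block_size)
{
    return (length + block_size - 1) / block_size;
}

} // namespace sketch

// Compile-time sanity checks of the helpers above.
static_assert(sketch::lane_index(37) == 5, "thread 37 is lane 5 of its warp");
static_assert(sketch::warp_index(37) == 1, "thread 37 is in the block's second warp");
static_assert(sketch::num_blocks_to_cover(1000, 256) == 4, "1000 elements need 4 blocks of 256");
```

In a kernel, one could then write `sketch::lane_index(threadIdx.x)` rather than `threadIdx.x % 32`, and pick the grid size with `sketch::num_blocks_to_cover(n, block_size)`.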


Requirements:

  • CUDA 8.0 or later.
  • Compilation with --std=c++11 or later standard. (Caveat: Not tested with LLVM’s CUDA support.)
  • A Linux, Mac or Windows operating system (i.e. if CUDA is usable, then so should this library be).
  • You can just copy the headers as-is, but if you want a “proper installation” then you’ll need CMake 3.8.2 and your OS’ build tool.
  • Optional: A recent version of the strf library for on-device streams.


I am an individual independent developer (well, in this context), so I rely to some extent on your - the community’s - support. I got some, including quite a bit of useful feedback, after announcing my cuda-api-wrappers library here a few years back. So I encourage you to comment or ask questions here, open issues, write to me directly, and try the library out.

Really interesting piece of work