Hello fellow CUDAers,
As many of you know, for the past several years I’ve been working on modern-C++ wrappers for the CUDA APIs. I’ve been gradually catching up with NVIDIA’s work: first the runtime, then the driver, then NVRTC and the PTX compiler. But I still have some gaps.
Perhaps the largest gap is support for CUDA execution graphs, which I’ve so far not provided. Graphs are a somewhat-integrated set of APIs, and there are multiple approaches to representing graphs in general, regardless of what NVIDIA offers, so I was unsure how to proceed (especially since my intention is for the wrappers to be thin, not to offer elaborate alternative abstractions).
Well, over the past several months I’ve put together an initial implementation. Since I have not personally used graphs much in my own work (to be honest, I’m only planning to, and this hasn’t yet materialized), I am not sure I have sufficient intuition about what would feel natural and convenient to developers who do use them. So I would very much appreciate feedback on this implementation from those of you who have worked with CUDA graphs, or who are even just considering working with them.
You can take a look at the two CUDA samples I’ve adapted:
simpleCudaGraphs - constructs and runs a simple CUDA graph with a few nodes, once explicitly and once via stream capture.
jacobiCudaGraph - runs the Jacobi iterative method: once regularly, once with an explicitly-constructed CUDA graph, and once with a stream-capture-generated CUDA graph.
And, of course, you can look at the wrapper code proper. It’s on the graph_support branch, and the interesting files are:
cuda/api/graph/ - definitions of graph templates and graph instances, and of type-erased and type-imbued nodes
cuda/api/multi_wrapper_impls/graph.hpp - implementations of functions requiring the definitions of multiple API wrapper classes at once
cuda/api/stream.hpp - for the stream capture code
cuda/api/device.hpp - for the code regarding memory usage by graphs
Also, since I last posted here about the library, a few releases have been made, improving NVRTC support, adding support for the PTX compilation library, and fixing multiple bugs. The latest release is:
and you are welcome to enjoy it. The releases page has a detailed changelog of how it has progressed.
Thanks go to the many people who have starred the repository so far, and even greater thanks to those who have filed bugs or written to me directly to discuss aspects of the wrapper library; that is always educational and useful.