General-purpose static and/or dynamic CUDA scheduler?

Suppose I have a bunch of kernels I need to schedule, a bunch of host-side buffers some of them depend on, and inter-kernel dependencies (e.g. K2 can only run after K1).

Is there a library which takes information about these buffers and kernels - nothing that’s application-domain-specific, just enough to be able to schedule launches and memory copy operations - and schedules the work as best it can? Say, on a single GPU?
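
To be concrete, this is roughly the shape of the information I have to hand - the types and names below are made up for illustration, not taken from any existing library:

```cuda
// Made-up description of my workload - not any library's API, just the
// information I actually have available on the host side.
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

struct BufferDesc {
    const void* host_src;     // host-side data that must reach the device
    size_t      size_bytes;
};

struct KernelDesc {
    void (*launch)(cudaStream_t);   // type-erased launcher for one kernel
    std::vector<int> buffers;       // indices of BufferDescs it reads/writes
    std::vector<int> after;         // indices of kernels that must finish first
};

__global__ void k1(float*) { }
__global__ void k2(float*) { }
void launch_k1(cudaStream_t s) { k1<<<1, 1, 0, s>>>(nullptr); }
void launch_k2(cudaStream_t s) { k2<<<1, 1, 0, s>>>(nullptr); }

// K1 reads buffer 0; K2 reads buffer 0 and can only run after K1 (index 0).
std::vector<KernelDesc> kernels = {
    { launch_k1, {0}, {}  },
    { launch_k2, {0}, {0} },
};
```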

If the answer is “yes” - same question, but when some of the scheduling decisions can’t be made until we’re mid-flight, e.g. we don’t know in advance the best order in which to copy buffers to the device, or we need some kernel’s results to decide which kernels to schedule later on, and with which parameters, etc.
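
For the dynamic case, the best I can do by hand is something like the sketch below: copy a small result back, stall the host on it, and only then decide what to launch next. The kernels and the decision flag are placeholders, not real code from my application:

```cuda
#include <cuda_runtime.h>

__global__ void k1(int* decision) { *decision = 1; }   // stand-in for real work
__global__ void kernel_a() { }                          // candidate follow-up kernels
__global__ void kernel_b() { }

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int* d_decision;
    int  h_decision = 0;
    cudaMalloc(&d_decision, sizeof(int));

    // Run K1 and bring its (small) result back to the host.
    k1<<<1, 1, 0, stream>>>(d_decision);
    cudaMemcpyAsync(&h_decision, d_decision, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);   // the host stalls here

    // Only now can I decide what to schedule next, and with which parameters.
    if (h_decision) kernel_a<<<1, 1, 0, stream>>>();
    else            kernel_b<<<1, 1, 0, stream>>>();

    cudaStreamSynchronize(stream);
    cudaFree(d_decision);
    cudaStreamDestroy(stream);
    return 0;
}
```

That synchronize is the part I’d hope a scheduler could hide, e.g. by keeping other, already-decided work running on other streams while the decision is pending.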

If the answer is “no” - is everybody using application-integrated schedulers? And what’s the closest thing to such a self-contained scheduler you can think of?

Notes:

  • Scheduling can involve using events, callbacks, additional host threads - whatever works.
  • Of course you can schedule everything on a single stream, or on a single I/O stream and a single compute stream with events controlling when kernels are launched (a minimal sketch of that baseline is below) - but that is quite sub-optimal.
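
For reference, a minimal version of that two-stream baseline, with placeholder kernels and a single pinned buffer:

```cuda
#include <cuda_runtime.h>

__global__ void k1(const float* in) { }   // stand-ins for real kernels
__global__ void k2(const float* in) { }   // K2 must run after K1

int main() {
    cudaStream_t copy_stream, compute_stream;
    cudaStreamCreate(&copy_stream);
    cudaStreamCreate(&compute_stream);

    const size_t n = 1 << 20;
    float *h_buf, *d_buf;
    cudaMallocHost(&h_buf, n * sizeof(float));   // pinned, so the copy can actually be async
    cudaMalloc(&d_buf, n * sizeof(float));

    cudaEvent_t copied;
    cudaEventCreateWithFlags(&copied, cudaEventDisableTiming);

    // I/O stream: copy the inputs, mark completion with an event.
    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                    cudaMemcpyHostToDevice, copy_stream);
    cudaEventRecord(copied, copy_stream);

    // Compute stream: wait for the copy, then run K1 and K2
    // (the K1 -> K2 dependency falls out of stream ordering).
    cudaStreamWaitEvent(compute_stream, copied, 0);
    k1<<<256, 256, 0, compute_stream>>>(d_buf);
    k2<<<256, 256, 0, compute_stream>>>(d_buf);

    cudaStreamSynchronize(compute_stream);

    cudaEventDestroy(copied);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    cudaStreamDestroy(copy_stream);
    cudaStreamDestroy(compute_stream);
    return 0;
}
```

With many buffers and many independent kernel chains, this serializes all copies behind one another and all compute behind a single event; deciding how many streams to use and where to put the events is exactly the part I’d like to hand off to a library.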