CUDA driver API calls from a Windows kernel driver

How does one call cuMemcpyAsync (device-to-device) from a Windows kernel driver?

Since I am in kernel space, there is no CUDA context, right?

The basic idea is that, as part of an ISR, I would issue a CUDA device-to-device async memcpy.

The alternative I was considering is to queue a DPC and perform the copy from its registered callback. However, I wanted to avoid the DPC and do the work directly in the ISR. Since the call is asynchronous, it would be fire-and-forget from the ISR's point of view.

Is there another NVIDIA/CUDA API or driver I should be using?

Note: before the interrupts start coming in, user code will already have run, created the context, allocated the source and destination memory, etc. So the user-space environment is ready, and I just want the ISR to initiate the async copy.