How does one go about calling cuMemcpyAsync (DtoD) from a Windows Kernel Driver?
Since I am in kernel space, there is no “CUDA context”, right?
The basic idea is that, as part of an ISR, I would perform a CUDA device-to-device async memcpy.
The alternative I was considering is to queue a DPC and perform the copy in its registered callback, but I would rather avoid the DPC and get the job done directly in the ISR; since the call is asynchronous, it would be fire-and-forget from the ISR's point of view.
Is there another NVIDIA / CUDA API or driver I should be using?
Note: Before the interrupts start coming in, user code will have run and created the context, allocated the src/dest memory, etc. So the user-space environment is ready to go, and I just want the ISR to initiate the async copy.