Can CUDA expose a file descriptor that I can poll to wait for a CUDA stream to reach some point in its execution? I see many related questions on these forums, but no answers, not even confirmation that it’s impossible. I would be happy with a solution that doesn’t work on Windows or with discrete GPUs; I only care about L4T running on an Orin Nano.
CUDA events seem like an obvious piece of a solution, but I don’t see any API that exposes an event as a file descriptor I can wait on. cudaIpcGetEventHandle is close, but the documentation only talks about importing the handle in another process. Is there some way to wait on that handle as a file descriptor, even if it’s platform specific?
cudaImportExternalSemaphore seems promising, but there’s no support for a generic file descriptor. If I could import an eventfd or a pipe, that would be great.
I could do something manually with cudaLaunchHostFunc, but I’m hoping to avoid the overhead of additional thread context switches. I also keep seeing references to it introducing dependencies between work on independent streams, which I’d like to avoid.
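For reference, here is a minimal sketch of the manual workaround I have in mind, assuming Linux and eventfd. The helper names (make_stream_fd, signal_fd, wait_fd) are hypothetical and only for illustration. signal_fd has the void (*)(void *) shape that cudaLaunchHostFunc takes as its callback, and the CUDA call itself appears only in a comment so the snippet compiles without the toolkit.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create the eventfd that the stream will signal.
 * Returns the descriptor, or -1 on error. */
int make_stream_fd(void) {
    return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
}

/* Callback with the void (*)(void *) shape used by cudaLaunchHostFunc.
 * The fd is carried in the pointer value itself, so nothing has to stay
 * alive until the callback runs. It makes no CUDA calls, only write(). */
void signal_fd(void *user_data) {
    int fd = (int)(intptr_t)user_data;
    uint64_t one = 1;
    ssize_t n = write(fd, &one, sizeof one); /* adds 1 to the counter */
    (void)n;
}

/* Wait up to timeout_ms for the fd to be signalled.
 * Returns 1 if signalled, 0 on timeout, -1 on error.
 * Reading the eventfd resets its counter to zero. */
int wait_fd(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);
    if (r <= 0) {
        return r;
    }
    uint64_t count;
    return read(fd, &count, sizeof count) == (ssize_t)sizeof count ? 1 : -1;
}

/* Intended use with CUDA (assumption, not compiled here):
 *
 *   int fd = make_stream_fd();
 *   cudaLaunchHostFunc(stream, signal_fd, (void *)(intptr_t)fd);
 *   ... add fd to an existing poll/epoll loop, or call wait_fd(fd, -1) ...
 */
```

This gives me a pollable descriptor, but the callback still runs on a separate CUDA-managed thread, which is exactly the extra context switch I was hoping to avoid.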