Synchronizing CUDA streams with file descriptor-based polling

Can CUDA expose a file descriptor that I can poll to wait for a CUDA stream to reach some point in its execution? I see many related questions on these forums, but no answers, not even confirmation that it's impossible. A solution that doesn't work on Windows or discrete GPUs is fine; I only care about L4T running on an Orin Nano.

CUDA events seem like an obvious piece of a solution, but I don't see any API to expose a file descriptor that I can wait on. cudaIpcGetEventHandle is close, but the documentation only describes importing the handle in another process. Is there some way to wait on that handle as a file descriptor, even if it's platform-specific?

cudaImportExternalSemaphore seems promising, but it has no support for importing a generic file descriptor. If I could import an eventfd or a pipe, that would be great.

I could do something manually with cudaLaunchHostFunc, but I'm hoping to avoid the overhead of the additional thread context switches. I also keep seeing references to limitations of host functions when adding dependencies between work on independent streams, which I'd like to avoid.


Could you share more about your use case?
Do you need this for the IPC use case?


I’m specifically looking to integrate CUDA streams with tokio, which provides the AsyncFd interface to trigger actions when a file descriptor is readable and/or writable. To use this, I need a file descriptor that will be readable and/or writable when the CUDA stream completes a given operation.

@AastaLLL any updates on this?


Sorry for the late update.

The CUDA library doesn't support such a mechanism.
However, you can check whether our NvSci library, which is designed for IPC, can meet your requirements:

Prepare an NvSciIpc Endpoint for read/write


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.