It looks to me as though an event object that has never been passed to cuEventRecord() since cuEventCreate() never blocks a stream that waits on it, which is a problem when that stream is trying to synchronize concurrent kernels.
Is there a way to initialize an event object in a “not ready” state?
I want to use cuStreamWaitEvent() to wait for the completion of up to N kernels before the finalizer kernel executes. However, at the point where we enqueue the finalizer kernel, we cannot guarantee that these event objects have already been recorded by the invocation chain of the prior kernels.
The current workaround is for the host code to make sure cuStreamWaitEvent() is called only after all of the prior kernels have been enqueued.