I am trying to use cuEventQuery from the asynchronous interface of the driver API and I do not understand what the behavior is supposed to be.
In fact, the documented behavior seems to differ from one version of the documentation to the next.
For example here you can see the online documentation for 3.1:
Here for 3.2 the pdf:
In the first case, CUDA_SUCCESS is returned if the GPU has finished executing the stream up to the point where cuEventRecord was called.
In the second case, CUDA_SUCCESS may also be returned if cuEventRecord has not been reached yet.
It looks like the second behavior is what I actually get in CUDA 3.1.
My question: which behavior should be deemed "normal", the first or the second? And is there a reason for that choice?
(IMHO, the first behavior made way more sense, as it enabled doing some crazy tree dependencies)