Asynchronous API with compute capability 1.0


I know that a major change in compute capability 1.1 was the introduction of streams and a number of asynchronous functions in the API. However, on 1.0 hardware, it seems that we have to make a blocking call whenever we need to synchronize with a kernel.

I guess this function does some polling internally (or waits for an interrupt). Is it really impossible to get a non-blocking version of cuCtxSynchronize, for instance?

Perhaps such a feature already exists and I missed it in the documentation?


I don’t know the driver API function name, but the runtime API function you are looking for is cudaEventQuery(). There must be a driver API counterpart.
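To make the suggestion concrete, here is a minimal sketch of polling with cudaEventQuery() instead of blocking. The kernel `scale` and the helper `launchAndPoll` are made-up names for illustration, and this has not been tried on compute 1.0 hardware. As far as I know, the driver API counterpart is cuEventQuery(), which returns CUDA_ERROR_NOT_READY in the same situation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only here to give the GPU some work to do.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Launches the kernel, records an event right after it, and polls the
// event instead of calling a blocking *Synchronize() function.
// Returns the number of polls that saw "not ready", or -1 on error.
long launchAndPoll(int n) {
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t done;
    if (cudaEventCreate(&done) != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    cudaEventRecord(done, 0);  // event completes once the kernel has finished

    long polls = 0;
    cudaError_t status;
    // cudaEventQuery() returns immediately: cudaErrorNotReady while the
    // work is still in flight, cudaSuccess once the event has completed.
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
        ++polls;  // the host thread is free to do other work here
    }

    cudaEventDestroy(done);
    cudaFree(d);
    return status == cudaSuccess ? polls : -1;
}

int main() {
    long polls = launchAndPoll(1 << 20);
    if (polls < 0) {
        fprintf(stderr, "CUDA error while launching or polling\n");
        return 1;
    }
    printf("kernel finished after %ld not-ready polls\n", polls);
    return 0;
}
```

In a real application the body of the polling loop would do useful host work (or yield) rather than just count iterations.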

And the only addition in compute capability 1.1 was allowing asynchronous memcpys and kernel executions to overlap (if they are in different streams). Both compute 1.0 and 1.1 devices have the same blocking calls for the *Synchronize() functions.
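For what the overlap case looks like in code, here is a rough sketch that issues an asynchronous copy in one stream and a kernel in another, then polls both with cudaStreamQuery(). The names `addOne` and `overlapCopyAndKernel` are invented for this example. Whether the two operations really run concurrently depends on the device, and the host buffer must be page-locked (cudaMallocHost) for the copy to be asynchronous.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that works on a buffer unrelated to the copy.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Issues a host-to-device copy and a kernel in different streams and
// polls both streams without blocking. Returns true on success.
bool overlapCopyAndKernel(int n) {
    size_t bytes = n * sizeof(float);
    float *hostBuf = nullptr, *devCopy = nullptr, *devWork = nullptr;
    cudaStream_t copyStream = nullptr, kernelStream = nullptr;

    bool ok = cudaMallocHost(&hostBuf, bytes) == cudaSuccess  // pinned memory
           && cudaMalloc(&devCopy, bytes) == cudaSuccess
           && cudaMalloc(&devWork, bytes) == cudaSuccess
           && cudaStreamCreate(&copyStream) == cudaSuccess
           && cudaStreamCreate(&kernelStream) == cudaSuccess;

    if (ok) {
        for (int i = 0; i < n; ++i) hostBuf[i] = 1.0f;
        cudaMemset(devWork, 0, bytes);

        // Both calls return to the host right away.
        cudaMemcpyAsync(devCopy, hostBuf, bytes,
                        cudaMemcpyHostToDevice, copyStream);
        addOne<<<(n + 255) / 256, 256, 0, kernelStream>>>(devWork, n);

        // Non-blocking check of each stream; the host could do other
        // work inside this loop.
        cudaError_t c, k;
        do {
            c = cudaStreamQuery(copyStream);
            k = cudaStreamQuery(kernelStream);
        } while (c == cudaErrorNotReady || k == cudaErrorNotReady);
        ok = (c == cudaSuccess && k == cudaSuccess);
    }

    if (kernelStream) cudaStreamDestroy(kernelStream);
    if (copyStream) cudaStreamDestroy(copyStream);
    cudaFree(devWork);
    cudaFree(devCopy);
    cudaFreeHost(hostBuf);
    return ok;
}

int main() {
    bool ok = overlapCopyAndKernel(1 << 20);
    printf("%s\n", ok ? "copy and kernel completed" : "CUDA error");
    return ok ? 0 : 1;
}
```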