Async calls concurrent from different contexts? asynchronous, driver API

Are the *Async driver API calls executed concurrently across different contexts?

I’m making the usual assumptions here: memory is pinned, the contexts are bound to different threads, and kernel executions themselves are still serialized on the same physical device (i.e., only I/O can run concurrently, either with other I/O or with a kernel execution).

Let’s say I have context A and context B, bound to different threads but both on the same CUdevice. Will the I/O from, say, A overlap with the I/O and kernels from B? What I’m getting at is whether I will still benefit from concurrency if I use a single CUstream for A and a single CUstream for B.
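To make the setup concrete, here is a minimal sketch of what I mean: two host threads, each creating its own context on device 0 with exactly one stream and one async host-to-device copy. The names `worker`, `CHECK`, and `N_BYTES` are made up for illustration, and I'm assuming the three-argument form of `cuCtxCreate` (newer toolkits may declare it with an extra parameter). It only issues the work; it does not measure whether the copies actually overlap.

```c
#include <cuda.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: report a failing driver call and abort the worker. */
#define CHECK(call)                                                   \
    do {                                                              \
        CUresult r_ = (call);                                         \
        if (r_ != CUDA_SUCCESS) {                                     \
            fprintf(stderr, "%s failed: %d\n", #call, (int)r_);       \
            return NULL;                                              \
        }                                                             \
    } while (0)

enum { N_BYTES = 64 << 20 };   /* 64 MiB per copy, arbitrary */

static CUdevice dev;           /* the single physical device both contexts use */

/* One thread == one context == one stream (contexts "A" and "B"). */
static void *worker(void *name)
{
    CUcontext   ctx;
    CUstream    stream;
    CUdeviceptr dptr;
    void       *hptr;

    /* Assumption: three-argument cuCtxCreate; the new context becomes
       current to the calling thread. */
    CHECK(cuCtxCreate(&ctx, 0, dev));
    CHECK(cuStreamCreate(&stream, 0));
    CHECK(cuMemAllocHost(&hptr, N_BYTES));   /* pinned host memory */
    CHECK(cuMemAlloc(&dptr, N_BYTES));

    /* The call in question: does this overlap with the other context's work? */
    CHECK(cuMemcpyHtoDAsync(dptr, hptr, N_BYTES, stream));
    CHECK(cuStreamSynchronize(stream));

    CHECK(cuMemFree(dptr));
    CHECK(cuMemFreeHost(hptr));
    CHECK(cuStreamDestroy(stream));
    CHECK(cuCtxDestroy(ctx));
    return name;               /* non-NULL signals success */
}

int main(void)
{
    pthread_t a, b;
    void *ra = NULL, *rb = NULL;

    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    if (pthread_create(&a, NULL, worker, "A") != 0) return 1;
    if (pthread_create(&b, NULL, worker, "B") != 0) {
        pthread_join(a, NULL);
        return 1;
    }
    pthread_join(a, &ra);
    pthread_join(b, &rb);
    return (ra && rb) ? 0 : 1;
}
```

Something like `gcc two_ctx.c -lcuda -lpthread` should build it, assuming the CUDA headers and driver library are on the default paths. To actually see whether the two copies overlap, one would still have to time them or look at a profiler trace.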

The unfortunate alternative would be that streams operate concurrently only within the same context. In that case I’d need at least two CUstreams on A, and two on B, to overlap the I/O.

Again, I want to emphasize that I’m assuming both contexts are bound to the same physical device. I know you benefit from concurrency across separate physical boards.

Thanks.

I assume they are, but I haven’t tested this very thoroughly. I’ll ask around.