Is there a way to have one stream wait on one or more other streams to finish before it continues?
For example, for some operation C = A + B, suppose A and B are being computed on two separate streams. I would like to queue the computation of C so that it waits until both A and B have completed. I don't want to block at all; I want to simply queue up all of these computations (possibly with associated memcpy operations in their streams) and return control to the CPU. Then, when I finally need C, I would block on its stream.
Are streams even appropriate here?
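A minimal sketch of the workflow described above, assuming this is CUDA and that CUDA events are an acceptable way to express the dependency. Each producer stream records an event with `cudaEventRecord`, and the consumer stream is told to wait on both events with `cudaStreamWaitEvent`. That wait is enforced on the GPU, so the host thread is not blocked. The kernels `fill` and `add`, the helper `run_c_equals_a_plus_b`, and the constant inputs (A = 1, B = 2) are hypothetical placeholders for the real computations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical producer kernel: out[i] = value for i in [0, n).
__global__ void fill(float* out, float value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Consumer kernel: c[i] = a[i] + b[i].
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Hypothetical helper: computes C = A + B across three streams and
// returns true if every element of C equals 3.
bool run_c_equals_a_plus_b(int n) {
    size_t bytes = n * sizeof(float);
    float *a, *b, *c, *hostC;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);
    cudaMallocHost(&hostC, bytes);  // pinned, so the copy below can be async

    cudaStream_t sA, sB, sC;
    cudaStreamCreate(&sA);
    cudaStreamCreate(&sB);
    cudaStreamCreate(&sC);

    // Events used only for ordering, so timing is disabled.
    cudaEvent_t aDone, bDone;
    cudaEventCreateWithFlags(&aDone, cudaEventDisableTiming);
    cudaEventCreateWithFlags(&bDone, cudaEventDisableTiming);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Produce A and B on independent streams, marking completion of each.
    fill<<<blocks, threads, 0, sA>>>(a, 1.0f, n);
    cudaEventRecord(aDone, sA);
    fill<<<blocks, threads, 0, sB>>>(b, 2.0f, n);
    cudaEventRecord(bDone, sB);

    // sC waits (on the GPU) for both events; these calls do not block the host.
    cudaStreamWaitEvent(sC, aDone, 0);
    cudaStreamWaitEvent(sC, bDone, 0);
    add<<<blocks, threads, 0, sC>>>(a, b, c, n);
    cudaMemcpyAsync(hostC, c, bytes, cudaMemcpyDeviceToHost, sC);

    // ... the CPU is free to do other work here ...

    // Block only when C is actually needed.
    cudaStreamSynchronize(sC);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (hostC[i] == 3.0f);

    cudaEventDestroy(aDone);
    cudaEventDestroy(bDone);
    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);
    cudaStreamDestroy(sC);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    cudaFreeHost(hostC);
    return ok;
}

int main() {
    bool ok = run_c_equals_a_plus_b(1 << 20);
    std::printf("%s\n", ok ? "C = A + B: ok" : "C = A + B: mismatch");
    return ok ? 0 : 1;
}
```

In this sketch the kernel launches, event records, stream waits, and the async copy all return as soon as the work is enqueued; only `cudaStreamSynchronize(sC)` blocks the CPU. Error checking on the individual runtime calls is omitted for brevity.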