HyperQ and synchronization

In the HyperQ sample, kernel_A and kernel_B are launched in parallel. Then the kernel “sum” is launched in the last stream without any prior synchronization.
How can we be sure that the last stream will not be completed before others?
Thanks for your reply.

The sample just copies the output from an arbitrary stream (the last one), I guess so that
you can later check on the CPU that the result is correct.

The important thing is that the timing is indeed done after the sync happens and all
streams have finished. The HyperQ.pdf file also shows the visual profiler output
which supposedly proves it.
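
If you would rather make the ordering explicit than rely on where the sync happens, one option is to record an event in every stream and have the last stream wait on all of them before the reduction is launched. The following is only a minimal sketch of that idea, not the sample's actual code: write_one and sum_slots are hypothetical stand-ins for kernel_A/kernel_B and "sum", and run_streams_then_sum is a name I made up for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel_A / kernel_B: each stream fills one slot.
__global__ void write_one(int *slot) { *slot = 1; }

// Hypothetical stand-in for the "sum" kernel: one thread adds up all slots
// and stores the total in data[0].
__global__ void sum_slots(int *data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += data[i];
    data[0] = total;
}

// Launches one kernel per stream, then a reduction in the last stream that
// is explicitly ordered after every other stream via events.
int run_streams_then_sum(int nstreams) {
    int *d_a = nullptr;
    cudaMalloc(&d_a, nstreams * sizeof(int));
    cudaMemset(d_a, 0, nstreams * sizeof(int));
    cudaDeviceSynchronize();  // make sure the memset is done before any stream work

    std::vector<cudaStream_t> streams(nstreams);
    std::vector<cudaEvent_t> done(nstreams);
    for (int i = 0; i < nstreams; ++i) {
        cudaStreamCreate(&streams[i]);
        cudaEventCreateWithFlags(&done[i], cudaEventDisableTiming);
    }

    // Independent work: these kernels may overlap across streams.
    for (int i = 0; i < nstreams; ++i) {
        write_one<<<1, 1, 0, streams[i]>>>(&d_a[i]);
        cudaEventRecord(done[i], streams[i]);  // marks "stream i has finished"
    }

    // The last stream waits (on the device, not the host) for every event,
    // so the reduction cannot start before all streams are done.
    cudaStream_t last = streams[nstreams - 1];
    for (int i = 0; i < nstreams; ++i) {
        cudaStreamWaitEvent(last, done[i], 0);
    }
    sum_slots<<<1, 1, 0, last>>>(d_a, nstreams);

    // Copy the result back in the same stream and wait for it on the host.
    int result = -1;
    cudaMemcpyAsync(&result, d_a, sizeof(int), cudaMemcpyDeviceToHost, last);
    cudaStreamSynchronize(last);

    for (int i = 0; i < nstreams; ++i) {
        cudaEventDestroy(done[i]);
        cudaStreamDestroy(streams[i]);
    }
    cudaFree(d_a);
    return result;
}

int main() {
    int result = run_streams_then_sum(4);
    printf("sum = %d\n", result);
    return 0;
}
```

The wait is enqueued on the device side, so the host thread is not blocked until the final cudaStreamSynchronize. Error checking is left out to keep the sketch short; in real code every CUDA call should have its return value checked.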