Dear All

Could someone explain the difference in how streams are treated on a GPU with Hyper-Q (compute capability >= 3.5) and on one without it (compute capability < 3.5)?

Is execution within each stream serial in both cases? If so, when the host issues only kernel calls, I do not need any synchronization within each stream. Is that right?


Luis Gonçalves

All CUDA operations issued to a given stream are processed serially, in issue order. This does not depend on Hyper-Q or compute capability. Hyper-Q makes it more likely that kernels from different streams can run concurrently, because it removes some of the artificial dependencies that the single command queue of pre-cc3.5 devices can introduce.
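To illustrate the in-stream ordering point, here is a minimal sketch assuming two hypothetical kernels, `fill` and `doubleIt`, where the second consumes the output of the first. Because both are launched into the same stream, no synchronization is needed between them; the host only has to wait (here with `cudaStreamSynchronize`) before it reads the result.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel A: data[i] = i.
__global__ void fill(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical kernel B: reads A's output and doubles it.
__global__ void doubleIt(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Returns the number of wrong elements (0 means stream ordering held),
// or -1 if a CUDA call failed.
int runSameStreamOrdering(int n) {
    if (n <= 0) return 0;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<int> host(n);
    int *dev = nullptr;
    cudaStream_t s;

    if (cudaStreamCreate(&s) != cudaSuccess) return -1;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) {
        cudaStreamDestroy(s);
        return -1;
    }

    fill<<<blocks, threads, 0, s>>>(dev, n);
    // No synchronization here: doubleIt cannot start until fill has
    // finished, because both were issued to the same stream s.
    doubleIt<<<blocks, threads, 0, s>>>(dev, n);
    // The copy back is also ordered after doubleIt within stream s.
    cudaMemcpyAsync(host.data(), dev, n * sizeof(int),
                    cudaMemcpyDeviceToHost, s);

    // The host, by contrast, must wait before it reads host[].
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t syncErr = cudaStreamSynchronize(s);
    cudaFree(dev);
    cudaStreamDestroy(s);
    if (launchErr != cudaSuccess || syncErr != cudaSuccess) return -1;

    int wrong = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != 2 * i) ++wrong;
    return wrong;
}
```

Calling `runSameStreamOrdering(1000)` from `main` should return 0 on any CUDA-capable device, with or without Hyper-Q.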

This Question/Answer may be of interest:


Note the comments by Greg Smith.

How should I program for concurrency: depth-first, breadth-first, or a custom issue order?
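A sketch of the two issue orders, assuming a kernel-only workload spread over several independent streams (the kernel `work`, the stream count, and the iteration count are hypothetical). Depth-first issues all of one stream's kernels before moving to the next stream; breadth-first interleaves the streams. On a pre-3.5 device with a single command queue, depth-first order can leave a kernel waiting at the head of the queue for its same-stream predecessor, blocking independent kernels queued behind it, which is the kind of artificial dependency mentioned in the answer above; breadth-first order tends to avoid that for this pattern. With Hyper-Q, streams can be spread across multiple hardware queues, so the issue order should matter much less.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical busy-work kernel: each thread repeatedly updates one element.
__global__ void work(float *x, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        for (int k = 0; k < iters; ++k) v = v * 0.999f + 0.001f;
        x[i] = v;
    }
}

const int kStreams = 4;            // illustrative values, not tuned
const int kKernelsPerStream = 3;
const int kIters = 1 << 16;

// Depth-first: every kernel of stream 0, then every kernel of stream 1, ...
void issueDepthFirst(cudaStream_t *streams, float **bufs, int n) {
    for (int s = 0; s < kStreams; ++s)
        for (int k = 0; k < kKernelsPerStream; ++k)
            work<<<1, n, 0, streams[s]>>>(bufs[s], n, kIters);
}

// Breadth-first: kernel 0 of every stream, then kernel 1 of every stream, ...
void issueBreadthFirst(cudaStream_t *streams, float **bufs, int n) {
    for (int k = 0; k < kKernelsPerStream; ++k)
        for (int s = 0; s < kStreams; ++s)
            work<<<1, n, 0, streams[s]>>>(bufs[s], n, kIters);
}

typedef void (*IssueFn)(cudaStream_t *, float **, int);

// Runs one issue order from zeroed buffers and copies the results to out.
static bool runOnce(IssueFn issue, cudaStream_t *streams, float **bufs,
                    int n, std::vector<float> &out) {
    for (int s = 0; s < kStreams; ++s)
        cudaMemset(bufs[s], 0, n * sizeof(float));
    cudaDeviceSynchronize();
    issue(streams, bufs, n);
    bool ok = cudaGetLastError() == cudaSuccess;
    ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;
    for (int s = 0; s < kStreams; ++s)
        cudaMemcpy(out.data() + s * n, bufs[s], n * sizeof(float),
                   cudaMemcpyDeviceToHost);
    return ok;
}

// Issue order changes scheduling only, never per-stream results,
// so both orders must produce identical data. Requires 1 <= n <= 1024.
bool runIssueOrders(int n) {
    cudaStream_t streams[kStreams];
    float *bufs[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&bufs[s], n * sizeof(float));
    }
    std::vector<float> depth(kStreams * n), breadth(kStreams * n);
    bool ok = runOnce(issueDepthFirst, streams, bufs, n, depth);
    ok = runOnce(issueBreadthFirst, streams, bufs, n, breadth) && ok;
    for (int s = 0; s < kStreams; ++s) {
        cudaFree(bufs[s]);
        cudaStreamDestroy(streams[s]);
    }
    return ok && depth == breadth;
}
```

Each kernel uses a single small block so that several can fit on the device at once. Whether they actually overlap also depends on free SM resources, so check the timeline in a profiler such as Nsight Systems rather than assuming it; the final comparison only confirms that the issue order affects scheduling, not results.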