No. It was possible to observe kernel-to-kernel overlap even on Fermi devices.
Not really. Hyper-Q is a hardware feature of Kepler and later GPUs that makes it easier to observe concurrency in general, with less restrictive requirements on issue order. Beyond the ordinary requirements for concurrency, no special settings are needed to enable it.
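To make the "ordinary requirements" concrete, here is a minimal sketch, not taken from any NVIDIA sample. It issues a few small kernels first into one stream and then into separate non-default streams, and times both cases. The names spin_kernel and time_kernels, the kernel count, and the cycle count are all illustrative assumptions. On a device that reports concurrentKernels, the second timing will usually be shorter, but overlap is never guaranteed.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-wait for about `cycles` GPU clock ticks,
// then set a flag so the host can confirm the kernel ran.
__global__ void spin_kernel(long long cycles, int *flag)
{
    long long start = clock64();
    while (clock64() - start < cycles) { /* spin */ }
    *flag = 1;
}

// Hypothetical helper: launch `n_kernels` single-thread kernels, either all
// into one stream or each into its own stream, and return the wall-clock
// time in milliseconds. Returns a negative value if a CUDA call fails.
float time_kernels(int n_kernels, bool separate_streams,
                   long long cycles, int *d_flags)
{
    const int kMaxStreams = 8;
    if (n_kernels < 1 || n_kernels > kMaxStreams) return -1.0f;

    cudaStream_t streams[kMaxStreams];
    for (int i = 0; i < n_kernels; ++i)
        if (cudaStreamCreate(&streams[i]) != cudaSuccess) return -1.0f;

    cudaDeviceSynchronize();  // start timing from an idle device
    auto t0 = std::chrono::steady_clock::now();

    for (int i = 0; i < n_kernels; ++i) {
        // Kernels in the same stream serialize; kernels in different
        // non-default streams are allowed (not required) to overlap.
        cudaStream_t s = separate_streams ? streams[i] : streams[0];
        spin_kernel<<<1, 1, 0, s>>>(cycles, d_flags + i);
    }

    cudaError_t err = cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    for (int i = 0; i < n_kernels; ++i) cudaStreamDestroy(streams[i]);
    if (err != cudaSuccess) return -1.0f;
    return std::chrono::duration<float, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 4;                       // assumed kernel count
    const long long cycles = 50000000LL;   // assumed spin length per kernel

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    printf("%s, concurrentKernels = %d\n", prop.name, prop.concurrentKernels);

    int *d_flags = nullptr;
    if (cudaMalloc(&d_flags, n * sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(d_flags, 0, n * sizeof(int));

    float one_stream   = time_kernels(n, false, cycles, d_flags);
    float many_streams = time_kernels(n, true,  cycles, d_flags);
    printf("one stream:       %.2f ms\n", one_stream);
    printf("separate streams: %.2f ms\n", many_streams);

    cudaFree(d_flags);
    return 0;
}
```

Each kernel uses a single thread so that resource limits do not prevent overlap. With only one kernel per stream, issue order should not matter here. With several dependent operations per stream, the order in which work is issued across streams can affect how much overlap a Fermi device shows, which is the restriction Hyper-Q relaxes.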
txbob said:
“it was possible to witness kernel to kernel overlap even on Fermi devices.”
I don’t understand what this means.
Please take a look:
======= The following is quoted from NVIDIA <<Hyper-Q Example (2013)>> =======
On Fermi, when a CPU thread dispatched work into a CUDA stream,
the work was joined into a single pipeline to the Work Distributor.
The Work Distributor takes work from the front of the pipeline,
checks all dependencies are satisfied, and farms the work to the available SMs.
This means that without Hyper-Q, the GPU cannot execute work from two or more streams at the same moment (physical concurrency, not merely logical concurrency).
So why did you say “it was possible to witness kernel to kernel overlap even on Fermi devices”?
Excuse me,
I think I misunderstood the wording in NVIDIA <<Hyper-Q Example (2013)>>.
I should study these documents more closely, especially your example.