I have the following conceptual questions to ask you:

Q1:
With Hyper-Q, more than one kernel can be executing at the same moment. Right?

And without Hyper-Q, only one kernel can be executing at any given moment, although it may overlap with a data transfer between GPU and CPU. Right?

Q2:
To make use of Hyper-Q, are there any settings or procedures we need to apply beforehand?

Q3:
If we use “Dynamic Parallelism”, can the child kernels and their parent kernel run at the same moment (provided there are enough GPU resources)?

I would highly appreciate your help!

No, it was possible to witness kernel to kernel overlap even on Fermi devices.

Not really. Hyper-Q is a hardware feature of Kepler and beyond, that makes it easier to witness concurrency in general with less restrictive requirements on issue order. Apart from the ordinary requirements for concurrency, there are no special settings to enable this.
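
As a rough illustration of those ordinary requirements (independent kernels issued into different non-default streams, each small enough to leave resources free for the others), here is a minimal sketch. The kernel name `spin`, the stream count, and the cycle count are made up for illustration, and whether the kernels actually overlap still has to be confirmed on a profiler timeline.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for about `cycles` clock ticks so each
// launch lasts long enough to show up clearly on a profiler timeline.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main()
{
    const int kNumStreams = 4;  // arbitrary choice for this sketch
    cudaStream_t streams[kNumStreams];
    for (int i = 0; i < kNumStreams; ++i)
        cudaStreamCreate(&streams[i]);

    // One single-block kernel per non-default stream. There are no
    // dependencies between the launches, so the device is free to run
    // them concurrently if it has the resources.
    for (int i = 0; i < kNumStreams; ++i)
        spin<<<1, 1, 0, streams[i]>>>(100000000LL);

    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int i = 0; i < kNumStreams; ++i)
        cudaStreamDestroy(streams[i]);
    return err == cudaSuccess ? 0 : 1;
}
```

Note that nothing in this code refers to Hyper-Q; the same source is used on devices with and without it.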

Yes, it’s possible. Here is a worked example:

https://stackoverflow.com/questions/31058850/overlap-kernel-execution-on-multiple-streams/31075799#31075799
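
For the Dynamic Parallelism question specifically, a minimal sketch of a parent kernel that launches a child and then keeps working might look like the following. The kernel names and sizes are hypothetical. It assumes a device of compute capability 3.5 or higher and a build with relocatable device code (`-rdc=true`, linked against `cudadevrt`). The child launch is asynchronous, so the parent and child are allowed to overlap, but the code does not prove that they do on any particular GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: fills the first half of the buffer.
__global__ void child(int *out, int n)
{
    int i = threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical parent kernel: thread 0 launches the child grid, then all
// parent threads continue with their own work on the second half of the
// buffer without waiting for the child.
__global__ void parent(int *out, int n)
{
    if (threadIdx.x == 0)
        child<<<1, n>>>(out, n);

    int i = threadIdx.x;
    if (i < n) out[n + i] = n + i;
}

int main()
{
    const int n = 32;
    int h_out[2 * n];
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_out));

    parent<<<1, n>>>(d_out, n);

    // The parent grid does not count as complete until its child grid has
    // finished, so this also waits for the child's writes.
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("status: %s, first=%d, last=%d\n",
           cudaGetErrorString(err), h_out[0], h_out[2 * n - 1]);

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```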

Hi txbob,
Your response is so helpful, thanks very much!

Respectfully,

txbob said:
“it was possible to witness kernel to kernel overlap even on Fermi devices.”

I don’t understand what this means.
Please take a look:
======= The following is quoted from NVIDIA <<Hyper-Q Example (2013)>> =======
On Fermi, when a CPU thread dispatched work into a CUDA stream,
the work was joined into a single pipeline to the Work Distributor.
The Work Distributor takes work from the front of the pipeline,
checks all dependencies are satisfied, and farms the work to the available SMs.

I take this to mean that without Hyper-Q, the GPU can’t let two or more streams at a moment (physical, NOT merely logical, concurrency) execute.
So why did you say “it was possible to witness kernel to kernel overlap even on Fermi devices.”?

I don’t see anything in there that says that “GPU can’t let two or more streams at a moment”.

And I have run experiments with the profiler and witnessed kernel concurrency on Fermi.
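
Before running that kind of profiler experiment, one quick sanity check is to ask the runtime whether the device reports support for concurrent kernels at all. This is only a sketch, and it assumes device 0 is the GPU of interest:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 assumed
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    // Nonzero means the device can run multiple kernels concurrently.
    printf("concurrentKernels: %d\n", prop.concurrentKernels);
    // Number of copy engines available for overlapping transfers with kernels.
    printf("asyncEngineCount:  %d\n", prop.asyncEngineCount);
    return 0;
}
```

A nonzero `concurrentKernels` only says overlap is possible; the profiler timeline is still what shows whether it happened.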

Hello txbob,

My apologies,
I think I misunderstood the wording in NVIDIA <<Hyper-Q Example (2013)>>.
I should study these documents more closely, especially your example.

I can never thank you enough.