Hello,

One asynchronous kernel launch takes about 10 µs of host time. How much host time do N kernel launches take?

Does it scale linearly with N, i.e. do N kernel launches take 10 µs × N of host time?

Is there a maximum N beyond which the host time increases non-linearly?

If there is such a maximum, where can I look it up (device properties, specification, experiments, …)?

I launched different numbers of asynchronous kernels back to back in a for loop and measured the host time spent issuing the launches.
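A minimal sketch of that kind of measurement, written as a CUDA program. The busy-wait kernel `spin_kernel`, the helper `time_launches_us`, the cycle count and the sweep range are illustrative choices, not the original test code. `clock64()` requires compute capability 2.0 or later, which the GTX 470 has. The host clock is stopped before `cudaDeviceSynchronize()`, so the reported time covers only the launch loop.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Busy-wait kernel so that each launch occupies the GPU for a controllable time.
// The cycle count is an assumption; tune it to get the kernel duration you want.
__global__ void spin_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Issues n launches back to back and returns the host time (in microseconds)
// spent in the launch loop only; the GPU is drained afterwards.
double time_launches_us(int n, long long cycles) {
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        spin_kernel<<<1, 1>>>(cycles);
    }
    auto t1 = std::chrono::steady_clock::now();
    cudaDeviceSynchronize();  // empty the queue before the next measurement
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

int main() {
    // Warm-up launch so that context creation is not counted in the first measurement.
    spin_kernel<<<1, 1>>>(1000);
    cudaDeviceSynchronize();

    const long long cycles = 1000000;  // roughly 1 ms per kernel at about 1 GHz (assumption)
    for (int n = 200; n <= 1400; n += 100) {
        double us = time_launches_us(n, cycles);
        std::printf("N = %4d  host time = %10.1f us  per launch = %6.2f us\n",
                    n, us, us / n);
    }
    return 0;
}
```

Printing the per-launch time makes the knee easy to spot: it should stay roughly constant while the launches are purely asynchronous and start growing once a launch call has to wait for the GPU.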

For N below about 1000 asynchronous kernel launches, the required host time is about N × 10 µs.

For N > 1000 launches, the host time increases non-linearly (very quickly), and for N = 1100 the host time is nearly equal to the kernel run time.

The problem seems trivial, but I have not found an answer yet.

My guess is that the GPU has a kernel scheduler that can queue about 1024 kernels at the same time.
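To check whether that guess explains the numbers, here is a small host-only model (plain C++ that also compiles with nvcc). It is a hypothetical model, not documented NVIDIA behaviour. It assumes every launch call costs the host a fixed `t_launch`, that kernels run one after another for `t_kernel` each, and that at most `q` launches can be outstanding (issued but not finished). The name `host_time_us` and the 1 ms kernel duration are illustrative assumptions, and `q = 1024` is only the guessed queue depth.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical bounded launch-queue model (assumption, not documented behaviour):
//  - each launch call costs the host t_launch microseconds,
//  - the GPU runs kernels serially, each for t_kernel microseconds,
//  - at most q launches may be outstanding; a launch that would exceed q
//    blocks the host until the oldest outstanding kernel has finished.
// Returns the host time needed to issue n launches.
long long host_time_us(int n, long long t_launch, long long t_kernel, int q) {
    std::vector<long long> done(n);  // completion time of each kernel
    long long host = 0;              // current host clock
    long long device_free = 0;       // time at which the GPU becomes idle
    for (int i = 0; i < n; ++i) {
        if (i >= q) host = std::max(host, done[i - q]);  // wait for a free queue slot
        host += t_launch;                                 // cost of the launch call itself
        long long start = std::max(host, device_free);    // starts once issued and GPU idle
        done[i] = start + t_kernel;
        device_free = done[i];
    }
    return host;
}

int main() {
    // Assumed numbers: 10 us per launch (from the post), 1 ms per kernel, depth 1024.
    for (int n = 900; n <= 1100; n += 50) {
        std::printf("N = %4d  host time = %7lld us\n", n, host_time_us(n, 10, 1000, 1024));
    }
    return 0;
}
```

With these assumed numbers the model gives exactly N × 10 µs up to N = 1024 (10000 µs for N = 1000) but 76020 µs for N = 1100: once the queue is full, the host can only issue a new launch each time a kernel finishes, so kernel run time rather than launch cost sets the host time.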

Device:

GeForce GTX 470 (compute capability 2.0)