If one asynchronous kernel launch takes about 10 µs of host time, how much host time do N kernel launches take?
Does it scale linearly with N, i.e. N × 10 µs of host time for N kernels?
Is there a maximum N beyond which the host time increases non-linearly?
If there is such a maximum, where can I look it up (device properties, specification, experiment, …)?
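For the device-properties route, a sketch like the following prints the launch-related fields of `cudaDeviceProp` using `cudaGetDeviceProperties`. As far as I know, neither these fields nor any other field of `cudaDeviceProp` reports the depth of the host-side launch queue, so the limit may have to be determined experimentally. The helper name `print_launch_related_props` and the choice of device 0 are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the device properties most closely related to
// kernel launching. None of them is the depth of the host-side launch queue.
int print_launch_related_props(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device:              %s\n", prop.name);
    printf("Compute capability:  %d.%d\n", prop.major, prop.minor);
    printf("Multiprocessors:     %d\n", prop.multiProcessorCount);
    printf("Concurrent kernels:  %s\n", prop.concurrentKernels ? "yes" : "no");
    printf("Async copy engines:  %d\n", prop.asyncEngineCount);
    return 0;
}

int main() {
    // Assumption: the GPU of interest is device 0.
    return print_launch_related_props(0);
}
```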
I launched different numbers of asynchronous kernels back to back in a for loop and measured the host time spent launching them.
For N < 1000 (approximately) asynchronous kernel launches, the required host time is N × 10 µs.
For N > 1000 the host time grows non-linearly (very quickly), and at N = 1100 it is nearly equal to the kernel run time.
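The experiment described above can be sketched roughly as follows. This is a minimal sketch, not the original code: the busy-wait kernel `spin_kernel`, the helper `launch_host_time_us`, the cycle count and the range of N are hypothetical choices. Each kernel spins long enough that the GPU cannot finish the queued work while the host is still launching, and the timer stops before `cudaDeviceSynchronize()`, so GPU execution time only shows up if the host blocks inside a launch call. `clock64()` requires compute capability 2.0 or higher, which the GTX 470 meets.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: spins for roughly `cycles` SM clock ticks so
// that each launch runs long enough for pending launches to pile up.
__global__ void spin_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
        // busy wait
    }
}

// Hypothetical helper: returns the host wall-clock time (microseconds) spent
// issuing `n` asynchronous launches into the default stream. The timer stops
// before synchronizing, so only time the host spends inside the launches counts.
double launch_host_time_us(int n, long long cycles) {
    using host_clock = std::chrono::steady_clock;
    cudaDeviceSynchronize();                       // start from an idle device
    auto t0 = host_clock::now();
    for (int i = 0; i < n; ++i) {
        spin_kernel<<<1, 1>>>(cycles);
    }
    auto t1 = host_clock::now();
    cudaDeviceSynchronize();                       // drain all launches before returning
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

int main() {
    // Warm-up launch so context creation and module loading are not timed.
    spin_kernel<<<1, 1>>>(0);
    cudaDeviceSynchronize();

    const long long cycles = 1000000;              // assumption: roughly 1 ms per kernel at ~1 GHz
    for (int n = 200; n <= 1400; n += 100) {
        double us = launch_host_time_us(n, cycles);
        printf("N = %4d  host time = %10.1f us  (%.2f us per launch)\n", n, us, us / n);
    }

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Printing the per-launch time next to the total makes it easy to see the N at which the cost per launch stops being roughly constant.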
The problem seems trivial, but I have not found an answer yet.
My guess is that the GPU has a kernel scheduler that can handle about 1024 kernels at the same time.
GeForce GTX 470 (compute capability 2.0)