Question on Stream, Connection and Performance


I have no answers, but some related points:

  1. In addition to the limits you mention, there is a limit on the number of resident grids per device.
    See Table 18.

  2. If you use dynamic parallelism, you can easily reach the grid limit using only one stream created by the host.

An approach I have been exploring recently:

  • create only a small number of streams on the host, e.g., one or two streams for each desired stream priority
  • use a combination of host launches and dynamic parallelism to achieve the concurrency you want
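Here is a minimal sketch of that approach, not a tuned implementation. The kernel names (`parentKernel`, `childKernel`), the slice sizes, and the scaling workload are all hypothetical, made up for illustration. The host creates just two streams, one at each end of the device's priority range, and each parent block launches one child grid. Child grids launched by different blocks may run concurrently, and every one of them counts toward the resident grid limit.

```cuda
// Assumed build line: nvcc -rdc=true demo.cu -lcudadevrt
// (dynamic parallelism needs compute capability 3.5 or higher)
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Hypothetical child kernel: scales one slice of the data.
__global__ void childKernel(float *slice, int len, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) slice[i] *= factor;
}

// Hypothetical parent kernel: one thread per block launches a child grid
// for that block's slice.
__global__ void parentKernel(float *data, int sliceLen, float factor) {
    if (threadIdx.x == 0) {
        float *slice = data + blockIdx.x * sliceLen;
        childKernel<<<(sliceLen + 127) / 128, 128>>>(slice, sliceLen, factor);
    }
}

int main() {
    const int numSlices = 8, sliceLen = 1024, n = numSlices * sliceLen;

    // Only two host streams: one per desired priority level.
    int least = 0, greatest = 0;
    CHECK(cudaDeviceGetStreamPriorityRange(&least, &greatest));
    cudaStream_t lowPrio, highPrio;
    CHECK(cudaStreamCreateWithPriority(&lowPrio, cudaStreamNonBlocking, least));
    CHECK(cudaStreamCreateWithPriority(&highPrio, cudaStreamNonBlocking, greatest));

    std::vector<float> host(n, 1.0f);
    float *a = nullptr, *b = nullptr;
    CHECK(cudaMalloc(&a, n * sizeof(float)));
    CHECK(cudaMalloc(&b, n * sizeof(float)));
    CHECK(cudaMemcpy(a, host.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(b, host.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    // Non-blocking streams do not order against the default stream,
    // so make sure the uploads have finished before launching.
    CHECK(cudaDeviceSynchronize());

    // Host launches go into the two streams; the remaining concurrency
    // comes from the device-side child launches.
    parentKernel<<<numSlices, 32, 0, lowPrio>>>(a, sliceLen, 2.0f);
    parentKernel<<<numSlices, 32, 0, highPrio>>>(b, sliceLen, 3.0f);
    CHECK(cudaGetLastError());

    // A parent grid is complete only after all of its children are.
    CHECK(cudaStreamSynchronize(lowPrio));
    CHECK(cudaStreamSynchronize(highPrio));

    std::vector<float> outA(n), outB(n);
    CHECK(cudaMemcpy(outA.data(), a, n * sizeof(float), cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(outB.data(), b, n * sizeof(float), cudaMemcpyDeviceToHost));

    bool ok = true;
    for (int i = 0; i < n; ++i)
        ok = ok && outA[i] == 2.0f && outB[i] == 3.0f;
    printf("%s\n", ok ? "results match" : "results differ");

    CHECK(cudaFree(a));
    CHECK(cudaFree(b));
    CHECK(cudaStreamDestroy(lowPrio));
    CHECK(cudaStreamDestroy(highPrio));
    return ok ? 0 : 1;
}
```

I have not measured how the scheduler treats the child grids relative to the priority of the parent's stream, so profile before relying on it.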

In addition to the links you posted, be aware of the issue discussed in "Increased time to synchronize…".