Fewer concurrent kernels with Hardware Accelerated GPU Scheduling (HAGS)

Thanks, Greg.

I changed the environment variable as you suggested and it worked.

I was also curious whether the concurrency of kernels launched dynamically (via dynamic parallelism) into cudaStreamFireAndForget is limited by CUDA_DEVICE_MAX_CONNECTIONS.

I found that it is not. Even with CUDA_DEVICE_MAX_CONNECTIONS left at its default of 8, I could easily get 32 concurrent kernels by mixing host-side launches with device-side (dynamic) launches.
This makes sense in light of the post "How Many Streams?", which explains a bit more about connections.
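For anyone who wants to try a similar experiment, here is a minimal sketch of mixing host launches with device-side launches into cudaStreamFireAndForget. It is not the exact code behind the numbers above. It assumes CUDA 12.0 or newer (where cudaStreamFireAndForget is available to device code), a GPU that supports dynamic parallelism, and a build along the lines of `nvcc -rdc=true mixed_launch.cu -lcudadevrt`. The names `leaf_kernel`, `parent_kernel`, `count_mixed_launches`, the `g_*` counters and the spin length are all hypothetical. Each leaf kernel bumps a "running" counter on entry, so the peak value gives a rough measure of how many leaves overlapped.

```cuda
// Sketch only. Assumes CUDA 12.0+ (cudaStreamFireAndForget in device code),
// dynamic-parallelism support, and: nvcc -rdc=true mixed_launch.cu -lcudadevrt
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

__device__ int g_running = 0;  // hypothetical: leaf kernels currently spinning
__device__ int g_peak = 0;     // hypothetical: highest value g_running reached
__device__ int g_done = 0;     // hypothetical: leaf kernels that finished

// Single-thread leaf kernel: registers itself as running, spins for `cycles`
// SM clock ticks so launches have time to overlap, then deregisters.
__global__ void leaf_kernel(long long cycles) {
    int now = atomicAdd(&g_running, 1) + 1;
    atomicMax(&g_peak, now);
    long long start = clock64();
    while (clock64() - start < cycles) { }
    atomicSub(&g_running, 1);
    atomicAdd(&g_done, 1);
}

// Each parent thread launches one leaf into the fire-and-forget stream,
// so the child grids carry no ordering dependency on one another.
__global__ void parent_kernel(int children, long long cycles) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < children) {
        leaf_kernel<<<1, 1, 0, cudaStreamFireAndForget>>>(cycles);
    }
}

// Launches `host_kernels` leaves from the host (one non-blocking stream each)
// plus `device_kernels` leaves from a single parent grid. Returns how many
// leaves completed (or -1 on a host-visible CUDA error); *peak receives the
// highest number of leaves observed running at the same time.
int count_mixed_launches(int host_kernels, int device_kernels, int* peak) {
    constexpr int kMaxHostStreams = 64;
    assert(host_kernels >= 0 && host_kernels <= kMaxHostStreams);
    assert(device_kernels > 0 && device_kernels <= 1024);
    const long long cycles = 50000000LL;  // arbitrary spin length

    int zero = 0;
    cudaMemcpyToSymbol(g_running, &zero, sizeof(int));
    cudaMemcpyToSymbol(g_peak, &zero, sizeof(int));
    cudaMemcpyToSymbol(g_done, &zero, sizeof(int));
    cudaDeviceSynchronize();  // make sure the resets land before any launch

    cudaStream_t parent_stream;
    cudaStream_t streams[kMaxHostStreams];
    cudaStreamCreateWithFlags(&parent_stream, cudaStreamNonBlocking);
    parent_kernel<<<1, device_kernels, 0, parent_stream>>>(device_kernels, cycles);
    for (int i = 0; i < host_kernels; ++i) {
        cudaStreamCreateWithFlags(&streams[i], cudaStreamNonBlocking);
        leaf_kernel<<<1, 1, 0, streams[i]>>>(cycles);
    }
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t sync_err = cudaDeviceSynchronize();  // waits for child grids too

    int done = 0;
    cudaMemcpyFromSymbol(&done, g_done, sizeof(int));
    cudaMemcpyFromSymbol(peak, g_peak, sizeof(int));
    for (int i = 0; i < host_kernels; ++i) cudaStreamDestroy(streams[i]);
    cudaStreamDestroy(parent_stream);

    if (launch_err != cudaSuccess || sync_err != cudaSuccess) return -1;
    return done;
}
```

Calling `count_mixed_launches(16, 16, &peak)` should report 32 completed leaves; the peak it reports will depend on the GPU, driver, HAGS setting and CUDA_DEVICE_MAX_CONNECTIONS, so running it with the variable at 8 and again at a larger value is one way to compare. The atomic counter is only a rough gauge; an Nsight Systems timeline is a more reliable way to confirm the overlap.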

Thanks again.