Only 8 streams run concurrently (not 32) with Hyper-Q on a K20

Situation
- I am using a K20 with Hyper-Q on Windows 7 (Visual Studio).
- My code follows the example below.

Problem
- I have read that the K20 can run 32 streams concurrently.
- In practice, only 8 of my streams run concurrently.

Question
- Is there a way to run all 32 streams concurrently?
- Is this result expected? If so, what is the technical reason?

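// Host-to-device copies, one per stream.
// Note: every stream copies into the same dev_fimage buffer, so the streams
// race on it; a real program should give each stream its own buffers.
// cudaMemcpyAsync can only overlap with other work if fimage is pinned
// (allocated with cudaMallocHost or cudaHostAlloc).
for (int i = 0; i < nStreams; i++)
{
    checkCudaErrors(cudaMemcpyAsync(dev_fimage, fimage, nx * ny * sizeof(float), cudaMemcpyHostToDevice, streams[i]));
}

// Kernel launches: kernels in the same stream run in order,
// kernels in different streams may run concurrently.
for (int i = 0; i < nStreams; i++)
{
    Kernel1<<<dimGrid_P_ovs, dimBlock, 0, streams[i]>>>(dev_fimage);
    Kernel2<<<dimGrid_P_ovs, dimBlock, 0, streams[i]>>>(dev_fimage);
    Kernel3<<<dimGrid_P_ovs, dimBlock, 0, streams[i]>>>(dev_fimage);
    // …
}

// Device-to-host copies, one per stream.
for (int i = 0; i < nStreams; i++)
{
    checkCudaErrors(cudaMemcpyAsync(fimage, dev_fimage, nx * ny * sizeof(float), cudaMemcpyDeviceToHost, streams[i]));
}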

Does the simpleHyperQ sample from the CUDA samples work for you? I have not tried it, but from the source it appears to use 32 streams by default.

[later:]

You probably need to set the environment variable CUDA_DEVICE_MAX_CONNECTIONS=32 to raise the number of hardware work queues (connections) to the maximum of 32 that the hardware supports. As I understand it, the driver does not use 32 by default in order to conserve internal resources; the default is 8, which matches the behavior you are seeing.

How can I set the environment variable CUDA_DEVICE_MAX_CONNECTIONS=32? I am working on Windows.

Windows has environment variables too, and there are several ways to set them. For example, at the command prompt:

set CUDA_DEVICE_MAX_CONNECTIONS=32

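This applies only to that Command Prompt session and to programs started from it, so launch your application (or Visual Studio) from the same window. To unset it later: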

set CUDA_DEVICE_MAX_CONNECTIONS=

To set the variable persistently, use the Control Panel:

System and Security → System → Advanced System Settings → Environment Variables

Programs that are already running, including Visual Studio, see the new value only after they are restarted.
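As an alternative, the variable can be set from inside the program itself, as long as that happens before the first CUDA runtime call creates the context. The following is a minimal sketch, assuming the driver reads CUDA_DEVICE_MAX_CONNECTIONS from the process environment at context creation; the helper names setMaxConnections and getMaxConnections are hypothetical. It uses _putenv_s with the Microsoft C runtime and POSIX setenv elsewhere.

```cpp
#include <stdlib.h>
#include <string>

// Hypothetical helper: sets CUDA_DEVICE_MAX_CONNECTIONS for this process only.
// Assumption: it must run before the first CUDA call, because the value is
// read when the CUDA context is created. Returns true on success.
bool setMaxConnections(int n)
{
    const std::string value = std::to_string(n);
#ifdef _WIN32
    // Microsoft C runtime: _putenv_s returns 0 on success.
    return _putenv_s("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str()) == 0;
#else
    // POSIX: the last argument (1) overwrites any existing value.
    return setenv("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str(), 1) == 0;
#endif
}

// Hypothetical helper: returns the value visible to this process, or -1 if unset.
int getMaxConnections()
{
    const char* v = getenv("CUDA_DEVICE_MAX_CONNECTIONS");
    return v ? atoi(v) : -1;
}
```

Call setMaxConnections(32) as the first statement in main(), before cudaStreamCreate or any other CUDA call; changing the variable after the context already exists should not affect that context.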