Only 8 streams run concurrently, not 32, using Hyper-Q on a K20

- I am using a K20 with Hyper-Q on Windows 7 (Visual Studio).
- My code looks like the example below.

- I heard that the K20 can run 32 streams at the same time.
- In my tests, only 8 streams run at the same time.

- Is there any way to use 32 streams at the same time?
- Is this result expected? If so, please explain the technical reason.

// Copy the input image to the device on each stream.
// Note: every stream uses the same dev_fimage buffer here.
for (int i = 0; i < nStreams; i++)
          checkCudaErrors(cudaMemcpyAsync(dev_fimage, fimage, nx * ny * sizeof(float), cudaMemcpyHostToDevice, streams[i]));

// Launch all three kernels on each stream (braces keep them inside the loop).
for (int i = 0; i < nStreams; i++) {
          Kernel1<<<dimGrid_P_ovs, dimBlock, 0, streams[i]>>>(dev_fimage);
          Kernel2<<<dimGrid_P_ovs, dimBlock, 0, streams[i]>>>(dev_fimage);
          Kernel3<<<dimGrid_P_ovs, dimBlock, 0, streams[i]>>>(dev_fimage);
}

// Copy the result back to the host on each stream.
for (int i = 0; i < nStreams; i++)
          checkCudaErrors(cudaMemcpyAsync(fimage, dev_fimage, nx * ny * sizeof(float), cudaMemcpyDeviceToHost, streams[i]));

Does the simpleHyperQ sample app work for you? I have not tried it but looking at the source it seems to use 32 streams by default.


It seems you may need to set the environment variable CUDA_DEVICE_MAX_CONNECTIONS=32 to raise the number of channels to the maximum of 32 supported by the hardware. From what I understand, the driver defaults to fewer connections (8, which would match what you are seeing) in order to conserve internal resources.

How can I set the environment variable CUDA_DEVICE_MAX_CONNECTIONS=32?
I am working on Windows.

Windows also uses environment variables. There are various ways to manipulate them. For example, at the command prompt:

    set CUDA_DEVICE_MAX_CONNECTIONS=32

To unset it later:

    set CUDA_DEVICE_MAX_CONNECTIONS=

You can also set environment variables persistently through the Control Panel:

System and Security -> System -> Advanced System Settings -> Environment Variables
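
If you would rather not depend on the shell or the system settings, another option is to set the variable from inside the program itself. Below is a minimal sketch of that idea, not something I have tested on a K20. It assumes the driver reads CUDA_DEVICE_MAX_CONNECTIONS when CUDA initializes, so the call has to happen before the first CUDA runtime call in the process. The helper name setMaxConnections is made up for this example and is not part of CUDA.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not a CUDA API): sets CUDA_DEVICE_MAX_CONNECTIONS
// for the current process only. Returns true on success.
// Assumption: the driver reads this variable at CUDA initialization,
// so call this before any cudaXxx() function.
bool setMaxConnections(int n) {
    const std::string value = std::to_string(n);
#ifdef _WIN32
    // MSVC runtime: _putenv_s returns 0 on success.
    return _putenv_s("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str()) == 0;
#else
    // POSIX: the last argument (1) overwrites an existing value.
    return setenv("CUDA_DEVICE_MAX_CONNECTIONS", value.c_str(), 1) == 0;
#endif
}
```

To use it, call setMaxConnections(32) at the very top of main(), before cudaStreamCreate or any other CUDA call. The setting applies only to that process and disappears when it exits.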