Problem using streams: can't get more than one stream to work

I’m trying to use streams to run the same kernel on several sets of data, but can’t seem to get it to work when I set the number of streams to greater than one.

I’ve extracted the relevant portions of the code below.

 // Setup event handles

  cudaEvent_t start_event, stop_event;



  float total_time;

 // allocate and initialize an array of stream handles

  cudaStream_t *streams = (cudaStream_t*) malloc(bigBlocks * sizeof(cudaStream_t));

  for (int i=0; i < nstreams; i++)




 // Copy to device




  for (int i = 0; i < nstreams; i++)


      // Process data    


      CUT_CHECK_ERROR("Kernel execution failed.");






When I run the above with nstreams > 1, the reported execution time is 0.000000, which seems to indicate that the kernel hasn’t launched. With nstreams = 1, the execution time is ~8.

I can get simpleStreams from the SDK to run fine.

Is there something I’m missing? Are there limits to the number of streams that can run at one time?

I think my previous post was in the wrong forum…

In any case, my bad. I was offsetting the pointers into the matrices incorrectly: I incremented them by the chunk size in bytes instead of by the number of elements.
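For anyone who hits the same thing, here is a minimal host-only sketch of the difference. It assumes float data, and the helper names (element_offset, byte_offset_bug, chunk_start) are made up for illustration; they are not from the original code. Pointer arithmetic on a float* already scales by sizeof(float), so the offset has to be given in elements.

```cpp
#include <cstddef>

// Hypothetical helper names, for illustration only.

// Correct: offset of chunk i, measured in elements.
// data_size is the size of one chunk in bytes.
inline std::size_t element_offset(std::size_t i, std::size_t data_size) {
    return i * (data_size / sizeof(float));
}

// The mistake: using the byte size directly as a pointer offset.
// Added to a float*, this lands sizeof(float) times too far.
inline std::size_t byte_offset_bug(std::size_t i, std::size_t data_size) {
    return i * data_size;
}

// Start of chunk i inside a buffer of floats.
inline float* chunk_start(float* base, std::size_t i, std::size_t data_size) {
    return base + element_offset(i, data_size);
}
```

With a chunk of 4 floats, chunk 1 starts at element 4, whereas the byte-based offset would point at element 4 * sizeof(float), which is past the second chunk.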

The host data must be allocated with cudaMallocHost:

cudaMallocHost((void**)&T, data_size*nstreams);

cudaMallocHost((void**)&a, data_size*nstreams);
// Copy to device



In this case you must use the asynchronous copy function (cudaMemcpyAsync in the calls below).

If you want to use only one stream for copying data from host to device:

cudaMemcpyAsync(Td,T,data_size*nstreams,cudaMemcpyHostToDevice, streams[0]);

cudaMemcpyAsync(d,a,data_size*nstreams,cudaMemcpyHostToDevice, streams[0]);

If you use more than one stream:

for (int i = 0; i < nstreams; i++)
{
    // Offsets are in elements; the copy size (data_size) is in bytes.
    cudaMemcpyAsync(Td + i*numOfElementsPerStream, T + i*numOfElementsPerStream, data_size, cudaMemcpyHostToDevice, streams[i]);
    cudaMemcpyAsync(d + i*numOfElementsPerStream, a + i*numOfElementsPerStream, data_size, cudaMemcpyHostToDevice, streams[i]);
}


Here numOfElementsPerStream is the number of elements handled by one stream.

In your case, numOfElementsPerStream = data_size/sizeof(Type), where Type is the element type of the arrays.
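Put together, the per-stream pattern might look like the sketch below. This is not the original poster's program: the scale kernel, the CHECK macro, the float element type and the sizes are all assumptions chosen only to make the example self-contained. The parts taken from this thread are the pinned host allocation, the element-based offsets, and the byte-based copy size.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-check macro (not from the original post).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for the real one: doubles each element.
__global__ void scale(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= 2.0f;
}

int main()
{
    const int nstreams = 4;                        // assumed value
    const int numOfElementsPerStream = 1 << 20;    // assumed value
    const size_t data_size = numOfElementsPerStream * sizeof(float);  // bytes per stream

    float *a = nullptr, *d = nullptr;
    CHECK(cudaMallocHost((void **)&a, data_size * nstreams));  // pinned host memory
    CHECK(cudaMalloc((void **)&d, data_size * nstreams));
    for (size_t k = 0; k < (size_t)nstreams * numOfElementsPerStream; ++k) a[k] = 1.0f;

    cudaStream_t streams[nstreams];
    for (int i = 0; i < nstreams; i++) CHECK(cudaStreamCreate(&streams[i]));

    const int threads = 256;
    const int blocks = (numOfElementsPerStream + threads - 1) / threads;
    for (int i = 0; i < nstreams; i++) {
        // Offset in ELEMENTS, copy size in BYTES.
        const size_t offset = (size_t)i * numOfElementsPerStream;
        CHECK(cudaMemcpyAsync(d + offset, a + offset, data_size,
                              cudaMemcpyHostToDevice, streams[i]));
        scale<<<blocks, threads, 0, streams[i]>>>(d + offset, numOfElementsPerStream);
        CHECK(cudaGetLastError());  // catches launch errors immediately
        CHECK(cudaMemcpyAsync(a + offset, d + offset, data_size,
                              cudaMemcpyDeviceToHost, streams[i]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for all streams to finish

    printf("a[0] = %f\n", a[0]);

    for (int i = 0; i < nstreams; i++) CHECK(cudaStreamDestroy(streams[i]));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(a));
    return 0;
}
```

Calling cudaGetLastError right after each launch is a cheap way to tell a kernel that never started apart from a timing problem, which is the kind of symptom described in the first post.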

good luck to you :)

I’ve got a similar problem, in that the streams don’t seem to be working.

Can anyone provide any help?