STREAMS

susan7 · November 8, 2009, 1:54pm

Hello,

I want to improve the performance of my program by overlapping memory copies with kernel execution, using nstreams streams as follows:

threads = dim3(512, 1);
blocks = dim3(n / (nstreams * threads.x), 1);
memset(a, 255, nbytes);      // set host memory bits to all 1s, for testing correctness
cudaMemset(d_a, 0, nbytes);  // set device memory to all 0s, for testing correctness
cudaEventRecord(start_event, 0);
for (int k = 0; k < nreps; k++)
{
    // asynchronously launch nstreams kernels, each operating on its own portion of data
    for (int i = 0; i < nstreams; i++)
        init_array<<<blocks, threads, 0, streams[i]>>>(d_a + i * n / nstreams, d_c, niterations);

    // asynchronously launch nstreams memcopies. Note that the memcopy in stream x will only
    // commence executing when all previous CUDA calls in stream x have completed
    for (int i = 0; i < nstreams; i++)
        cudaMemcpyAsync(a + i * n / nstreams, d_a + i * n / nstreams, nbytes / nstreams, cudaMemcpyDeviceToHost, streams[i]);
}
cudaEventRecord(stop_event, 0);
cudaEventSynchronize(stop_event);
CUDA_SAFE_CALL( cudaEventElapsedTime(&elapsed_time, start_event, stop_event) );
printf("%d streams:\t%.2f (%.2f expected with compute capability 1.1 or later)\n", nstreams, elapsed_time / nreps, time_kernel + time_memcpy / nstreams);

That's for one kernel and one copy. Could anyone tell me how to do it with many copies and many kernels? Should I loop over the copies and then loop over the kernels? And should I increase the '0' in the execution configuration of kernel2 by 1, or how?

Thank you for your help.
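
One possible way to extend the pattern above to several copies and several kernels is sketched below. It is only a minimal illustration under stated assumptions, not the sample's own code: the kernels add_one and times_two are hypothetical stand-ins for "kernel1" and "kernel2", and the sizes are arbitrary. The idea is that every operation belonging to one chunk of data goes into the same stream, so the operations of that chunk run in order while different streams are free to overlap. The third execution configuration argument is the dynamic shared memory size in bytes, so it stays 0 for both kernels; the stream is selected by the fourth argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels standing in for "kernel1" and "kernel2".
__global__ void add_one(int *data, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) data[idx] += 1;
}

__global__ void times_two(int *data, int count) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count) data[idx] *= 2;
}

int main() {
    const int nstreams = 4;               // assumed value, chosen for illustration
    const int n = 1 << 20;                // total elements, divisible by nstreams
    const int chunk = n / nstreams;       // elements handled by each stream
    const size_t chunk_bytes = chunk * sizeof(int);

    int *a = nullptr, *d_a = nullptr;
    cudaMallocHost(&a, n * sizeof(int));  // pinned host memory, needed for copies to overlap
    cudaMalloc(&d_a, n * sizeof(int));
    for (int i = 0; i < n; i++) a[i] = i;

    cudaStream_t streams[nstreams];
    for (int i = 0; i < nstreams; i++) cudaStreamCreate(&streams[i]);

    dim3 threads(512, 1);
    dim3 blocks((chunk + threads.x - 1) / threads.x, 1);

    // Per stream: copy in, kernel 1, kernel 2, copy out.
    // Work within one stream executes in issue order; separate streams may overlap.
    for (int i = 0; i < nstreams; i++) {
        int *h = a + i * chunk;
        int *d = d_a + i * chunk;
        cudaMemcpyAsync(d, h, chunk_bytes, cudaMemcpyHostToDevice, streams[i]);
        add_one<<<blocks, threads, 0, streams[i]>>>(d, chunk);    // 0 = shared memory bytes
        times_two<<<blocks, threads, 0, streams[i]>>>(d, chunk);  // same 0, same stream
        cudaMemcpyAsync(h, d, chunk_bytes, cudaMemcpyDeviceToHost, streams[i]);
    }
    cudaDeviceSynchronize();              // wait for all streams to finish

    // Each element should now hold (original + 1) * 2.
    int errors = 0;
    for (int i = 0; i < n; i++)
        if (a[i] != (i + 1) * 2) errors++;
    printf("%d mismatches\n", errors);

    for (int i = 0; i < nstreams; i++) cudaStreamDestroy(streams[i]);
    cudaFree(d_a);
    cudaFreeHost(a);
    return errors != 0;
}
```

The alternative from the question, one loop over all copies followed by one loop over all kernels, should give the same results as long as each operation stays in the stream of its chunk. Which issue order overlaps better may depend on the GPU generation, so it is worth timing both.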
