Streams and CPU


I’m trying to work on the CPU with the data generated by the kernel: after copying the data from the device to the host, I process it on the CPU and then relaunch the kernel to generate more data.

This process runs in a loop.

This is the algorithm, but it doesn’t seem to work:

while (true)

  1. cudaMallocs() // asking for memory

  2. cudaDeviceSynchronize() // making sure everything is synchronized

  3. cudaEventRecord(start) // for timing purposes

  4. call_async_Kernel(b, t, 0, 0)

  5. cudaEventRecord(stop)

  6. cudaMemcpyAsync(cudaMemcpyDeviceToHost)

  7. while (cudaEventQuery(stop) == cudaErrorNotReady)
       if (data_Available) // process the copied data on the CPU while waiting

  8. cudaFree()
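For reference, the steps above might look like this in actual code (a minimal sketch; the buffer names, the size variable, and the kernel signature are hypothetical):

```cuda
// Hypothetical sketch of the loop described above
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

while (true) {
    int *d_data, *h_data;
    cudaMalloc(&d_data, size);        // 1. device memory
    cudaMallocHost(&h_data, size);    //    pinned host memory
    cudaDeviceSynchronize();          // 2. make sure previous work is done
    cudaEventRecord(start);           // 3. timing
    MyKernel<<<b, t>>>(d_data);       // 4. asynchronous kernel launch
    cudaEventRecord(stop);            // 5.
    cudaMemcpyAsync(h_data, d_data, size, cudaMemcpyDeviceToHost); // 6.
    while (cudaEventQuery(stop) == cudaErrorNotReady) {            // 7. poll
        /* work on already-available data on the CPU */
    }
    cudaFree(d_data);                 // 8.
    cudaFreeHost(h_data);
}
```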

Maybe this is not the proper way to do it, any ideas?

Thanks in advance!

You may be interested in using CUDA streams:

Create a stream, push all your work asynchronously into this stream, and, right after the cudaMemcpyAsync that brings the data back to the host, call cudaStreamSynchronize:

/* Create a stream: equivalent to a work queue */
cudaStream_t stream;
cudaStreamCreate(&stream);

/* Perform memory allocation outside of the loop if possible, because these operations are expensive in terms of time */

cudaMallocHost(&hostPtr, size);   // ask for pinned host memory, for faster memcopies
cudaMalloc(&inputDevPtr, size);   // device memory (cudaMalloc; there is no cudaMallocDevice)
cudaMalloc(&outputDevPtr, size);

while (true) {
    cudaMemcpyAsync(inputDevPtr, hostPtr, size, cudaMemcpyHostToDevice, stream);
    MyKernel<<<64, 64, 0, stream>>>(b, t, 0, 0);
    cudaMemcpyAsync(hostPtr, outputDevPtr, size, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);   // block until the results are back on the host
    /* ...work on hostPtr on the CPU, then loop again... */
}



This way, you will be able to synchronize operations properly, even if you decide to run the same kind of loop in another CPU thread using another CUDA stream.
But if you want to perform the (n+1)th GPU computation in parallel with the nth CPU computation, you will need more than one stream, and preferably more than one CPU thread.
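For example, a two-stream double-buffering pattern lets the copy and kernel for iteration n+1 overlap with CPU work on iteration n's results (a sketch; the buffer names, size, and kernel signature are hypothetical):

```cuda
// Hypothetical double-buffering sketch with two streams
cudaStream_t streams[2];
float *h_buf[2], *d_buf[2];
for (int i = 0; i < 2; ++i) {
    cudaStreamCreate(&streams[i]);
    cudaMallocHost(&h_buf[i], size);   // pinned memory: required for truly async copies
    cudaMalloc(&d_buf[i], size);
}

for (int n = 0; ; ++n) {
    int cur = n % 2;
    // launch iteration n's GPU work on its own stream
    MyKernel<<<64, 64, 0, streams[cur]>>>(d_buf[cur]);
    cudaMemcpyAsync(h_buf[cur], d_buf[cur], size,
                    cudaMemcpyDeviceToHost, streams[cur]);
    if (n > 0) {
        int prev = (n - 1) % 2;
        cudaStreamSynchronize(streams[prev]);  // wait only for iteration n-1
        /* process h_buf[prev] on the CPU while streams[cur] keeps running */
    }
}
```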