question on asyncAPI.cu

syoon · February 11, 2011, 5:59pm

I am trying to run a CPU function and a CUDA kernel simultaneously.
Here is one solution I got from this forum:

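cuda_kernel<<<dimGrid,dimBlock>>>(C,D);  // launch returns immediately
an_extern_c(A,B);                         // CPU work runs while the kernel executes
cudaThreadSynchronize();                  // wait for the GPU (cudaDeviceSynchronize() in CUDA 4.0+)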
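A self-contained version of the pattern above might look like the sketch below. It assumes CUDA 4.0 or later (for cudaDeviceSynchronize, which replaced cudaThreadSynchronize) and a CUDA-capable device; scale_kernel, cpu_work and run_demo are hypothetical stand-ins for cuda_kernel and an_extern_c.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for cuda_kernel: doubles every element.
__global__ void scale_kernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Hypothetical stand-in for an_extern_c: plain CPU work (sum of 1..n).
static long long cpu_work(int n)
{
    long long s = 0;
    for (int i = 1; i <= n; ++i) s += i;
    return s;
}

int run_demo(void)
{
    const int n = 1 << 20;
    float *d = NULL;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));

    // The launch only queues the kernel; control returns to the CPU at once.
    scale_kernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaError_t err = cudaGetLastError();   // catches launch-configuration errors

    // Runs on the CPU while the kernel may still be executing on the GPU.
    long long s = cpu_work(1000);

    // Wait for all previously issued GPU work; also reports kernel failures.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaFree(d);
    return (err == cudaSuccess && s == 500500LL) ? 0 : 1;
}

int main(void)
{
    int rc = run_demo();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Because the launch returns immediately, the time spent in cpu_work overlaps with the kernel; cudaDeviceSynchronize is the first point where the CPU actually waits for the GPU.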

Then I found the asyncAPI example. Here is a copy of the relevant part:

// asynchronously issue work to the GPU (all to stream 0)
cutilCheckError( cutStartTimer(timer) );
    cudaEventRecord(start, 0);                                   // mark the start of the GPU work
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);  // queue copy-in; returns at once
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value);     // queue the kernel
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);  // queue copy-out
    cudaEventRecord(stop, 0);                                    // mark the end of the GPU work
cutilCheckError( cutStopTimer(timer) );   // measures only the CPU time spent issuing the calls

// have CPU do some work while waiting for stage 1 to finish
unsigned long int counter = 0;
while( cudaEventQuery(stop) == cudaErrorNotReady )   // GPU has not reached "stop" yet
{
    counter++;
}
cutilSafeCall( cudaEventElapsedTime(&gpu_time, start, stop) );
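For experimenting outside the SDK, the same sequence can be written with plain runtime calls. The sketch below approximates the sample rather than copying it: the cutil timer and error macros are dropped, a bounds check is added to the kernel, and the array size, the increment value and run_demo are assumptions. Note that a is allocated with cudaMallocHost; page-locked host memory is what lets both cudaMemcpyAsync calls be fully asynchronous with respect to the CPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same idea as the sample's increment_kernel, with a bounds check added.
__global__ void increment_kernel(int *g_data, int inc_value, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) g_data[idx] += inc_value;
}

int run_demo(void)
{
    const int n = 1 << 20;
    const size_t nbytes = n * sizeof(int);
    const int value = 26;                 // arbitrary increment
    const int threads = 256, blocks = (n + threads - 1) / threads;

    // Page-locked host buffer so the async copies do not block the CPU.
    int *a = NULL, *d_a = NULL;
    if (cudaMallocHost((void **)&a, nbytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_a, nbytes) != cudaSuccess) { cudaFreeHost(a); return 1; }
    for (int i = 0; i < n; ++i) a[i] = 0;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Queue everything on stream 0; none of these calls waits for the GPU.
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value, n);
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);

    // Spin on the CPU until the GPU reaches the stop event.
    unsigned long counter = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady)
        ++counter;                        // placeholder for real CPU work

    float gpu_ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&gpu_ms, start, stop);
    printf("GPU time %.3f ms, CPU loop ran %lu times\n", gpu_ms, counter);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        if (a[i] != value) ok = false;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_a);
    cudaFreeHost(a);
    return ok ? 0 : 1;
}

int main(void)
{
    int rc = run_demo();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```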

To me the key parts are cudaMemcpyAsync and the condition of the while loop.

Do I need cudaMemcpyAsync for my purpose, or can I use cudaMemcpy?

What is the role of the while loop's condition?

Could you explain a little more about how to run CPU code and CUDA calls simultaneously?

Thanks in advance; any comments are welcome.

yatshun · February 12, 2011, 7:48am

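Yes, you can use cudaMemcpy instead of cudaMemcpyAsync. Kernel launches return immediately anyway, so in your first snippet an_extern_c already runs while the kernel executes. The difference is that cudaMemcpy blocks the CPU until the copy has finished, while cudaMemcpyAsync returns at once and frees your CPU to do other stuff during the copy (for this to be fully asynchronous, the host buffer should be page-locked, e.g. allocated with cudaMallocHost). It's up to you.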
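A minimal sketch of the blocking alternative, assuming hypothetical scale_kernel, cpu_work and run_demo helpers: with plain cudaMemcpy the CPU work simply has to sit between the kernel launch and the copy back, because the device-to-host cudaMemcpy on the default stream does not start until the kernel has finished and does not return until the data has arrived.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical GPU task: doubles every element.
__global__ void scale_kernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Hypothetical CPU task: sum of 1..n.
static long long cpu_work(int n)
{
    long long s = 0;
    for (int i = 1; i <= n; ++i) s += i;
    return s;
}

int run_demo(void)
{
    const int n = 1 << 20;
    const size_t nbytes = n * sizeof(float);
    float *h = (float *)malloc(nbytes);   // ordinary pageable memory is fine here
    float *d = NULL;
    if (h == NULL) return 1;
    if (cudaMalloc((void **)&d, nbytes) != cudaSuccess) { free(h); return 1; }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    long long s = 0;
    // Returns only after the data is on the device.
    cudaError_t err = cudaMemcpy(d, h, nbytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Returns immediately; the kernel runs in the background.
        scale_kernel<<<(n + 255) / 256, 256>>>(d, n);

        // CPU work placed here overlaps with the kernel.
        s = cpu_work(1000);

        // Waits for the kernel (same default stream), then copies the result
        // back, so no separate synchronize call is needed before reading h.
        err = cudaMemcpy(h, d, nbytes, cudaMemcpyDeviceToHost);
    }

    bool ok = (err == cudaSuccess && s == 500500LL);
    for (int i = 0; ok && i < n; ++i)
        if (h[i] != 2.0f) ok = false;

    cudaFree(d);
    free(h);
    return ok ? 0 : 1;
}

int main(void)
{
    int rc = run_demo();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```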
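The while loop's condition checks whether the GPU has finished its work yet, since your GPU and CPU are now running asynchronously. cudaEventQuery(stop) returns cudaErrorNotReady until the GPU reaches the stop event, i.e. until both copies and the kernel issued before it have completed; until then the CPU keeps executing the loop body.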
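The counter loop in the sample only measures how long the CPU was free. A sketch of doing real work instead, assuming a hypothetical fill_kernel and a CPU task that can be split into chunks (here, summing 0..999999): the CPU polls with cudaEventQuery between chunks, finishes any leftover work, and only then blocks in cudaEventSynchronize if the GPU is still busy. It also treats any result other than cudaSuccess or cudaErrorNotReady as a real error, which in the sample's loop would just end the loop early.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical GPU task: d[i] = i.
__global__ void fill_kernel(int *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = i;
}

int run_demo(void)
{
    const int n = 1 << 20;
    int *d = NULL;
    if (cudaMalloc((void **)&d, n * sizeof(int)) != cudaSuccess) return 1;

    cudaEvent_t done;
    cudaEventCreate(&done);
    fill_kernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(done, 0);               // marks the end of the GPU work

    // Hypothetical CPU task: sum 0..total-1 in chunks, polling between chunks.
    const int total = 1000000, chunk = 10000;
    long long sum = 0;
    int next = 0;
    cudaError_t status = cudaEventQuery(done);
    while (status == cudaErrorNotReady && next < total) {
        for (int i = next; i < next + chunk; ++i) sum += i;
        next += chunk;
        status = cudaEventQuery(done);      // cudaSuccess once the kernel is done
    }
    for (int i = next; i < total; ++i)      // finish any CPU work left over
        sum += i;

    // If the CPU finished first, wait here for the GPU.
    if (status == cudaErrorNotReady)
        status = cudaEventSynchronize(done);

    int last = -1;
    if (status == cudaSuccess)
        status = cudaMemcpy(&last, d + (n - 1), sizeof(int), cudaMemcpyDeviceToHost);

    cudaEventDestroy(done);
    cudaFree(d);
    if (status != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
        return 1;
    }
    return (sum == 499999500000LL && last == n - 1) ? 0 : 1;
}

int main(void)
{
    int rc = run_demo();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Polling between chunks keeps the CPU busy with useful work, while the final cudaEventSynchronize covers the case where the CPU task finishes before the GPU does.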

Hope it helps. =)

Billy


Topic | Category | Replies | Views | Activity
asyncAPI sample question | CUDA Programming and Performance | 9 | 5041 | December 18, 2007
Asynchronous execution of kernels | CUDA Programming and Performance | 1 | 3017 | July 10, 2008
Do the non-async calls sleep or burn CPU? | CUDA Programming and Performance | 20 | 22055 | January 13, 2008
some cuda question | CUDA Programming and Performance | 6 | 980 | December 23, 2015
CPU blocked MUCH longer than expected calling a cudaMemcpy after a cuda graph launch | CUDA Programming and Performance | 7 | 574 | October 19, 2023
cudaMemcpy() Best approach when you need to call it many times? | CUDA Programming and Performance | 8 | 25117 | March 8, 2010
Asynchronous performance between CPU and GPU | CUDA Programming and Performance | 3 | 2385 | June 18, 2012
Memory copy/set async to kernel execution in different stream | CUDA Programming and Performance | 5 | 1096 | December 15, 2022
during the copy, can cpu and gpu work? | CUDA Programming and Performance | 6 | 5214 | June 11, 2008
processing time check | CUDA Programming and Performance | 5 | 551 | November 16, 2010
