question on asyncAPI.cu

I am trying to run a CPU call and a CUDA call simultaneously.
Here is one solution I got from this forum.

cuda_kernel<<<dimGrid,dimBlock>>>(C,D);
an_extern_c(A,B);
cudaThreadSynchronize();

Then I found this asyncAPI example. Here is a copy of it.

// asynchronously issue work to the GPU (all to stream 0)
cutilCheckError( cutStartTimer(timer) );
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value);
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);
cutilCheckError( cutStopTimer(timer) );

// have CPU do some work while waiting for stage 1 to finish
unsigned long int counter=0;
while( cudaEventQuery(stop) == cudaErrorNotReady )
{
    counter++;
}
cutilSafeCall( cudaEventElapsedTime(&gpu_time, start, stop) );

To me the key parts are ‘cudaMemcpyAsync’ and the condition of the while loop.

Do I need ‘cudaMemcpyAsync’ for my purpose? Can I use ‘cudaMemcpy’ instead?

What is the role of the condition in the while loop?

Could you explain a little more about how to run CPU and CUDA calls simultaneously?

Thanks in advance and any comments are welcome…

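  1. Yes, you can use cudaMemcpy instead of cudaMemcpyAsync. However, cudaMemcpy blocks the CPU until the copy has finished, whereas cudaMemcpyAsync returns right away and frees your CPU to do other work during the copy (for this to be truly asynchronous the host buffer should be page-locked, e.g. allocated with cudaMallocHost). It’s up to you.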

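  2. The while-loop condition checks whether the GPU has finished all the work issued before the stop event (both copies and the kernel), not just the kernel launch. cudaEventQuery(stop) returns cudaErrorNotReady as long as that work is still in progress, so the CPU keeps executing the loop body in the meantime, since your GPU and CPU are now running asynchronously.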
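To put both points together, here is a rough sketch of how the overlap might look with real CPU work in place of the counter loop. It is only an illustration, not the SDK sample itself: cpu_work is a made-up stand-in for your an_extern_c, this increment_kernel takes an extra length argument that I added, and the sizes and the value 26 are arbitrary. It assumes a CUDA-capable device and uses pinned host memory from cudaMallocHost.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, modeled on increment_kernel from the sample
// (the bounds argument n is an addition for this sketch).
__global__ void increment_kernel(int *d_a, int value, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) d_a[idx] += value;
}

// Hypothetical stand-in for the CPU-side function (an_extern_c in the question).
static unsigned long cpu_work(unsigned long iterations)
{
    unsigned long sum = 0;
    for (unsigned long i = 0; i < iterations; ++i) sum += i % 7;
    return sum;
}

// Returns the number of wrong elements (0 on success), or -1 on a CUDA error.
int overlap_demo()
{
    const int n = 1 << 20;
    const size_t nbytes = n * sizeof(int);
    const int value = 26;
    int *a = NULL, *d_a = NULL;
    cudaEvent_t stop;

    // Page-locked host memory, so the async copies do not have to block the CPU.
    if (cudaMallocHost((void **)&a, nbytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_a, nbytes) != cudaSuccess) {
        cudaFreeHost(a);
        return -1;
    }
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaFree(d_a);
        cudaFreeHost(a);
        return -1;
    }
    for (int i = 0; i < n; ++i) a[i] = 0;

    // Queue copy-in, kernel and copy-out on stream 0, then mark the end
    // of that work with an event. These calls return without waiting.
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<(n + 255) / 256, 256, 0, 0>>>(d_a, value, n);
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);

    // The CPU does its own work while the GPU processes the queue.
    unsigned long cpu_result = cpu_work(1000000UL);

    // Block until everything recorded before `stop` has finished.
    // Polling with cudaEventQuery(stop), as in the sample, also works.
    cudaError_t err = cudaEventSynchronize(stop);

    int wrong = 0;
    if (err != cudaSuccess) {
        wrong = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (a[i] != value) ++wrong;
        printf("CPU result %lu, wrong GPU elements %d\n", cpu_result, wrong);
    }

    cudaEventDestroy(stop);
    cudaFree(d_a);
    cudaFreeHost(a);
    return wrong;
}
```

Calling overlap_demo() from main and checking that it returns 0 is enough to try it out. If you swap the last cudaMemcpyAsync for a plain cudaMemcpy, the result should be the same, but the CPU will sit in that call until the kernel and the copy are done, so cpu_work would no longer overlap with the GPU.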

Hope it helps. =)

Billy