I am trying to run a CPU call and a CUDA call simultaneously.
Here is one solution I got from this forum:
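cuda_kernel<<<dimGrid,dimBlock>>>(C,D);  // kernel launch returns control to the CPU immediately
an_extern_c(A,B);                        // CPU function runs while the kernel executes on the GPU
cudaThreadSynchronize();                 // CPU waits here until the kernel finishes (deprecated; newer code uses cudaDeviceSynchronize)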
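A minimal self-contained sketch of the same pattern, assuming a hypothetical add_kernel in place of cuda_kernel and a hypothetical cpu_sum in place of an_extern_c. Because a kernel launch is asynchronous with respect to the host, cpu_sum runs while the GPU is busy, and cudaDeviceSynchronize (the current name for cudaThreadSynchronize) makes the CPU wait for the kernel afterwards.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for cuda_kernel: adds 'value' to each element.
__global__ void add_kernel(int *d, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += value;
}

// Hypothetical CPU work standing in for an_extern_c(A, B).
long long cpu_sum(const int *h, int n)
{
    long long s = 0;
    for (int i = 0; i < n; ++i) s += h[i];
    return s;
}

// Launches the kernel, does CPU work meanwhile, then waits.
// Returns d[0] after the kernel (expected: 7), or -1 on error.
int launch_then_cpu(int n, long long *cpu_result)
{
    int *h = new int[n];
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) { delete[] h; return -1; }
    cudaMemset(d, 0, n * sizeof(int));

    add_kernel<<<(n + 255) / 256, 256>>>(d, n, 7);  // returns to the CPU immediately
    *cpu_result = cpu_sum(h, n);                     // overlaps with the kernel
    cudaDeviceSynchronize();                         // wait for the kernel to finish

    int first = -1;
    cudaMemcpy(&first, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    delete[] h;
    return first;
}

int main()
{
    long long s = 0;
    int first = launch_then_cpu(1 << 20, &s);
    printf("gpu d[0] = %d, cpu sum = %lld\n", first, s);
    return 0;
}
```

Without the synchronize call, the CPU could read results on the device before the kernel has written them.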
Then I found the asyncAPI example. Here is the relevant part of it:
// asynchronously issue work to the GPU (all to stream 0)
cutilCheckError( cutStartTimer(timer) );
cudaEventRecord(start, 0);
cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value);
cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
cudaEventRecord(stop, 0);
cutilCheckError( cutStopTimer(timer) );
// have CPU do some work while waiting for stage 1 to finish
unsigned long int counter = 0;
while (cudaEventQuery(stop) == cudaErrorNotReady)
{
    counter++;
}
cutilSafeCall( cudaEventElapsedTime(&gpu_time, start, stop) );
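The excerpt depends on cutil helpers (cutStartTimer, cutilSafeCall, ...), which shipped with older SDK samples and are not part of the CUDA runtime. Below is a sketch of the same flow using only runtime API calls. The function name async_demo, the bounds check in the kernel, and the element count are assumptions for illustration. The host buffer comes from cudaMallocHost because cudaMemcpyAsync needs page-locked host memory to be fully asynchronous.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Adds inc_value to each of the n elements of g_data.
__global__ void increment_kernel(int *g_data, int n, int inc_value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) g_data[idx] += inc_value;
}

// Returns how many times the CPU loop ran while the GPU was busy, or -1 on error.
// *ok is set to true if every element came back equal to 'value'.
long long async_demo(int n, int value, bool *ok, float *gpu_ms)
{
    size_t nbytes = n * sizeof(int);
    int *a = nullptr, *d_a = nullptr;
    *ok = false;

    // Page-locked host memory so the async copies can overlap with the CPU.
    if (cudaMallocHost((void **)&a, nbytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_a, nbytes) != cudaSuccess) { cudaFreeHost(a); return -1; }
    for (int i = 0; i < n; ++i) a[i] = 0;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // All of these are queued in stream 0 and return to the CPU immediately.
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<(n + 255) / 256, 256, 0, 0>>>(d_a, n, value);
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);

    // CPU work overlapping the GPU: keep looping until 'stop' has been reached.
    long long counter = 0;
    cudaError_t status;
    while ((status = cudaEventQuery(stop)) == cudaErrorNotReady)
        ++counter;

    if (status == cudaSuccess) {
        // Safe to read 'a' now: the device-to-host copy finished before 'stop'.
        cudaEventElapsedTime(gpu_ms, start, stop);
        *ok = true;
        for (int i = 0; i < n; ++i)
            if (a[i] != value) { *ok = false; break; }
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_a);
    cudaFreeHost(a);
    return status == cudaSuccess ? counter : -1;
}

int main()
{
    bool ok = false;
    float ms = 0.0f;
    long long iters = async_demo(16 * 1024 * 1024, 26, &ok, &ms);
    printf("GPU time %.2f ms, CPU loop iterations %lld, result %s\n",
           ms, iters, ok ? "correct" : "wrong");
    return 0;
}
```

A large iteration count shows that the CPU kept running while the copies and the kernel were in flight.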
To me, the key parts are ‘cudaMemcpyAsync’ and the condition of the while loop.
Do I need ‘cudaMemcpyAsync’ for my purpose, or can I use ‘cudaMemcpy’?
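A sketch suggesting that plain cudaMemcpy is enough for the first pattern, assuming hypothetical names scale_kernel, cpu_work and blocking_copy_overlap. The kernel launch itself is asynchronous, so CPU work placed between the launch and the copy-back still overlaps with the kernel. The blocking cudaMemcpy then waits for the kernel (same default stream) before copying, so no separate synchronize call is needed. cudaMemcpyAsync, with page-locked host memory, only matters if the transfers themselves should also overlap with CPU work.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each element by 'factor'.
__global__ void scale_kernel(float *d, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= factor;
}

// Hypothetical stand-in for independent CPU work (partial harmonic sum).
double cpu_work(int iters)
{
    double acc = 0.0;
    for (int i = 1; i <= iters; ++i) acc += 1.0 / i;
    return acc;
}

// Returns true if every element came back as 2 * 1.5 = 3.0.
bool blocking_copy_overlap(int n, double *cpu_result)
{
    std::vector<float> h(n, 1.5f);
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;

    // Blocking: returns only after the data is on the device.
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Asynchronous with respect to the host: control returns immediately.
    scale_kernel<<<(n + 255) / 256, 256>>>(d, n, 2.0f);

    // This CPU work overlaps with the kernel.
    *cpu_result = cpu_work(1000000);

    // Blocking: waits for the kernel to finish, then copies the result back.
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != 3.0f) return false;
    return true;
}

int main()
{
    double s = 0.0;
    bool ok = blocking_copy_overlap(1 << 20, &s);
    printf("result %s, cpu_work = %f\n", ok ? "correct" : "wrong", s);
    return 0;
}
```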
What is the role of the while loop's condition?
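A sketch of what that condition does, assuming hypothetical names spin_kernel, cpu_chunk, poll_while_working and block_until_done. cudaEventQuery never blocks: it returns cudaErrorNotReady while work recorded before the event is still running, cudaSuccess once it has all finished, and any other value signals an error. Looping on it is therefore a non-blocking wait that leaves the CPU free between polls. cudaEventSynchronize is the blocking alternative, where the CPU simply waits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical long-running kernel: spins for roughly 'cycles' GPU clock ticks.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical unit of CPU work done between polls.
static double cpu_chunk(double x)
{
    for (int i = 0; i < 1000; ++i) x = x * 0.999 + 1.0;
    return x;
}

// Non-blocking wait: does CPU chunks until the event completes.
// Returns the number of chunks done, or -1 on error.
long long poll_while_working(long long cycles)
{
    cudaEvent_t done;
    if (cudaEventCreate(&done) != cudaSuccess) return -1;

    spin_kernel<<<1, 1>>>(cycles);
    cudaEventRecord(done, 0);

    long long chunks = 0;
    volatile double sink = 0.0;  // volatile so the stand-in work is not optimized away
    cudaError_t status;
    // cudaErrorNotReady -> GPU still busy, do more CPU work
    // cudaSuccess       -> everything recorded before 'done' has finished
    // anything else     -> an error occurred
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
        sink = cpu_chunk(sink);
        ++chunks;
    }
    cudaEventDestroy(done);
    return status == cudaSuccess ? chunks : -1;
}

// Blocking wait: the CPU does nothing until the event completes.
bool block_until_done(long long cycles)
{
    cudaEvent_t done;
    if (cudaEventCreate(&done) != cudaSuccess) return false;
    spin_kernel<<<1, 1>>>(cycles);
    cudaEventRecord(done, 0);
    cudaError_t status = cudaEventSynchronize(done);
    cudaEventDestroy(done);
    return status == cudaSuccess;
}

int main()
{
    printf("CPU chunks done while polling: %lld\n", poll_while_working(100000000LL));
    printf("blocking wait %s\n", block_until_done(100000000LL) ? "succeeded" : "failed");
    return 0;
}
```

In the asyncAPI excerpt, counter++ plays the role of cpu_chunk: it just counts how much CPU time was available while the GPU was busy.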
Could you explain a little more about how to run CPU and CUDA calls simultaneously?
Thanks in advance; any comments are welcome.