I'm new to CUDA. We are writing code with a kernel and want to know what happens after the kernel is launched:
1) If the CPU and GPU run concurrently, where does control return: to the point where the kernel was launched, or to wherever the CPU is currently executing?
2) How do I know that my kernel has finished executing? If I copy the results back before it completes, they won't be right.
Kernel calls are asynchronous. They will return immediately after the kernel is started.
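The CPU and GPU can be synchronized explicitly with cudaThreadSynchronize(), which blocks the CPU until all previously launched GPU work has finished.
Regular (blocking) memory copies such as cudaMemcpy() synchronize implicitly: they wait until the GPU has finished all preceding calculations before copying, so results copied back this way are complete.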
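
To make the launch-then-synchronize pattern concrete, here is a minimal sketch. The kernel doubleElements and the helper runDemo are hypothetical names made up for illustration. It uses cudaDeviceSynchronize(), the name under which current CUDA releases provide this call (cudaThreadSynchronize() is the older, now deprecated spelling), so adjust for your toolkit version.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void doubleElements(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: returns true if the copied-back result is correct.
bool runDemo()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    if (host == NULL) return false;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = NULL;
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) {
        free(host);
        return false;
    }
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // The launch is asynchronous: control returns to the next line
    // right away, without waiting for the kernel to finish.
    doubleElements<<<(n + 255) / 256, 256>>>(dev, n);
    printf("CPU keeps running while the kernel executes\n");

    // Explicit barrier: block the CPU until all GPU work is done.
    // Not strictly needed before a blocking cudaMemcpy, but it is a
    // convenient place to check for errors.
    cudaError_t err = cudaDeviceSynchronize();

    // A blocking copy waits for preceding GPU work, so the data is complete.
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess) && (host[0] == 2.0f) && (host[n - 1] == 2.0f);

    cudaFree(dev);
    free(host);
    return ok;
}

int main()
{
    printf("result %s\n", runDemo() ? "looks correct" : "is wrong or a CUDA call failed");
    return 0;
}
```

So for question 1, the CPU simply continues with the statement after the launch, and for question 2, either the explicit synchronize call or a regular cudaMemcpy() is enough to guarantee the kernel has finished.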
CUDA 1.1 introduced a stream API with some additional features: it allows asynchronous memory copies to and from the GPU to overlap with kernel execution (but you need a compute capability 1.1 card to do that).
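
A rough sketch of how the stream API might be used is below. The kernel addOne and the helper runStreamDemo are hypothetical names, and splitting the data into two halves is just an assumption for the example. Asynchronous copies need page-locked host memory, hence cudaMallocHost(). Whether the copies really overlap with the kernels depends on your hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: returns true if both halves come back correct.
bool runStreamDemo()
{
    const int n = 1 << 20;
    const int half = n / 2;
    const size_t halfBytes = half * sizeof(float);

    // Page-locked (pinned) host memory, required for asynchronous copies.
    float *host = NULL;
    if (cudaMallocHost((void **)&host, 2 * halfBytes) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) host[i] = 0.0f;

    float *dev = NULL;
    if (cudaMalloc((void **)&dev, 2 * halfBytes) != cudaSuccess) {
        cudaFreeHost(host);
        return false;
    }

    cudaStream_t streams[2];
    for (int s = 0; s < 2; ++s) cudaStreamCreate(&streams[s]);

    // Each stream handles one half: copy in, run the kernel, copy out.
    // Work within one stream runs in order; work in different streams
    // may overlap if the hardware supports it.
    for (int s = 0; s < 2; ++s) {
        int offset = s * half;
        cudaMemcpyAsync(dev + offset, host + offset, halfBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        addOne<<<(half + 255) / 256, 256, 0, streams[s]>>>(dev + offset, half);
        cudaMemcpyAsync(host + offset, dev + offset, halfBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // All calls above returned immediately; wait for each stream to
    // finish before touching the results on the CPU.
    bool ok = true;
    for (int s = 0; s < 2; ++s) {
        if (cudaStreamSynchronize(streams[s]) != cudaSuccess) ok = false;
    }
    ok = ok && (host[0] == 1.0f) && (host[n - 1] == 1.0f);

    for (int s = 0; s < 2; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(dev);
    cudaFreeHost(host);
    return ok;
}

int main()
{
    printf("stream demo %s\n", runStreamDemo() ? "succeeded" : "failed");
    return 0;
}
```

Note that with asynchronous copies nothing waits for you implicitly, so the cudaStreamSynchronize() calls are what make it safe to read the results.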