Hi,
As far as I know, multiple CUDA kernels launched from host code will execute in parallel.
How do I make them execute sequentially, i.e. so that kernel2 starts only after kernel1 has completed?
Thanks
Section 3.2.5.1 of the CUDA 4.1 Programming Guide says:
Programmers can globally disable asynchronous kernel launches for all CUDA applications running on a system by setting the CUDA_LAUNCH_BLOCKING environment variable to 1. This feature is provided for debugging purposes only and should never be used as a way to make production software run reliably.
See: http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
Kernels launched into the same stream are automatically executed sequentially: kernel2 will not start until kernel1 has finished. This also holds for the default stream, which is what you are using if you haven't specified one.
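A minimal sketch of same-stream ordering follows. The kernel bodies and the `run_sequential` helper are hypothetical illustrations, and the code assumes a CUDA-capable device. Because kernel2 reads what kernel1 wrote, every element comes out as `2 * i` only if kernel2 starts after kernel1 has finished. Kernel launches return to the host immediately, so the host still has to synchronize before it reads the results.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel1: writes each element's index into the buffer.
__global__ void kernel1(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical kernel2: depends on kernel1's output, doubling each element.
__global__ void kernel2(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Hypothetical helper: runs kernel1 then kernel2 in one stream and returns
// the number of elements that are not 2 * i, or -1 on a CUDA error.
int run_sequential(int n) {
    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) != cudaSuccess) return -1;

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Both launches go into the same stream, so kernel2 does not begin
    // until kernel1 has completed. Dropping the stream argument (using the
    // default stream for both) would give the same ordering.
    kernel1<<<blocks, threads, 0, stream>>>(d_data, n);
    kernel2<<<blocks, threads, 0, stream>>>(d_data, n);
    cudaError_t err = cudaGetLastError();

    // Launches are asynchronous with respect to the host: wait for the
    // stream to drain before copying the results back.
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    std::vector<int> h_data(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data.data(), d_data, n * sizeof(int),
                         cudaMemcpyDeviceToHost);

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_data[i] != 2 * i) ++mismatches;
    return mismatches;
}

int main() {
    int bad = run_sequential(1 << 20);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Using an explicitly created stream here is only to make the ordering visible. Two plain `<<<blocks, threads>>>` launches with no stream argument would also run one after the other.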