Hi,
I’ve got a problem with kernel launches. All the documentation says they are asynchronous, but that doesn’t seem to be true for me:
[codebox]printf("Kernel launch: %d blocks x %d threads, %d bytes shared mem/block\n", dimGrid.x, dimBlock.x, s_mem_size);
encode_cb_kernel<<< dimGrid, dimBlock, s_mem_size >>>
    (rawcbs_d, cbs_d, n_cbs, slope_max_d, pic->xSize, mode, enable_pcrd,
     global_buf_d, global_buf_ofs_d);
printf("launched!\n");
CUDA_SAFE_CALL(cudaThreadSynchronize()); // wait...
printf("sync'ed!\n");[/codebox]
After the kernel launch, control doesn’t return to the CPU right away! There is a delay of 2 seconds, then “launched!” and, immediately after it, “sync’ed!” are printed. The same thing happens in the debugger: when I “step over” the kernel launch, the CPU program sleeps for 2 seconds.
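To put numbers on this, the launch and the sync could be timed separately on the host. Below is a minimal sketch of that measurement, not my real code: `spin_kernel`, `ms_since`, the grid size and the iteration count are all made up for illustration, and it assumes a toolkit that provides `cudaDeviceSynchronize()` (the newer name for the `cudaThreadSynchronize()` call used above) plus a compiler with C++11 `<chrono>`. If the launch really is asynchronous, the first number should stay small however long the kernel runs.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in for encode_cb_kernel: it only burns GPU time
// so that the split between launch time and sync time becomes visible.
__global__ void spin_kernel(float *out, int iters)
{
    float x = (float)threadIdx.x;
    for (int i = 0; i < iters; ++i)
        x = x * 1.000001f + 0.5f;
    // Store the result so the compiler cannot drop the loop.
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

// Helper (hypothetical): host wall-clock milliseconds elapsed since t0.
static double ms_since(std::chrono::steady_clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main()
{
    const int blocks = 64, threads = 128;   // arbitrary sizes for the test
    float *out_d = 0;
    if (cudaMalloc((void **)&out_d, blocks * threads * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Warm-up launch, so that any one-time setup cost of the first launch
    // is not counted in the measurement below.
    spin_kernel<<<blocks, threads>>>(out_d, 1);
    cudaDeviceSynchronize();

    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    spin_kernel<<<blocks, threads>>>(out_d, 1 << 22);
    double launch_ms = ms_since(t0);        // time until the launch call returns
    cudaError_t err = cudaDeviceSynchronize();
    double total_ms = ms_since(t0);         // time until the kernel has finished

    printf("launch returned after %.3f ms\n", launch_ms);
    printf("kernel finished after %.3f ms\n", total_ms);
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(out_d);
    return err == cudaSuccess ? 0 : 1;
}
```

Running the same measurement around the real `encode_cb_kernel` launch, with and without the warm-up launch, would show whether the 2 seconds are spent inside the launch call itself or only appear on the first launch.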
My system: WinXP Prof., VC++ 2005, CUDA SDK 2.1, GeForce 8600 GT
Thanks for your help, Martin