cuda-memcheck (with cc 2.x) and synchronous execution


I am new to CUDA. I have a problem with my code: I suspect it is related to the asynchronous execution of two new kernels I added.

Reading the CUDA Design Guide (version of July 2013):
“Kernel launches are synchronous in the following cases:
(a) The application is run via a debugger or memory checker (cuda-gdb, cuda-memcheck, Nsight) on a device of compute capability 1.x;
(b) Hardware counters are collected via a profiler (Nsight, Visual Profiler).”
Is (a) true only for devices of compute capability 1.x?
I am working with a device of compute capability 2.x. If I debug with cuda-memcheck, are the kernels executed synchronously or asynchronously?
Many thanks for any help!

Hi Jim,

The information you have posted above is correct. When debugging with cuda-gdb or cuda-memcheck on devices with compute capability 2.x and above, kernel launches will be asynchronous by default (you can also debug concurrent kernels on these devices).
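To check this empirically, one rough probe (a sketch under stated assumptions, not an NVIDIA tool) is to launch a kernel that spins for a fixed number of clock cycles and then query the stream immediately with cudaStreamQuery. A result of cudaErrorNotReady means the launch call returned while the kernel was still running, i.e. the launch was asynchronous. The names spinKernel and launchesLookAsynchronous are hypothetical, and the result is only a heuristic: a spin that is too short could finish before the query runs.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for about `cycles` SM clock cycles
// (clock64() is available on compute capability 2.x and later).
__global__ void spinKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

// Hypothetical helper. Returns 1 if the kernel is still running right after
// the launch returns (asynchronous launch), 0 if it has already finished
// (launch most likely blocked), and -1 on a CUDA error.
int launchesLookAsynchronous(long long cycles)
{
    assert(cycles > 0);

    spinKernel<<<1, 1>>>(cycles);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    // Ask the default stream, immediately, whether all its work is done.
    cudaError_t q = cudaStreamQuery(0);
    int result = -1;
    if (q == cudaErrorNotReady) {
        result = 1;  // still running: the launch returned early
    } else if (q == cudaSuccess) {
        result = 0;  // already finished: the launch probably blocked
    } else {
        fprintf(stderr, "query failed: %s\n", cudaGetErrorString(q));
    }

    // Wait for the kernel so no work is left pending on return.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "sync failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return result;
}
```

For example, a host program could print launchesLookAsynchronous(100000000LL), which spins on the order of 0.1 s at a ~1 GHz clock, and be run once under plain cuda-memcheck and once with --force-blocking-launches yes. Based on the answer in this thread, the first run would be expected to report 1 and the second 0 on a compute capability 2.x device. Because the spin is bounded in clock cycles rather than loop iterations, memcheck's instrumentation should not make it run much longer.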

However, if you enable cuda-memcheck in integrated mode within cuda-gdb (by executing the “set cuda memcheck on” command), this will currently force launches to be synchronous. We are looking at changing this in a future release.

If you wish to force blocking launches in standalone cuda-memcheck, you can specify the “--force-blocking-launches yes” option.

If you wish to force blocking launches in cuda-gdb, you can execute the “set cuda launch_blocking on” command.
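As a code-level complement to these tool options (a sketch, not a feature of either tool), the application itself can check for errors after every launch and, in debug builds only, wait for each kernel to finish. A fault then shows up next to the kernel that caused it, which helps when two new kernels are suspected, as in the original question. The names checkLaunch, DEBUG_BLOCKING_LAUNCHES, kernelA, kernelB and runBoth are hypothetical and chosen for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper, called right after a kernel launch. It always reports
// launch-configuration errors; when DEBUG_BLOCKING_LAUNCHES is defined at
// compile time it also waits for the kernel, so an execution fault is
// reported next to the kernel that caused it instead of at a later call.
static cudaError_t checkLaunch(const char *name)
{
    cudaError_t err = cudaGetLastError();
#ifdef DEBUG_BLOCKING_LAUNCHES
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
#endif
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", name, cudaGetErrorString(err));
    }
    return err;
}

// Placeholder kernels standing in for the two new kernels.
__global__ void kernelA(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

__global__ void kernelB(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Runs kernelA then kernelB in place on n floats; returns 0 on success.
int runBoth(float *hostData, int n)
{
    assert(hostData != nullptr && n > 0);
    size_t bytes = (size_t)n * sizeof(float);
    float *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;

    int rc = -1;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (cudaMemcpy(d, hostData, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        kernelA<<<blocks, threads>>>(d, n);
        if (checkLaunch("kernelA") == cudaSuccess) {
            kernelB<<<blocks, threads>>>(d, n);
            // Without DEBUG_BLOCKING_LAUNCHES, a fault in either kernel
            // typically surfaces here, at the first blocking call.
            if (checkLaunch("kernelB") == cudaSuccess &&
                cudaMemcpy(hostData, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
                rc = 0;
            }
        }
    }
    cudaFree(d);
    return rc;
}
```

Compiling with nvcc -DDEBUG_BLOCKING_LAUNCHES turns on the per-kernel synchronization; without it, launches stay asynchronous and only launch-configuration errors are caught immediately. The extra cudaDeviceSynchronize calls serialize the host with the GPU, so this pattern is meant for debugging, not for production runs.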

Hi geoffg,

Many thanks for your hints, they are very helpful!