A question about cuda-memcheck

I have a piece of CUDA code that launches a lot of kernels. When I run it on its own, some of the kernels don’t launch. But when I run the same code under cuda-memcheck, all of the kernels launch, and it reports no CUDA errors. By the way, the total run time is much longer under cuda-memcheck; I don’t know whether that has anything to do with the correctness of the code.
Does anyone know why? I want to understand why cuda-memcheck could make the code run correctly. Thanks!

cuda-memcheck affects the order of execution of warps and blocks. Under the general CUDA programming model, your code is not supposed to depend on any specific order of execution of these.

This behavior also tends to make your code take longer to run.

If you are certain that a kernel is not running, you can use proper CUDA error checking to get an idea of why that may be. If you’re not sure what that is, google “proper CUDA error checking” and take the first hit and start reading.

I personally doubt that this is the case, however, so you’re likely left with an ordinary debug scenario here. Take something that your code is not doing correctly (e.g. an incorrect result) and start working backwards using standard debugging techniques.
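
For reference, the error checking mentioned above usually looks something like the sketch below. The macro name `CUDA_CHECK`, the `scale` kernel, and the helper `run_checked_example` are placeholders I made up for illustration, not anything from your code; the runtime calls themselves (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`) are standard.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print a message and abort if a runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Placeholder kernel standing in for one of yours.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical driver: returns 0 if every call and the kernel succeeded.
int run_checked_example()
{
    const int n = 1 << 20;
    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());       // launch failures, e.g. a bad configuration
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel was running

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}

int main()
{
    return run_checked_example();
}
```

With both checks after every launch, a kernel that really fails to launch or crashes partway through should produce a message pointing at the offending line rather than failing silently.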

cuda-memcheck cannot and does not find all errors in a program.

In particular, it can only catch a subset of race conditions. If race conditions are present, any tiny difference in execution timing can cause observable effects from them to either appear or disappear. As you observed and txbob explains, use of cuda-memcheck does have a noticeable impact on execution timing.
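
To make the timing sensitivity concrete, here is a minimal sketch of a race on global memory, next to a race-free version. The kernel names and the `run` helper are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Racy: every thread does an unsynchronized read-modify-write on *counter.
__global__ void racy_increment(int *counter)
{
    *counter = *counter + 1;
}

// Race-free: atomicAdd makes each update indivisible.
__global__ void atomic_increment(int *counter)
{
    atomicAdd(counter, 1);
}

// Hypothetical helper: launch one of the kernels and return the final count.
int run(bool use_atomic, int blocks, int threads)
{
    int *d_counter = nullptr;
    int h_counter = 0;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);

    if (use_atomic)
        atomic_increment<<<blocks, threads>>>(d_counter);
    else
        racy_increment<<<blocks, threads>>>(d_counter);

    // This blocking copy waits for the kernel on the default stream to finish.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main()
{
    const int blocks = 256, threads = 256;
    printf("expected: %d\n", blocks * threads);
    printf("racy:     %d\n", run(false, blocks, threads));  // depends on scheduling
    printf("atomic:   %d\n", run(true, blocks, threads));
    return 0;
}
```

The racy result depends on how the hardware happens to interleave the threads, so it can change from run to run or when a tool alters the timing. As far as I know, the racecheck tool (`cuda-memcheck --tool racecheck`) only reports hazards on shared memory, so a global-memory race like this one would not be flagged.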

A similar effect can occur with the use of uninitialized data in a program (including the host portion!): depending on the specifics of the runtime environment (e.g. with or without cuda-memcheck), such uninitialized data may take different values, causing errors to either manifest themselves or not.
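
A small sketch of the uninitialized-data case on the device side, assuming a kernel that accumulates into its output buffer. The names `accumulate` and `first_result` are invented for the example, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Accumulates into out[i]; only correct if out[] was zeroed beforehand.
__global__ void accumulate(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] += in[i];
}

// Hypothetical helper: add a buffer of ones into out[] and return out[0].
float first_result(bool zero_first)
{
    const int n = 1024;
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));  // cudaMalloc does not clear the memory
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    if (zero_first) cudaMemset(d_out, 0, n * sizeof(float));

    accumulate<<<(n + 255) / 256, 256>>>(d_out, d_in, n);

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main()
{
    printf("uninitialized: %f\n", first_result(false));  // depends on what was in memory
    printf("zeroed first:  %f\n", first_result(true));
    return 0;
}
```

Without the `cudaMemset`, the result depends on whatever the allocation happened to contain, which can differ between a plain run and a run under a tool. Running with `cuda-memcheck --tool initcheck` should report uninitialized reads of global memory like this one; uninitialized host variables need a host-side tool instead.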

Thank you!
