cuda-memcheck versus cuda-racecheck

Hello,

Both my project’s debug build and release build work fine: both complete execution, and the release build yields the same results as the debug build

I can run racecheck (cuda-memcheck --tool racecheck) on the project’s release build, and it finishes without error in good time

But cuda-memcheck itself (the default memcheck tool) gets nowhere; after what felt like an infinite amount of time (1 hour) I gave up waiting. The release build normally finishes in under 1 minute

I tried memcheck on both the release build and the debug build

Any ideas why memcheck is seemingly ‘mis-behaving’…?

cuda-memcheck can substantially increase execution time for some kernels. It instruments memory accesses under the hood in order to validate each one, and that checking can slow things down considerably. It’s not misbehaving; you just have to wait.
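For context, the two invocations being compared look roughly like this (`./app` is a placeholder for the project’s release binary, not a name from the original posts):

```
# Default tool (memcheck): validates global/local/shared memory accesses,
# so a large slowdown relative to the plain run is expected.
cuda-memcheck ./app

# racecheck: analyzes only shared-memory hazards, and is often much faster
# than the full memcheck pass on the same binary.
cuda-memcheck --tool racecheck ./app
```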

If you think there is a bug, file a bug. It would be best if you provide a short, complete, compilable code that reproduces the problem.

Like I mentioned, I waited an hour

1 minute of normal execution time versus 60+ minutes with memcheck is at least a 60x slowdown
Are you telling me that memcheck can take that long…?

The factor varies depending on the underlying code. Like I mentioned, if you think it’s a bug, file a bug.

I ran the code I posted here with cuda-memcheck:

https://devtalk.nvidia.com/default/topic/765696/efficient-in-place-transpose-of-multiple-square-float-matrices/#4276119

and the CUDA code ran 30-45x slower.

Is it possible to run memcheck within the debugger (cuda-gdb), and would kernel launches still be asynchronous?

Debugging is a rather slow process anyway, so memcheck’s overhead should ‘blend in’; and perhaps this way I can keep a closer eye on memcheck and its progress

This may be of interest:

http://docs.nvidia.com/cuda/cuda-gdb/index.html#set-cuda-memcheck
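For the curious, that page describes cuda-gdb’s built-in memcheck integration. A minimal session might look like the sketch below (`./app` is again a placeholder binary name):

```
cuda-gdb ./app
(cuda-gdb) set cuda memcheck on    # enable memcheck-style access checking in the debugger
(cuda-gdb) run                     # invalid accesses now halt execution as CUDA exceptions
```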

Thank you txbob; I’ll have a look