Hello,
Both my project’s debug build and release build work fine: both run to completion, and the release build yields the same result as the debug build.
I can run racecheck (cuda-memcheck --tool racecheck) on the project’s release build, and it finishes without errors in reasonable time.
But cuda-memcheck with its default memcheck tool gets nowhere: after what felt like an eternity (1 hour) I gave up waiting, whereas the release build normally finishes in under 1 minute.
I tried memcheck on both the release build and the debug build.
Any ideas why memcheck is seemingly ‘mis-behaving’…?
cuda-memcheck can substantially increase execution time for some kernels. It does a variety of things under the hood to validate memory accesses, and this can cause a substantial slowdown. It’s not mis-behaving; you just have to wait.
If you think there is a bug, file a bug. It would be best if you provided a short, complete, compilable program that reproduces the problem.
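To illustrate what “short, complete, compilable” might look like in practice, here is a minimal sketch of a reproducer skeleton: one kernel, fixed sizes, error checking after every CUDA runtime call, and a host-side check of the result. The macro `CUDA_CHECK`, the kernel `scale` and the function `run_repro` are hypothetical placeholders; in a real report the kernel would be replaced by the code that actually shows the problem.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line info if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Placeholder kernel standing in for the code that shows the problem.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Without this guard, the last block would write past the end of the
    // allocation whenever n is not a multiple of the block size.
    if (i < n)
        data[i] *= factor;
}

// Runs the kernel on n elements initialised to 1.0f and returns the number
// of elements that do not equal `factor` afterwards (0 means success).
int run_repro(int n, float factor)
{
    std::vector<float> h(n, 1.0f);
    float *d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    const int block = 256;
    scale<<<(n + block - 1) / block, block>>>(d, n, factor);
    CUDA_CHECK(cudaGetLastError());        // launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));

    int mismatches = 0;
    for (float v : h)
        if (v != factor) ++mismatches;
    return mismatches;
}
```

Calling `run_repro` from a small `main` and building with nvcc gives a single file that can be run both natively and under cuda-memcheck.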
As I mentioned, I waited an hour.
1 minute of normal execution time vs. 60+ minutes with memcheck: that is at least a 60x increase.
Are you telling me that memcheck can take that long…?
The factor varies depending on the underlying code. As I mentioned, if you think it’s a bug, file a bug.
I ran the code I posted in the thread “Efficient in-place transpose of multiple square float matrices” (CUDA Programming and Performance, NVIDIA Developer Forums) with cuda-memcheck, and the CUDA code ran 30-45x slower.
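To put a number on the slowdown for a particular kernel without waiting hours, one option is to time the kernel with CUDA events on a reduced problem size, then run the same binary natively and under cuda-memcheck and compare. The sketch below does that for a hypothetical naive in-place transpose of a batch of square float matrices; it is not the code from the linked thread, and `transpose_inplace_naive` and `time_transpose_ms` are illustrative names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Hypothetical workload (not the code from the linked thread): naive in-place
// transpose of `batch` contiguous n x n row-major float matrices. Each thread
// owns one element of the strict upper triangle and swaps it with its mirror,
// so no two threads touch the same element.
__global__ void transpose_inplace_naive(float *m, int n, int batch)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int b = blockIdx.z;                      // one grid z-slice per matrix
    if (b < batch && r < n && c < n && c > r) {
        float *mat = m + (size_t)b * n * n;
        float tmp = mat[(size_t)r * n + c];
        mat[(size_t)r * n + c] = mat[(size_t)c * n + r];
        mat[(size_t)c * n + r] = tmp;
    }
}

// Times one launch with CUDA events and returns the elapsed milliseconds;
// stores the number of incorrectly transposed elements in *errors.
// batch must not exceed the grid z-dimension limit (65535).
float time_transpose_ms(int n, int batch, int *errors)
{
    const size_t per = (size_t)n * n;
    const size_t count = per * batch;
    std::vector<float> h(count);
    for (size_t k = 0; k < count; ++k) h[k] = (float)k;   // value encodes position

    float *d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, count * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d, h.data(), count * sizeof(float), cudaMemcpyHostToDevice));

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y, batch);

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    CUDA_CHECK(cudaEventRecord(start));
    transpose_inplace_naive<<<grid, block>>>(d, n, batch);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Element (r, c) of matrix b must now hold the original value of (c, r).
    CUDA_CHECK(cudaMemcpy(h.data(), d, count * sizeof(float), cudaMemcpyDeviceToHost));
    int bad = 0;
    for (size_t b = 0; b < (size_t)batch; ++b)
        for (size_t r = 0; r < (size_t)n; ++r)
            for (size_t c = 0; c < (size_t)n; ++c)
                if (h[b * per + r * n + c] != (float)(b * per + c * n + r)) ++bad;
    *errors = bad;

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d));
    return ms;
}
```

With a small `main` that prints the returned time, running the binary natively and then under cuda-memcheck at the same sizes should give a rough per-kernel slowdown factor (the ratio of the two times). That factor only describes this kernel; other code can be slowed down by much more or much less.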
Is it possible to run memcheck within the debugger (cuda-gdb), and would kernel launches still be asynchronous?
Debugging is a rather slow process anyway, so memcheck should ‘blend in’, and perhaps this way I can keep a closer eye on memcheck and its progress.
Thank you txbob; I’ll have a look.
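On the question of keeping an eye on progress during a long memcheck run: independently of cuda-gdb, one option is to split the work into several smaller launches and report from the host after each one completes. The sketch below assumes the work can be partitioned into element ranges; `work_chunk` and `run_with_progress` are hypothetical names, and synchronising after every piece deliberately gives up overlap between host and device in exchange for visibility.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-piece workload: adds 1.0f to each of `len` elements.
__global__ void work_chunk(float *data, size_t len)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len)
        data[i] += 1.0f;
}

// Processes n device elements in roughly `chunks` equal pieces, waiting for
// each piece to finish and printing a progress line to stderr before
// launching the next. Returns the number of pieces processed (can be fewer
// than `chunks` when n does not divide evenly), or -1 on the first CUDA error.
int run_with_progress(float *d_data, size_t n, int chunks)
{
    if (n == 0 || chunks <= 0)
        return 0;
    const size_t chunk_len = (n + chunks - 1) / chunks;     // ceil(n / chunks)
    const size_t total = (n + chunk_len - 1) / chunk_len;   // pieces actually needed
    const int block = 256;
    int done = 0;
    for (size_t offset = 0; offset < n; offset += chunk_len) {
        const size_t len = (n - offset < chunk_len) ? n - offset : chunk_len;
        const unsigned grid = (unsigned)((len + block - 1) / block);
        work_chunk<<<grid, block>>>(d_data + offset, len);
        cudaError_t err = cudaGetLastError();                // launch errors
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();                   // wait for this piece
        if (err != cudaSuccess) {
            std::fprintf(stderr, "piece %d failed: %s\n", done, cudaGetErrorString(err));
            return -1;
        }
        ++done;
        std::fprintf(stderr, "progress: %d/%zu pieces done\n", done, total);
    }
    return done;
}
```

Under cuda-memcheck, the progress lines on stderr show how far the run has got, which makes it easier to tell a slow run from a stuck one. Because of the extra synchronisation, this is a diagnostic aid rather than something to keep in timed or production runs.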