Is it possible for my program to run a kernel with CUDA memcheck enabled, then disable memcheck and run the kernel again, all within the same process? The cuda-memcheck manual says this can be done via the debugger, but I would like to do it non-interactively.
The idea is to run a new version of the kernel with error detection turned on.
Then, if all is well, run it again without memcheck. The second time it should run
much faster and I can gather realistic performance stats. I guess if this works,
I could also use CUDA racecheck but memcheck is the immediate concern.
Given that the kernel only takes a few milliseconds, the overhead of starting the whole program a second time is considerable. If this works, I would also like to run many kernels sequentially from within the same (host) program.
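To illustrate, this is the two-launch workflow I am trying to avoid (a sketch only; `my_app` is a hypothetical binary name, and this assumes the standalone cuda-memcheck tool from the CUDA toolkit):

```shell
# First launch: full program startup, kernel runs slowly under memcheck.
cuda-memcheck ./my_app

# Second launch: full program startup again, just to time the kernel
# without memcheck overhead.
./my_app
```

Since the program startup dominates the few-millisecond kernel time, paying it twice per kernel (and per kernel variant) adds up quickly.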
Any comments, ideas, suggestions are welcome.