vector data with and without cuda-memcheck


I am running a program and I print a temporary vector. When I run the example without cuda-memcheck I get wrong values printed, but when I use cuda-memcheck I get them right. Any ideas why this might happen?

Thank you!

Are you printing from host or device code?

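Are you sure the kernel calls or memcpy operations have finished before you start printing your data?
(Some of the CUDA APIs are asynchronous: kernel launches and the cudaMemcpyAsync family can return control to the host before the work has completed. This is something to watch out for.)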
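To make the asynchrony point concrete, here is a minimal sketch of waiting for the GPU before printing on the host. Since your code isn't posted, everything in it is made up for illustration: the kernel `fill_tmp`, the helper `print_tmp_after_sync`, and the use of a separate stream with pinned host memory are all assumptions, not a guess at what your program does.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes 2*i into tmp[i].
__global__ void fill_tmp(int *tmp, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = 2 * i;
}

// Hypothetical helper: fills a temporary vector on the GPU, copies it back,
// and prints it only after the stream has been synchronized.
// Returns the number of wrong elements, or -1 if a CUDA call failed.
int print_tmp_after_sync(int n)
{
    int *d_tmp = nullptr;
    int *h_tmp = nullptr;
    cudaStream_t stream;
    size_t bytes = n * sizeof(int);

    if (cudaStreamCreate(&stream) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_tmp, bytes) != cudaSuccess) return -1;
    if (cudaMallocHost((void **)&h_tmp, bytes) != cudaSuccess) return -1; // pinned host memory

    int threads = 128;
    int blocks = (n + threads - 1) / threads;

    // Both of these calls can return before the GPU has done the work.
    fill_tmp<<<blocks, threads, 0, stream>>>(d_tmp, n);
    cudaMemcpyAsync(h_tmp, d_tmp, bytes, cudaMemcpyDeviceToHost, stream);

    // Reading h_tmp before this point could show stale or partial data.
    cudaError_t err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = cudaGetLastError();

    int wrong = -1;
    if (err == cudaSuccess) {
        wrong = 0;
        for (int i = 0; i < n; ++i) {
            printf("tmp[%d] = %d\n", i, h_tmp[i]);
            if (h_tmp[i] != 2 * i) ++wrong;
        }
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFreeHost(h_tmp);
    cudaFree(d_tmp);
    cudaStreamDestroy(stream);
    return wrong;
}
```

You could call `print_tmp_after_sync(8)` from `main` and build with nvcc. If removing the `cudaStreamSynchronize` call in a test like this changes what gets printed, the same kind of missing synchronization may be what you are seeing in your program.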

Maybe you can post some repro code? The more complete it is (ideally I can just compile and run the code as is), the more likely it is you will find someone willing to look into it.

cuda-memcheck slows execution down a lot, so the relative timing of operations is quite different from a normal run. Running under cuda-memcheck may therefore be hiding a race condition that affects the result when the code runs normally.
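As an illustration of the kind of race that timing can hide, here is a small sketch of a kernel that passes data between threads through shared memory. The kernel `reverse_block`, the helper `run_reverse_block`, and the size `BLOCK` are hypothetical names chosen for this example; whether your code has a problem like this is only a guess without seeing it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256

// Hypothetical kernel: reverses one block of data through shared memory.
__global__ void reverse_block(const int *in, int *out)
{
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;

    tile[t] = in[t];

    // Without this barrier, a thread may read tile[BLOCK - 1 - t] before the
    // thread that owns that slot has written it. Whether that happens depends
    // on timing, so the bug can come and go between runs.
    __syncthreads();

    out[t] = tile[BLOCK - 1 - t];
}

// Hypothetical helper: runs the kernel once and returns the number of
// wrong elements, or -1 if a CUDA call failed.
int run_reverse_block()
{
    int h_in[BLOCK], h_out[BLOCK];
    for (int i = 0; i < BLOCK; ++i) h_in[i] = i;

    int *d_in = nullptr;
    int *d_out = nullptr;
    size_t bytes = BLOCK * sizeof(int);
    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) return -1;

    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    reverse_block<<<1, BLOCK>>>(d_in, d_out);

    // This blocking copy is issued to the same default stream as the kernel,
    // so it runs after the kernel has finished.
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    int wrong = -1;
    if (err == cudaSuccess) {
        wrong = 0;
        for (int i = 0; i < BLOCK; ++i)
            if (h_out[i] != BLOCK - 1 - i) ++wrong;
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return wrong;
}
```

For shared memory hazards like the one the barrier prevents here, running with `cuda-memcheck --tool racecheck` is worth a try, since the default memcheck tool looks for memory access errors rather than races.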