I wrote a piece of code to sort an array of arbitrary size. When I run it over different problem sizes, some of them (usually the large ones) are not sorted properly. I think it must be a data race, since the failure shows up on a different input every time. The problem is that when I run my code under cuda-memcheck, I don’t get any errors and every input is sorted perfectly. I ran it several times on different inputs without a single error. So I was wondering: how does cuda-memcheck run an executable differently, such that I get the correct answer?