CUDA debug run works, but running the program directly fails

I wrote a program with CUDA.
When I run the program directly, it goes wrong.
But when I run it under the CUDA debugger, everything works.
No error is reported.
I really don’t understand!
Has anyone experienced the same thing?

Without having access to the code, it is difficult to assist with a question like this. When the program “fails”, what exactly does it do?

Two possible scenarios, off the top of my head:

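(1) The code may invoke undefined C++ behavior, for example by reading uninitialized memory or accessing an array out of bounds
(2) The code may contain a race condition, for example threads reading shared data before other threads have finished writing it

Both kinds of bug can stay hidden in a debug run, because debug builds and the debugger change code generation and timing.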
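To illustrate scenario (2), here is a minimal sketch of the kind of shared-memory race that can look fine under the debugger and fail in a normal run. It is not taken from your program: the kernel `reverseInBlock`, the helper `reverseOnDevice`, and the size `N` are names made up for this example. The version shown is the correct one; the comment marks the barrier whose absence would create the race.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int N = 256;  // hypothetical size: a single block of N threads

// Reverses an N-element array in place, using one block and shared memory.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = data[t];
    // Without this barrier, a thread may read tile[N - 1 - t] before the
    // thread that owns that slot has written it. That is a race condition,
    // and whether it shows up depends on timing.
    __syncthreads();
    data[t] = tile[N - 1 - t];
}

// Hypothetical host wrapper: copies N ints to the device, reverses them,
// and copies them back. Returns the first error it sees.
cudaError_t reverseOnDevice(int *host)
{
    assert(host != nullptr);
    int *dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, N * sizeof(int));
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, N * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverseInBlock<<<1, N>>>(dev);
        // Blocking copy: waits for the kernel to finish before copying back.
        err = cudaMemcpy(host, dev, N * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err;
}
```

If you delete the `__syncthreads()` line, the result becomes timing dependent, which is exactly the sort of bug that can disappear when a debug build or the debugger changes how the threads are scheduled.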

What happens when you execute the code under control of cuda-memcheck? Are any errors reported?
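It is also worth confirming that “no error is reported” really means every CUDA call was checked. Below is a rough sketch of one common way to do that, assuming your code uses the runtime API. The macro `CHECK_CUDA`, the kernel `fill`, and the helper `fillOnDevice` are hypothetical names for this example, not something from your program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call's location and error string,
// then return the error code from the enclosing function.
#define CHECK_CUDA(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err_));                      \
            return err_;                                            \
        }                                                           \
    } while (0)

// Writes 'value' into the first n elements of 'out'.
__global__ void fill(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;  // bounds check: extra threads do nothing
}

// Hypothetical host wrapper. For brevity, an early return on error
// does not free 'dev'; real code would clean up.
cudaError_t fillOnDevice(int *host, int n, int value)
{
    int *dev = nullptr;
    CHECK_CUDA(cudaMalloc(&dev, n * sizeof(int)));
    fill<<<(n + 255) / 256, 256>>>(dev, n, value);
    CHECK_CUDA(cudaGetLastError());       // errors from the launch itself
    CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran
    CHECK_CUDA(cudaMemcpy(host, dev, n * sizeof(int),
                          cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(dev));
    return cudaSuccess;
}
```

With checks like these in place, you can run the normal (non-debug) build as `cuda-memcheck ./my_app` to look for out-of-bounds and misaligned accesses, and as `cuda-memcheck --tool racecheck ./my_app` to look for shared-memory races. Here `./my_app` is a placeholder for your executable.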