Debug vs. Release

I’m using PGI VS 11.4 with CUDA Fortran.
My code runs in Debug mode and produces reasonable results, but in Release mode it crashes and gives garbage NaN values.
Why is there such a big difference?

BTW, in my experience PGI VS is not very stable: it hangs frequently, and each time I have to force it to close and then reopen it.

The options used to compile and link your application are, by default, different between the Debug and Release configurations. First, check that any properties that you changed for the Debug config have also been changed for the Release config. Second, the default optimizations of the Release config may be causing your program to execute differently. This may be due to programmatic assumptions or possibly a compiler optimization problem.
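To illustrate how an optimized build can legitimately compute different numbers from the same source: floating-point addition is not associative, so if the compiler is permitted to reorder operations (as relaxed or fast-math modes typically allow), results can change. The sketch below is in Python purely for illustration and reorders the operations by hand; the values are made up and it is not taken from the poster's code.

```python
# Illustration only: the same three terms combined in two different orders.
big, small = 1.0e17, 1.0

# Source order: small is lost when added to big, then big cancels out.
as_written = (big + small) - big

# Reordered by hand, mimicking what a reassociating optimizer might do.
reordered = (big - big) + small

print(as_written, reordered)  # prints "0.0 1.0"
```

A difference like this is usually small in practice, so NaNs or crashes that appear only in Release more often point to something else, such as uninitialized data or out-of-bounds accesses.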

With respect to your experiences with PVF hanging frequently, can you describe what you are doing when this behavior occurs? What version of VS are you using? What release of PVF?

In Debug mode, I can run the program directly from the PVF GUI, and a console window appears showing the run output. In Release mode, if I run from PVF, a console window appears but hangs, and I have to close it and reopen PVF. So I have to run the program from the command line instead.

As for Release mode not giving correct results: I'm working on a large code using CUDA Fortran. In Debug mode it works. In Release mode, if I don’t use the fastmath flag, it gives NaN values; if I enable the fastmath flag, it runs but crashes shortly afterwards.

I’ve been struggling with this problem for more than a week.

Any thoughts or ideas about these problems? Many thanks in advance!

Thanks for the reply, ams!

I tried ams’s suggestion and compared the Debug and Release configurations. By default, Release mode optimizes for speed, but Debug mode does not. If I turn this optimization off, Release mode behaves the same as Debug mode, producing correct results but running slowly.

Now the problem becomes: why can’t my program be optimized for speed?

My kernel is fairly heavy. I’m wondering whether register pressure caused this problem, or some floating-point error, or something else.
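Whatever the cause, one way to narrow it down is to copy intermediate arrays back to the host after each kernel and find where the first non-finite value appears. Below is a minimal sketch of that check, written in Python for illustration only; `first_nonfinite` and the sample data are hypothetical. In the actual Fortran code, an equivalent test could be written with `ieee_is_nan` from the `ieee_arithmetic` module.

```python
import math

def first_nonfinite(values):
    """Return the index of the first NaN or infinity, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None

# Made-up stand-in for an array copied back from the device.
sample = [0.5, 1.25, float("nan"), 3.0]
print(first_nonfinite(sample))  # prints 2
```

Running the check after each kernel launch in the optimized build shows which kernel first produces bad values, which helps separate a numerical issue from a memory or indexing bug.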

Has anyone else had the same problem?