Visual Studio default CUDA 6.5 project: memory allocations fail in Debug but NOT Release!

I built my CUDA program using the standard CUDA project configuration in Visual Studio 2013.

I can only run my program when I compile in release, not in debug.

All memory allocations fail when I run the program in debug mode.

Do you have code that reproduces this behavior that you can provide?

Off the top of my head, I can think of two possible scenarios: (1) Device code can run much slower for a debug build compared to a release build. This causes time-out in some kernel followed by watchdog reset, resulting in failure of all subsequent CUDA API calls. (2) More memory is required for the debug build, for example the code size may increase compared to the release build, causing the app’s memory allocation to fail.

These are just two working hypotheses. Without repro code and auxiliary information such as the compiler switches used for the build, the CUDA version, the driver version, and the GPU used, it is anybody's guess as to what may be happening.
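One way to narrow it down without sharing the project is a tiny standalone repro that checks every CUDA API call and prints the free/total device memory before allocating. This is a minimal sketch, not the asker's code; it assumes nothing beyond the CUDA runtime API. A sticky error reported before the first cudaMalloc points toward hypothesis (1) (a watchdog reset poisoning the context), while a low free-memory figure points toward hypothesis (2).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report any CUDA runtime error with file/line context.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
        }                                                         \
    } while (0)

int main() {
    // How much device memory is actually available in this build config?
    size_t freeB = 0, totalB = 0;
    CHECK(cudaMemGetInfo(&freeB, &totalB));
    printf("free: %zu MiB / total: %zu MiB\n",
           (size_t)(freeB >> 20), (size_t)(totalB >> 20));

    // A small test allocation; if even this fails, the context
    // is likely already in an error state.
    void* p = nullptr;
    CHECK(cudaMalloc(&p, 1 << 20));  // 1 MiB
    CHECK(cudaFree(p));
    return 0;
}
```

Building this same file once in Debug and once in Release, with the same generate-code settings as the real project, would show whether the failure reproduces outside the closed-source code.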

I cannot provide my code: it is a closed-source project.
My code runs with no errors and no memory leaks, and CUDA debugging shows no errors.
Below is the only output of Nsight CUDA debugging.

I can't allocate memory of any size when I run in C++ debug mode: cudaMalloc fails.

Enabling the "Generate GPU Debug Information" option (nvcc's -G flag) is what creates the problem. I am using the latest toolkit and driver, generating code for compute_52 and sm_52. I am using a GTX 970.
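Since cudaMalloc is failing outright, the actual error code it returns is the key piece of information missing from the thread. A sketch of how to capture it (hypothetical sizes, not the asker's code): check for a pending sticky error first, because cudaMalloc can report an error left over from an earlier failure rather than failing itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Force context creation, then see whether an error is already pending
    // before any allocation is attempted.
    cudaFree(0);
    cudaError_t pending = cudaGetLastError();
    printf("pending error after context init: %s\n",
           cudaGetErrorString(pending));

    // Now attempt a trivially small allocation and print the exact result.
    void* p = nullptr;
    cudaError_t err = cudaMalloc(&p, 4096);
    printf("cudaMalloc(4096): %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(p);
    return 0;
}
```

If this prints an error such as cudaErrorMemoryAllocation versus cudaErrorLaunchTimeout, that distinguishes a genuine out-of-memory condition in the -G build from a context already invalidated by a watchdog reset.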

CUDA context created : 00299fd0
CUDA module loaded:   067150e0 C:/Users/*******/*******/*********/Kernels.cu