I have never encountered this. The message seems to indicate pretty clearly that PTXAS (the optimizing compiler that translates the PTX intermediate representation into machine code) requested a dynamic memory allocation that failed. Release builds use full optimization, while debug builds use no optimization whatsoever. The input to PTXAS can therefore differ substantially between debug and release builds, as can the output; the code generated for a debug build could be larger or smaller than that for a release build.
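One quick way to see how different the PTXAS input actually is between the two build types is to emit the PTX for each and compare sizes. A minimal sketch, assuming a single-file kernel kernel.cu and nvcc on the PATH (the file name and architecture are placeholders; -G is nvcc's device-debug switch, while plain nvcc corresponds to the release defaults):

    import subprocess

    SRC = "kernel.cu"   # placeholder: your CUDA source file
    ARCH = "sm_75"      # placeholder: your target GPU architecture

    # Emit PTX only (-ptx): once with nvcc's device-debug switch (-G),
    # once with the release defaults (full device optimization).
    for tag, flags in (("debug", ["-G"]), ("release", [])):
        out = f"kernel_{tag}.ptx"
        subprocess.run(["nvcc", "-ptx", f"-arch={ARCH}", *flags, SRC, "-o", out],
                       check=True)
        with open(out) as f:
            text = f.read()
        print(f"{tag}: {len(text)} bytes, {len(text.splitlines())} lines")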
Hypothesis 1: There is extraordinarily little system memory available when PTXAS runs
Hypothesis 2: PTXAS needs an extraordinarily large amount of memory to do its work
(1) Are you able to reproduce the issue reliably? How much system memory is available when PTXAS runs? What is the total amount of system memory on the machine used to compile the code?
(2) How large is the CUDA source code? How large is the PTX code being passed to PTXAS (how many lines, how many kilobytes)? How long does PTXAS run before it fails with the “failed allocation” error? How long does it run in the corresponding release build? When you monitor PTXAS memory usage while it runs, how much memory does it use? (A small measurement sketch covering both items follows below.)
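To collect those numbers, you can run ptxas by itself on the PTX file retained from the failing build (nvcc's --keep option keeps the intermediate .ptx, and --dryrun prints the exact ptxas command line nvcc would use) and sample its memory footprint as it runs. A rough sketch, assuming the third-party psutil package is installed; the file names and architecture are placeholders, and -O0 merely approximates the no-optimization setting of a debug build:

    import subprocess, time
    import psutil  # third-party: pip install psutil

    PTX = "kernel_debug.ptx"  # placeholder: PTX retained from the failing build
    ARCH = "sm_75"            # placeholder: your target GPU architecture

    # (1) Total and currently available system memory.
    vm = psutil.virtual_memory()
    print(f"system memory: {vm.total >> 20} MB total, "
          f"{vm.available >> 20} MB available")

    # (2) Run ptxas standalone and sample its resident set size until it exits.
    proc = subprocess.Popen(["ptxas", f"-arch={ARCH}", "-O0", PTX,
                             "-o", "kernel.cubin"])
    ps = psutil.Process(proc.pid)
    start = time.time()
    peak = 0
    while proc.poll() is None:
        try:
            rss = ps.memory_info().rss
        except psutil.NoSuchProcess:  # ptxas exited between poll() and here
            break
        peak = max(peak, rss)
        print(f"t={time.time() - start:8.1f}s  rss={rss >> 20} MB")
        time.sleep(5)
    proc.wait()
    print(f"ptxas exited with status {proc.returncode} after "
          f"{time.time() - start:.1f}s, peak rss {peak >> 20} MB")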
If PTXAS runs for a very long time in the debug build (say more than twice as long as for the release build, or more than 10 minutes) before it fails, and you can observe continuously increasing memory usage of PTXAS during that time, this would be a good indication of a memory leak, infinite loop, or other bug within PTXAS, in which case you would want to file a bug report with NVIDIA.
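Should the sketch above show exactly that pattern, attaching the retained .ptx file, the exact ptxas command line, the output of ptxas --version, and the memory-over-time log would give NVIDIA a self-contained reproducer to work with.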