Ptxas fatal: Memory allocation failure


I am trying to compile my CUDA file and it gives me the following error: "Ptxas fatal: Memory allocation failure". I am building a 64-bit application (debug mode) under Visual Studio 2017. Since I’m using an NVIDIA Quadro P2000, I set compute_61, sm_61 as the target.

I tried release mode, which works fine!

Do you know why this fails only when compiling in debug mode?


I have never encountered this. The message seems to indicate pretty clearly that PTXAS (the optimizing compiler that translates the PTX intermediate representation into machine code) requested a dynamic memory allocation which failed. Release builds use full optimization, while debug builds use no optimization whatsoever. The input to PTXAS can therefore differ a lot between debug and release builds, as can the output. The size of the code generated for a debug build could be larger or smaller than for a release build.

Hypothesis 1: There is extraordinarily little system memory available when PTXAS runs
Hypothesis 2: PTXAS needs an extraordinarily large amount of memory to do its work

(1) Are you able to reproduce the issue reliably? How much system memory is available when PTXAS runs? What is the total amount of system memory on the machine used to compile the code?

(2) How large is the CUDA source code? How large is the PTX code being passed to PTXAS (how many lines, how many kilobytes)? How long does PTXAS run before it fails with the “failed allocation” error? How long does it run in the corresponding release mode build? When you monitor PTXAS memory usage while it runs, how much memory does it use?
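To answer the PTX size question, one option is to dump the PTX that PTXAS receives and measure it. nvcc's `-ptx` option stops after generating PTX, and `-keep` retains intermediate files; for the debug case the device-debug flag `-G` (which Visual Studio's "Generate GPU Debug Information" setting typically corresponds to) would also need to be passed so the PTX matches the failing build. The Python sketch below only counts lines and kilobytes of `.ptx` files given on the command line; the script and file names in the usage comment are hypothetical.

```python
import os
import sys


def ptx_stats(path):
    """Return (line_count, size_in_kb) for a PTX file on disk."""
    # Count lines by iterating over the file in text mode.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        line_count = sum(1 for _ in f)
    # Size on disk, in kilobytes (1 KB = 1024 bytes).
    size_kb = os.path.getsize(path) / 1024.0
    return line_count, size_kb


if __name__ == "__main__":
    # Hypothetical usage: python ptx_stats.py kernel_debug.ptx kernel_release.ptx
    for ptx_path in sys.argv[1:]:
        lines, kb = ptx_stats(ptx_path)
        print(f"{ptx_path}: {lines} lines, {kb:.1f} KB")
```

Comparing the debug and release PTX for the same source this way shows how much larger (or smaller) the input to PTXAS becomes when optimizations are disabled.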

If PTXAS runs for a very long time in the debug build (say, more than twice as long as the release build, or more than 10 minutes) before it fails, and you can observe continuously increasing PTXAS memory usage during that time, this would be a good indication of a memory leak, infinite loop, or other bug within PTXAS. In that case, you would want to file a bug report with NVIDIA.
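The rule of thumb above can be written down as a small check. This is only a sketch: the thresholds (twice the release time, or 10 minutes) come from the paragraph above, while "continuously increasing" is interpreted here, as an assumption, to mean that every memory sample is strictly higher than the one before it. The function name and its inputs are hypothetical.

```python
def looks_like_ptxas_bug(debug_minutes, release_minutes, memory_samples_gb):
    """Heuristic: a very long debug compile plus steadily growing memory use.

    debug_minutes:     how long PTXAS ran in the debug build before failing
    release_minutes:   how long the matching release build takes
    memory_samples_gb: PTXAS memory usage sampled at regular intervals, in GB
    """
    # "Very long": more than twice the release time, or more than 10 minutes.
    very_long = debug_minutes > 2 * release_minutes or debug_minutes > 10
    # Assumption: "continuously increasing" means each sample exceeds the last.
    growing = len(memory_samples_gb) >= 2 and all(
        later > earlier
        for earlier, later in zip(memory_samples_gb, memory_samples_gb[1:])
    )
    return very_long and growing
```

Under this reading, a quick jump followed by a flat plateau (for example samples of 4.4, 15.4, 15.4, 15.4 GB) returns `False` even for a 90-minute compile, because the memory use is large but not still growing.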

Hi njuffa,

I see now that it’s a memory usage issue.

  1. I have 15.9 GB of system memory. Once the compiler reaches the CUDA code, memory usage goes from 4.4 GB to 15.4 GB in a matter of 3 seconds, lol. Memory usage then stays constant until the compilation fails.

  2. I have around 35,000 lines of CUDA code (size = 2761 KB). It takes 1 to 2 hours before it fails.

That is 35 KLOC for a single kernel? And this kernel takes 1+ hours to compile, and then PTXAS blows up?

While that is large as CUDA kernels go, it should probably not cause PTXAS to chew through all your memory and then blow up with a failed allocation. Consider filing a bug with NVIDIA so the compiler folks can check whether PTXAS is using more memory than it should. The lengthy compilation time also seems indicative of a problem. I would expect the code to compile in maybe 10 to 15 minutes. How long does your release build take to compile the same code?

Yes. I have a very large kernel of around 30,000 lines. When I commented it out so that it was no longer part of the source code, the compilation went through right away. In release mode, yes, the compilation takes around 10 to 15 minutes.

On second thought, the long compilation time may simply be a side effect of the memory usage as the system starts swapping before it runs out of memory.

I assume this is some sort of generated code, since I can’t imagine a human writing a 30,000 line kernel.

lol, I spent at least a year on this kernel. Anyway, I’ll find a way to debug it.

Thanks for your help!