CUDA V7.0 Release Mode Compile error: nvcc error : 'ptxas' died with status 0xC0000005 (ACCESS_VIOLATION)

I am getting a CUDA compile error in release mode only, not in debug mode, with the latest CUDA V7.0 compiler:
CUDACOMPILE : nvcc error : 'ptxas' died with status 0xC0000005 (ACCESS_VIOLATION) on CUDA V7.0

I have chased the issue down to a single line of code in a device function: "sum += fx * px;" (all values are floats). If I change the "+=" to simply "=", the code will compile and run in release mode. The code also compiles, runs, and operates properly in debug mode.

The device function is fairly complex and is part of a fairly complex kernel for computing a neural network I/O. I know "+=" works just fine in other portions of the code, so the issue is not simply that I am using a "+=". I have also tried rewriting the line as "sum = sum + fx * px;".
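Distilled down, the pattern in question is just a float accumulator updated with "+=" inside a device-side loop. A hypothetical minimal sketch of that pattern (all names invented here, not the actual code) would be:

```cuda
// Hypothetical reduction of the failing pattern, for illustration only.
__device__ float innerProdSketch(const float *fx, const float *px, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += fx[i] * px[i];  // analogous to the "sum += fx * px;" line
    return sum;
}
```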

I have another device function in the same code base that causes the same CUDA compile error in release mode only. That code, too, compiles and operates properly in debug mode, so I assume the issue is related to a code-optimization pass in release mode. Below is the entire device function, which calls other functions. As I have stated, the only change needed to get the code to compile in release mode is changing the "+=" to simply "=" on the line "sum += fx * px;".

I would appreciate any help or hints as to what I might be doing wrong.

float NeuralNetwork_Cuda::waveletInnerProd(int nnOutpLvl, NeuralNetwork_Cuda &ras, SmVecCuda_f &xc, int rasOutpLevel)
{
	float sum = 0.0f;
	SmVecCuda_f xnRas = ras.x_bar(xc);
	SmVecCuda_i Icenter = xn_to_index(xnRas); // Center index

	SmVecCuda_f xr(nnStr->N);
	float dxRas = 1.0f;
	float dxNN = 1.0f;
	for (int i = 0; i < nnStr->N; ++i)
	{
		xr.Vec[i] = nnStr->del_xVec[i] * nnStr->Nn_comp;
		dxNN *= nnStr->del_xVec[i];
		dxRas *= ras.nnStr->del_xVec[i];
	}
	float dxSf = dxRas / dxNN;

	SmVecCuda_f tmp = xc + xr;
	xnRas = ras.x_bar(tmp);
	SmVecCuda_i Imax = xn_to_index(xnRas);
	tmp = xc - xr;
	xnRas = ras.x_bar(tmp);
	SmVecCuda_i Imin = xn_to_index(xnRas);
	SmVecCuda_i Idx = Imin;
	do
	{
		for (int i = 0; i < nnStr->N; ++i)
			xnRas.Vec[i] = (float)(Idx.Vec[i] - Icenter.Vec[i]) * ras.nnStr->del_xVec[i] * nnStr->x_normVec[i];
		float fx = nnNodeOutpFn(xnRas, nnOutpLvl);
		float px = ras.getDiscreteOutputAtIndex(Idx, rasOutpLevel);
		sum += fx * px;  // ToDo: "sum += ..." is causing the error (nvcc error : 'ptxas' died with...
	} while (ras.inc_index(Idx, Imin, Imax));
	sum *= dxSf;
	return sum;
}


I would suggest filing a bug report using the form linked from the CUDA registered developer website. No matter what source code is handed to the compiler, abnormal termination of a compiler component (here: PTXAS) with a segmentation fault / access violation should not occur; it is an internal compiler error.

Assuming, as a working hypothesis, that the problem is related to optimization, you could try reducing the optimization level of PTXAS, which defaults to -O3. As a workaround, I would suggest trying -Xptxas -O2, and if that doesn't fix the issue, reducing further to -Xptxas -O1.
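On the command line, the workaround might look like this (the architecture flag and file names are placeholders for this example, not taken from the thread):

```shell
# Device-code optimization defaults to -Xptxas -O3; step it down until ptxas survives.
nvcc -arch=sm_35 -Xptxas -O2 -o app app.cu

# If the crash persists at -O2, reduce further:
nvcc -arch=sm_35 -Xptxas -O1 -o app app.cu
```

In a Visual Studio build, the same -Xptxas flag can be added to the additional command-line options of the project's CUDA C/C++ settings.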


I am having this same problem on the latest toolkit (v8.0). Unfortunately, I can't release my code, and I'm too new to CUDA to attempt to reproduce the problem with simpler code at this time, so this is a bit of a long shot.

Basically, I'm porting C code to a multi-file .cu build in Visual Studio 2015. Some of the files cause the same compiler crash. As an example (this is a debug build, optimizations off):

Severity	Code	Description	Project	File	Line	Suppression State
Error	MSB3722	The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_50,code=\"sm_50,compute_50\" --use-local-env --cl-version 2015 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -rdc=true -I./ -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\/include" -I../../common/inc -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"  -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart shared -Xcompiler "/wd 4819" -g   -DWIN32 -DWIN32 -D_MBCS -D_MBCS -DORDER=8 -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MTd " -o x64/Debug/ "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\test\matrixMul\"" exited with code 5. Please verify that you have sufficient rights to run this command.	matrixMul	C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations\CUDA 8.0.targets	689

Note that I started with the working MatrixMul sample. Is it possible this is nothing more than a compiler option issue?

From the output provided, there is no evidence that this is an instance of "the same compiler crash". I do not see any message about an access violation inside PTXAS. Instead, I see this:

exited with code 5. Please verify that you have sufficient rights to run this command.

This suggests an issue with file permissions, not an internal compiler error.

The error I always get before the one posted above follows. The output panel in VS mentions an "internal error" even before that. I tried putting my project in the same CUDA directory tree as the sample applications, at the same level. I also tried running with administrator rights. Furthermore, at least one of my three files compiles without trouble, and there are no other dependencies beyond the .h and .cuh files in the project that I'm aware of.

Severity	Code	Description	Project	File	Line	Suppression State
Error		'ptxas' died with status 0xC0000005 (ACCESS_VIOLATION)	matrixMul	C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\test\matrixMul\CUDACOMPILE	1

Thanks for taking the time. I didn’t expect a response given the sparsity of details I’m providing.

As I pointed out in #2 above, internal compiler errors are not something users can do anything about; they should be reported to NVIDIA. The bug-reporting form is linked from the CUDA registered developer website.

Based on my experience, there is a non-zero chance that an internal compiler error is caused by a corrupted CUDA installation, in particular when a newer version of CUDA is incorrectly installed on a machine that already has a previous version of CUDA on it.

For anyone else who happens across this post: I solved my problem by removing alignas from my code. I removed it from all of my files, even though the compiler was only crashing on some of them; I was using it incorrectly in most places anyway.

In gcc, I was telling the compiler I had aligned memory using __builtin_assume_aligned. Sometimes this is necessary when the compiler can't see how you allocated memory, and SIMD instruction sets such as AVX have both unaligned and aligned load/store variants. I don't know whether this is ever an issue for PTX. As for variables on the stack, for now I'll assume the compiler aligns them optimally without my input.

I don’t know what “alignas” is. If the CUDA compiler cannot handle certain language constructs, it should emit an error message stating so, not terminate abnormally with a segmentation fault / access violation. I would suggest filing a bug report with NVIDIA, attaching a minimal reproducer.