permanent CUDAFE crashes due to 0x0 memory reference

Hi dear experts,
anytime I try to compile a cuda (*.cu) file, I get exactly the same error with cudafe.
Here is the complete toolchain generated with "nvcc -v … "

nvcc -gencode=arch=compute_20,code="sm_20,compute_20" -v -foreign -D_WIN32 -I"J:\src\lib\Microsoft SDKs\Windows\v6.0\include" -I"J:\src\lib\GPU Computing SDK\shared\inc" -I"J:/src/lib/GPU Computing SDK/C/common/inc" -I"C:\Programme\Microsoft Visual Studio 10.0\VC\include" -I"J:\src\opencv\sources" -I"J:\src\opencv\sources\include" -I"J:\src\opencv\sources\modules\core\include" -I"J:\src\opencv\sources\modules\gpu\include" -I"J:\src\lib\NPP\common\npp\include" -I"J:\src\lib\NPP\common\FreeImage\include" -I"J:\src\opencv\sources\modules\gpu\src\nvidia\core" -I"J:\src\opencv\sources\modules\gpu\src\nvidia\NPP_staging" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 -gencode=arch=compute_20,code="sm_20,compute_20" -c boxFilter_kernel.cu

# _SPACE_= # MODE=DEVICE
# _CUDART_=cudart # HERE=J:\src\lib\cuda30\bin # _THERE_=J:\src\lib\cuda30\bin # TARGET_SIZE=
# TOP=J:\src\lib\cuda30\bin/.. # PATH=J:\src\lib\cuda30\bin/…/extools/bin;J:\src\lib\cuda30\bin/…/open64/bin ;J:\src\lib\cuda30\bin/…/bin;J:\src\lib\cuda30\bin/…/lib;c:\windows\system32;J :\src\mpeg2\dgindex158src\Release;D:\vs\Common\MSDev98\Bin;d:\programme\MPlayer; D:\Programme\ffmpeg\bin;D:\Programme\ffmpeg;d:\programme\tcc;d:\programme\resour ce hacker;C:\Programme\TortoiseSVN\bin;d:\programme\wget;d:\Programme\Qemu-0.8;d :\Programme\virtualdub;c:\programme\winrar;d:\programme\xz;C:\MinGW\msys\1.0;J: \archives\Train Simulator\UTILS;D:\Programme\upx307w;D:\Programme\SCC_Tools;J:\s rc\codeblocks\MinGW\bin;C:\Programme\Git\bin;D:\Programme\dvd2avi;c:\masm32\bin; C:\Programme\python323;C:\Programme\PDFtk\bin;C:\Programme\debuggers;D:\Program me;J:\src\opencv\build\bin\Release;J:\src\lib\cuda30\bin;J:\src\lib\cuda30\open6 4\bin;C:\Programme\Microsoft Visual Studio 10.0\common7\ide;D:\Programme\cuda-wa ste;D:\Programme\mpg2cut2;C:\Programme\7-Zip;J:\src\lib\GPU Computing SDK\C\bin\ win32\EmuRelease;C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319;C:\Programme\CMak e\bin;C:\Programme\Microsoft Visual Studio 10.0\VC\bin;C:\Programme\Microsoft Vi sual Studio 10.0\common7\ide
# INCLUDES="-IJ:\src\lib\cuda30\bin/../include" "-IJ:\src\lib\cuda30\bin/../inc lude/cudart" # LIBRARIES= “/LIBPATH:J:\src\lib\cuda30\bin/…/lib” cudart.lib
# CUDAFE_FLAGS= # OPENCC_FLAGS=
# PTXAS_FLAGS= #

cl -D__CUDA_ARCH__=100 -E -TP -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS “-IJ:\src\lib\cuda30\bin/…/include” “-IJ:\src\lib\cuda30\bin/…/include/cudart” -I. -D__CUDACC__ -C -I “J:/src/lib/Microsoft SDKs/Windows/v6.0/ include” -I “J:/src/lib/GPU Computing SDK/shared/inc” -I “J:/src/lib/GPU Computing SDK/C/common/inc” -I “C:/Programme/Microsoft Visual Studio 10.0/VC/include” -D “_WIN32” -FI “cuda_runtime.h” > “C:/DOKUME~1/ich1/LOKALE~1/Temp/tmpxft_0000023 8_00000000-3_boxFilter_kernel.cpp1.ii” “boxFilter_kernel.cu”

cudafe --m32 --diag_error=host_device_limited_call --diag_error=ms_asm_decl_not_allowed -tused --no_remove_unneeded_entities --gen_c_file_name “C:/DOKUME~1/ich1/LOKALE~1/Temp/tmpxft_00000238_00000000-0_boxFilter_kernel.cudafe1.c” --stub_file_name “C:/DOKUME~1/ich1/LOKALE~1/Temp/tmpxft_00000238_00000000-0_boxFilter_kernel.cudafe1.stub.c” --stub_header_file_name “C:/DOKUME~1/ich1/LOKALE~1/Temp/tmpxft_00000238_00000000-0_boxFilter_kernel.cudafe1.stub.h” --gen_device_file_name “C:/DOKUME~1/ich1/LOKALE~1/Temp/tmpxft_00000238_00000000-0_boxFilter_kernel.cudafe1.gpu” --include_file_name “C:/DOKUME~1/ich1/LOKALE~1/Temp/tmpxft_00000238_00000000-2_boxFilter_kernel.fatbin.c” “C:/DOKUME~1/ich1/LOKALE~1/Temp/tmpxft_00000238_00000000-3_boxFilter_kernel.cpp1.ii”

nvcc error : ‘cudafe’ died with status 0xC0000005 (ACCESS_VIOLATION) # --error 0xc0000005 –


addendum:
cudafe machine instruction rva 00044c3d8 => 0x00000000 (read) v3.0
(eax is filled with uninitialized address: the NULL pointer!)

Nearly ANY hint will do, because so far this error has been unsolvable for me!
Thank you in advance…

An access violation (also referred to as a segfault on other platforms) in a CUDA compiler component usually means there is a bug (uninitialized data, out-of-bounds memory access) in the compiler itself, which should be reported to NVIDIA together with a minimal program that reproduces the issue reliably.

However, such bugs usually crop up only with specific code idioms that trigger the latent bug (and weren’t hit when the tool chain was tested by NVIDIA). Since you say this error occurs every time regardless of which .cu source file you compile, that hypothesis seems unlikely here. A more likely hypothesis is that the CUDA installation is corrupted or incomplete.

What version of CUDA are you using? Did you recently upgrade from an older version of CUDA?

Hello njuffa,

thank you very much for the fast answer.
I am trying to use CUDA 3.0 now because of its emulation capability, with all include and lib paths set to the 3.0 directories.
Before 3.0 I had indeed installed version 6.0 of the toolkit, which is still on the machine, but nothing points to it.
What puzzles me most is that nvcc produced working *.obj files from *.cu files some time ago (which linked into working executables) before the cudafe problem suddenly arose. As far as I can tell, no CUDA-relevant change happened in between (???)

If the CUDA toolchain suddenly stopped working without you having installed a new version or making changes to the configuration, there was possibly some other event that led to a corruption of the compiler executables. There may also be a conflict between the CUDA 3.0 and CUDA 6.0 files somehow. Impossible to diagnose remotely what that could be.
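One way to check for such a conflict is to list every copy of nvcc and cudafe that is reachable through PATH. The following is only a rough sketch in Python; the function name `find_all_on_path` and the extension list are my own choices for illustration, not part of any CUDA tool.

```python
import os

def find_all_on_path(name, path_string, extensions=("", ".exe")):
    """Return every file called `name` (with one of `extensions`) found via PATH."""
    hits = []
    for entry in path_string.split(os.pathsep):
        # Tolerate quoted or padded entries, which Windows PATH values sometimes have.
        directory = entry.strip().strip('"')
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                hits.append(candidate)
                break  # one hit per directory is enough
    return hits

if __name__ == "__main__":
    for tool in ("nvcc", "cudafe"):
        copies = find_all_on_path(tool, os.environ.get("PATH", ""))
        print(tool, "->", copies if copies else "not found on PATH")
```

If more than one directory shows up for the same tool, the first one listed is the one that would normally be picked up, so a leftover directory from another toolkit version ahead of the intended one would be worth a closer look.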

Even assuming the problem is due to the tool chain components themselves, it would make no sense to report issues with version 3.0, which is four (or five?) years old, to NVIDIA. I would suggest switching to the latest stable version, which is CUDA 6.5. Or you may want to try the CUDA 7.0 release candidate now available to registered developers. Please note that CUDA 6.5 and CUDA 7.0 removed support for some older GPUs, so this may not be an option if you are using old hardware.

The device emulation in old CUDA versions was a workaround for the lack of a proper debugger at that time, and it was an approach that had lots of shortcomings. Nowadays CUDA includes very good debugger support so I would suggest using that.

After removal of versions 3.0 and 6.0 of the cuda toolkit, installation of the current v6.5 yielded exactly the same result as above…
Furthermore installation of cuda on a plain clean virtual machine under vmware still leads to the access violation by cudafe!

So this has to be seen as a (severe) bug, because nothing is known about the exact environment this tool might require.

Other facts:

  • the bug obviously does not show up under all circumstances (on most machines cudafe seems to function…)
  • the violation occurs only when an input file (of any kind) is given, and only after it has been parsed
  • this is not an isolated case; other cudafe crashes under more specific conditions have been described
  • on other systems, the ‘nvcc -v …’ log reveals that nvcc may run a totally different toolchain

It would help if someone knew of a bug list or bug-tracking site for cudafe where I could classify/compare “my” error and also file a bug report.

Thanx in advance!

I don’t know of any site that publishes bug info. You can file bugs at developer.nvidia.com, after registering. A report is unlikely to make much progress unless you can provide a detailed set of instructions that would allow someone else to reproduce the error. That would include a complete description of the environment as well as complete sample code and the compile command used. I don’t see that level of description in this thread. Just providing the output from nvcc -v …, for example, is not enough.

You can submit a bug report to NVIDIA regarding the issues with CUDA 6.5 by using the bug reporting form linked from the registered developer website. You would want to attach a minimal, self-contained code sample to the report that reliably triggers the error, together with precise information about the compilation environment.
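To gather the "precise information about the compilation environment" in one go, something like the following Python sketch might help. The function name `describe_environment` and the choice of fields are just my assumptions about what is useful in a report; the only thing it asks of the compiler is `--version`.

```python
import os
import platform
import shutil
import subprocess

def describe_environment(compiler="nvcc"):
    """Collect basic facts about the build machine for a bug report."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "path_entries": os.environ.get("PATH", "").split(os.pathsep),
        "compiler_location": shutil.which(compiler),
        "compiler_version": None,
    }
    if report["compiler_location"]:
        try:
            result = subprocess.run(
                [report["compiler_location"], "--version"],
                capture_output=True, text=True, timeout=30,
            )
            report["compiler_version"] = result.stdout.strip()
        except (OSError, subprocess.SubprocessError):
            pass  # leave the version as None if the compiler cannot be run
    return report

if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

The printed output, the exact compile command, and the minimal source file together would give someone else a fair chance of reproducing the crash.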

nvcc is just a driver program: it invokes the different components of the tool chain in sequence to produce object files and executables. Depending on platform, build target, and CUDA version, different components may be invoked. However, as far as I know, cudafe is always part of the sequence of executables invoked.
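To compare the sequence of tools between two machines, the verbose log can be reduced to just the names of the programs that were run. This is a small Python sketch under the assumption that the log looks like the one pasted in this thread, i.e. lines starting with "#" are environment settings and every other non-empty line is a command; the sample log is abbreviated from the post above and `tools_invoked` is a name I made up.

```python
def tools_invoked(verbose_log):
    """Return the program name of each command line in an 'nvcc -v' log."""
    tools = []
    for line in verbose_log.splitlines():
        line = line.strip()
        # Skip blank lines, environment lines, and nvcc's own error message.
        if not line or line.startswith("#") or line.startswith("nvcc error"):
            continue
        tools.append(line.split()[0])
    return tools

# Abbreviated sample in the format shown earlier in this thread.
sample_log = """\
# _SPACE_= # MODE=DEVICE
# CUDAFE_FLAGS= # OPENCC_FLAGS=

cl -D__CUDA_ARCH__=100 -E -TP boxFilter_kernel.cu
cudafe --m32 -tused boxFilter_kernel.cpp1.ii
nvcc error : 'cudafe' died with status 0xC0000005 (ACCESS_VIOLATION)
"""

print(tools_invoked(sample_log))  # ['cl', 'cudafe']
```

Running this on the logs from a working and a failing machine would show at a glance whether the same components are invoked in the same order.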

[Later:] The more I think about this, the more puzzled I get. From the information so far it seems that there could be an issue in cudafe that has gone undetected for years while tens of thousands of programmers have used the compiler, yet one user hits the problem on multiple different pieces of code. Based on that I hypothesize that the issue may have nothing to do with the code being compiled, but is triggered by an unusual artifact in the environment, such as a particular non-ASCII (Unicode) character in a file name or path. Scanning the environment data displayed above, the only unusual item I spotted is an erroneous white space in one of the parts of PATH:

C:\Programme\Microsoft Vi sual Studio 10.0\common7\ide
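A quick way to test this hypothesis would be to scan PATH for entries that look odd. The Python sketch below flags entries with leading or trailing whitespace, non-ASCII characters, or directories that do not exist; the function name `suspicious_path_entries` and the three checks are my own assumptions about what counts as "unusual".

```python
import os

def suspicious_path_entries(path_string, sep=os.pathsep, exists=os.path.isdir):
    """Return (entry, reasons) pairs for PATH entries that look unusual."""
    findings = []
    for entry in path_string.split(sep):
        if not entry:
            continue
        reasons = []
        if entry != entry.strip():
            reasons.append("leading or trailing whitespace")
        if not entry.isascii():
            reasons.append("non-ASCII character")
        if not exists(entry.strip()):
            reasons.append("directory does not exist")
        if reasons:
            findings.append((entry, reasons))
    return findings

if __name__ == "__main__":
    for entry, reasons in suspicious_path_entries(os.environ.get("PATH", "")):
        print(repr(entry), "->", ", ".join(reasons))
```

An entry with a stray space in the middle, like the one above, cannot be recognized as wrong by its spelling alone, but it would show up as a directory that does not exist, assuming the space is really present in PATH and is not just an artifact of how the output was pasted.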