SDK 4.0 Windows debug build crash

All of a sudden, cudafe++.exe started crashing during debug builds (release builds still compile fine).

CUDACOMPILE : nvcc error : 'cudafe++' died with status 0xC0000005 (ACCESS_VIOLATION)
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 4.0.targets(352,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\nvcc.exe" -gencode=arch=compute_13,code="sm_13,compute_13" --use-local-env --cl-version 2010 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include" --opencc-options -LIST:source=on -G0 --keep --keep-dir "C:\Users\john\Documents\Visual Studio 2008\Projects\resdp1\Debug" -maxrregcount=0 --machine 32 --compile -D_NEXUS_DEBUG -g -Xcompiler "/EHsc /nologo /Od /Zi /RTC1 /MDd " -o "Debug\main.cu.obj" "C:\Users\john\Documents\Visual Studio 2008\Projects\resdp1\main.cu"" exited with code -1073741819.
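For anyone wondering why the two messages show different numbers: the MSBuild exit code -1073741819 and the nvcc status 0xC0000005 are the same 32-bit value, once printed as a signed integer and once as unsigned hex. A minimal sketch of the conversion (the helper names exit_code_to_ntstatus and to_hex are made up for illustration, they are not part of any toolkit):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: reinterpret the signed exit code that MSBuild prints
// as the unsigned 32-bit status value that nvcc prints.
constexpr std::uint32_t exit_code_to_ntstatus(std::int32_t exit_code) {
    return static_cast<std::uint32_t>(exit_code);
}

// Hypothetical helper: format a 32-bit value as 0xXXXXXXXX.
std::string to_hex(std::uint32_t value) {
    char buf[11];  // "0x" + 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "0x%08X", static_cast<unsigned>(value));
    return std::string(buf);
}

// Compile-time check: the exit code from the MSB3721 line matches the
// ACCESS_VIOLATION status from the nvcc line.
static_assert(exit_code_to_ntstatus(-1073741819) == 0xC0000005u,
              "exit code and status should be the same 32-bit value");
```

Calling to_hex(exit_code_to_ntstatus(-1073741819)) gives the string 0xC0000005, so the MSBuild line adds no new information beyond the access violation already reported by nvcc.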

I did track this problem down. All I needed to crash the compiler was to add a global:

texture< float, cudaTextureType1D, cudaReadModeElementType > d_t32QPenalty;

The crash seems to be tied to building against the debug-version MT runtime library: cudafe++.exe does NOT crash if I switch the debug build to the release-version MT runtime library, which is what I’m now using for debug builds.

I think my build setup is correct (I’m using $(CudaToolkitIncludeDir) and $(CudaToolkitLibDir)), but does anyone have any ideas? A clean rebuild never helped, nor did removing the precompiled header (PCH).

Perhaps related: I can’t get “texture<” to stop being redlined, and on mouse-hover it says “texture not a template”. Help! I’m slowly running out of IntelliSense fu!

Interestingly, I also see a problem when I declare a uint8 texture:

texture< unsigned char, cudaTextureType1D, cudaReadModeElementType > d_t8QPenalty;

On the host side, I link my PTX as a string resource, load it with FindResource fu, and call:

CF_CHECK_CALL( rcCUDA, cuModuleGetTexRef( &hDeviceTextureUnit, hModule, szName ) );
CF_CHECK_CALL( rcCUDA, cuTexRefGetFormat( &aChannelFormat, &iChannels, hDeviceTextureUnit ) );

But here I notice that aChannelFormat is CU_AD_FORMAT_FLOAT. Huh? I can see that the host-side ctor was called and initialized the struct OK, but on the device my kernel returns bogus values. If I suitably doctor up a CUDA_ARRAY_DESCRIPTOR with CU_AD_FORMAT_UNSIGNED_INT8 and call:

CF_CHECK_CALL( rcCUDA, cuArrayCreate( &pDeviceMem, &ad ) );

then my kernel returns the proper uint8 texture values.

So, my textures work for float but not for uint8. It looks like my host-side static ctors are working but stuff isn’t getting sent over to the device.

Any ideas for what to try or change?

I’ve resolved the problem for now by configuring the array format on the host side:

CUDA_ARRAY_DESCRIPTOR ad = { u32ASize, 1, (CUarray_format)CU_AD_FORMAT_UNSIGNED_INT8, 1 };

CF_CHECK_CALL( rcCUDA, cuArrayCreate( &pDeviceArrayMem, &ad ) );

CF_CHECK_CALL( rcCUDA, cuTexRefSetArray( hDeviceTextureUnit, pDeviceArrayMem, CU_TRSA_OVERRIDE_FORMAT ) );

I’m quite convinced that some magic static initialization happens to texture references that is not completely apparent in, or represented by, the PTX (I’m loading the PTX as a text resource for execution). Perhaps one day I’ll figure out this little problem and provide an update.