Strange cudart.dll null pointer crash

I am compiling a managed C++ DLL (Multithreaded Debug and Release DLL runtimes, C++/CLI with IJW, x86 Debug) with VS2005 Service Pack 1.

My DLL does NOT call a single CUDA routine. I am only linking it against cudart.lib, yet the debugger's Output window shows that it loads cudart.dll and cuda.dll, as well as fatZip.dll and others:

Loaded 'F:\CUDA\bin\cudart.dll', Binary was not built with debug information.

Loaded 'F:\CUDA\bin\cuda.dll', Binary was not built with debug information.

Loaded 'F:\CUDA\bin\ptxcomp.dll', Binary was not built with debug information.

Loaded 'F:\CUDA\bin\fatZip.dll', Binary was not built with debug information.

My app dynamically loads some DLLs (the program's plugins). Once all the plugin DLLs have been loaded (by calling LoadLibrary and casting to some internal C++ interfaces), it runs fine for about 20 seconds… then, at a random moment, a VS2005 exception message box appears indicating that CUDA crashed:

Unhandled exception at 0x08a938d1 in myApp.exe: 0xC0000005: Access violation reading location 0x00000000

This is the call stack:

  cudart.dll!08a938d1()  <--- error here
  cudart.dll!08a9b336()
  ntdll.dll!7c9111a7()
  ntdll.dll!7c929213()
  kernel32.dll!7c80c096()
  kernel32.dll!7c80c18e()
  crypt32.dll!77a59840()
  crypt32.dll!77a59819()
  crypt32.dll!77a59692()
  ntdll.dll!7c920f46()
  kernel32.dll!7c80b683()
  ntdll.dll!7c920f46()

The disassembled code:

>	cudart.dll!08a938d1()
08A938BE  cmp         dword ptr [esp+8],3
08A938C3  jne         08A938DF
08A938C5  mov         eax,dword ptr ds:[08AA9EF0h]
08A938CA  mov         ecx,dword ptr fs:[2Ch]
08A938D1  mov         ecx,dword ptr [ecx+eax*4]   <-- error here, ecx = eax = 0
08A938D4  add         ecx,0Ch
08A938DA  call        08A99E76
08A938DF  xor         eax,eax
08A938E1  inc         eax
08A938E2  ret         0Ch
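As a reading aid, here is a hedged C++ model of what the fragment above appears to do. It assumes the fragment belongs to cudart.dll's DllMain (`ret 0Ch` pops three stdcall arguments, and 3 is the value of DLL_THREAD_DETACH), that the dword at 08AA9EF0 is the module's static TLS index, and that fs:[2Ch] is the current thread's TLS pointer array. Every name in the model (SimThread, sim_dll_main, and so on) is hypothetical, and the faulting read is simulated rather than performed. Under this reading, the bad read only happens when a thread detaches, which would fit a crash that shows up at a random moment instead of at load time.

```cpp
#include <cassert>

// Hypothetical model types; none of these are real Windows or CUDA names.
constexpr unsigned long kDllThreadDetach = 3;  // numeric value of DLL_THREAD_DETACH

struct SimThread {
    void** tls_pointer;  // stands in for the TEB pointer read via fs:[2Ch]
};

static unsigned sim_tls_index = 0;  // stands in for the dword at 08AA9EF0
static int sim_cleanups = 0;        // how often the per-thread cleanup ran

enum class SimResult { True, AccessViolation };

// Stands in for the routine at 08A99E76, called with ecx = block + 0xC.
static void sim_cleanup(void* /*object*/) { ++sim_cleanups; }

// Mirrors the fragment: only DLL_THREAD_DETACH touches static TLS.
SimResult sim_dll_main(const SimThread& t, unsigned long reason) {
    if (reason == kDllThreadDetach) {
        // The real code has no NULL check: it reads [ecx + eax*4] directly,
        // which faults at address 0 when both registers are 0.
        if (t.tls_pointer == nullptr) return SimResult::AccessViolation;
        char* block = static_cast<char*>(t.tls_pointer[sim_tls_index]);
        sim_cleanup(block + 0xC);
    }
    return SimResult::True;  // xor eax,eax / inc eax: return TRUE
}

// A thread that has a TLS array vs. one that does not (the crash above).
bool sim_matches_crash() {
    static char block[32];
    void* slots[1] = {block};
    SimThread with_tls{slots};
    SimThread without_tls{nullptr};
    const int before = sim_cleanups;
    const bool ok =
        sim_dll_main(with_tls, kDllThreadDetach) == SimResult::True &&
        sim_dll_main(without_tls, 2) == SimResult::True &&  // 2 = DLL_THREAD_ATTACH
        sim_dll_main(without_tls, kDllThreadDetach) == SimResult::AccessViolation;
    // Exactly one cleanup should run: the detach on the thread that has TLS.
    assert(sim_cleanups == before + 1);
    return ok;
}
```

The model only distinguishes "TLS array present" from "TLS array missing"; it does not try to reproduce what the real cleanup routine does.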

Of course, if I remove the dependency on cudart.lib, my program runs well and does not crash. The curious thing is that merely linking against cudart.lib, WITHOUT calling any CUDA function, makes it crash :wacko:

I have a GeForce 6800 (256 MB), ForceWare 97.73, CUDA Toolkit + CUDA SDK v0.8, Windows XP x86 SP2 (Spanish), and an Athlon 64 3500+ on an Asus A8V Deluxe AGP 8X motherboard (VIA KT400) with 1 GB of Kingston HyperX DDR400.

I think you have a NULL pointer in there: ECX and EAX are both zero, so cudart.dll is trying to read through a NULL pointer.

As mentioned in the Windows release notes:

o Manual loading of CUDA DLLs or third-party DLLs using the
CUDA runtime via the LoadLibrary function is not supported.
Program crashes result when manual loading is attempted.

The OS doesn't initialize statically allocated TLS in a DLL when
the DLL is loaded dynamically. When DllMain in CUDART.DLL gets called
after it's loaded, it dereferences a NULL pointer.

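So even without calling into CUDART.DLL, you really are doing just that: if your DLL is itself loaded with LoadLibrary, CUDART.DLL gets loaded dynamically along with it.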

This will be fixed in future releases.

(and Microsoft will fix their end with Vista)

Oh, thanks for the response, Baarts. Sorry, I didn't realize that was in the documentation.
Can I use the CUDA driver API (not the CUDA runtime) directly instead, or does it have the same problem? I think it is a DLL too (cuda.dll). I need to use CUDA from my DLL, hehe!

Could I work around it temporarily by calling some cudaXXXXXXXXXX function when my .EXE initializes, then?

Perhaps a good solution would be to make the CUDA runtime a Windows service so that it is loaded when you log in?

Ah, same problem here. Any ballpark figure for when this will be fixed?

I am also having the same problem. Has anyone been able to solve it?

So if you're dynamically loading any DLLs, you can't use cudart.lib? Is there a workaround for this?

I'm using some legacy code for loading .rgb files, but the file loader is plugin-based: it loads the appropriate DLL based on the file extension. My understanding from the previous posts is that I just can't do that. Is this correct?

This problem was fixed in CUDA 1.0.

Thanks. I’m getting an “Access Violation” message when I try to copy a 1D array of float4 from the device to the host. I’ve reinstalled the CUDA toolkit and SDK just to make sure I’m using 1.0.

Is there anything else (including bone-headed programming mistakes) that commonly causes this error?

Here's a bone-headed programming error that can cause the problem: allocating with cudaMallocArray but freeing with cudaFree instead of cudaFreeArray. I fixed that, and can now copy the memory from device to host.