CUDA 3.2: first chance exception on load

Hi,

I upgraded to CUDA 3.2 (32-bit) and am experiencing a problem.
An exception is thrown before the application reaches _tmain().
If I ignore the exception, the program carries on running.

Using the Nsight analyzer, I noticed that the exception is thrown when cuCtxAttach is called and returns an INVALID_CONTEXT error.
In my app I use cudart.lib and cufft.lib (and never the driver API), so this call looked weird to me…

The same code did not throw this exception under CUDA 3.0.
Does anyone have any idea why this exception is thrown?

I’m developing in VS2008 on Windows 7 64-bit, with the 32-bit CUDA 3.2 toolkit.

The stack at the exception is:

KernelBase.dll!7683b727()
[Frames below may be incorrect and/or missing, no symbols loaded for KernelBase.dll]
KernelBase.dll!7683b727()
cudart32_32_16.dll!002b0483()
cudart32_32_16.dll!0028f5ae()
cudart32_32_16.dll!002a64c2()
cudart32_32_16.dll!002b0c6c()
cudart32_32_16.dll!002b0b5e()
cudart32_32_16.dll!00289f66()
cufft32_32_16.dll!00a16b30()
cufft32_32_16.dll!00a16bbb()
cufft32_32_16.dll!00a15803()
cufft32_32_16.dll!00a1591d()
cufft32_32_16.dll!00a159d8()
ntdll.dll!773b97a0()
ntdll.dll!773bd749()
ntdll.dll!773bde27()
ntdll.dll!773c6a3e()
ntdll.dll!773c5947()
ntdll.dll!773b9cc9()

device query:

CUDA Device Query (Runtime API) version (CUDART static linking)

There are 2 devices supporting CUDA

Device 0: “GeForce GTX 480”
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 2.0
Total amount of global memory: 1576599552 bytes
Multiprocessors x Cores/MP = Cores: 15 (MP) x 32 (Cores/MP) = 480 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.40 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No

Device 1: “Quadro FX 380”
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 247005184 bytes
Multiprocessors x Cores/MP = Cores: 2 (MP) x 8 (Cores/MP) = 16 (Cores)
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.10 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.20, CUDA Runtime Version = 3.20, NumDevs = 2, Device = GeForce GTX 480, Device = Quadro FX 380

Any help will be appreciated!

thanks,
eldad.

Have you recompiled everything against the new toolkit? That sort of thing sometimes comes up when you have linked with one version of the runtime library and then try running code with another.
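One way to rule out that kind of mismatch is to compare the version numbers the driver and the runtime report. The sketch below only covers the decoding and comparison step, so it compiles without the toolkit. It assumes the integers come from cudaDriverGetVersion() and cudaRuntimeGetVersion(), which encode a version as 1000 * major + 10 * minor (3020 for 3.2). The struct and helper names are made up for illustration.

```cpp
#include <string>

// CUDA reports a version as one integer: 1000 * major + 10 * minor,
// so 3020 means 3.2 and 3000 means 3.0.
struct CudaVersion {
    int majorPart;
    int minorPart;
};

// Hypothetical helper: split the encoded integer into its two parts.
inline CudaVersion decodeCudaVersion(int encoded) {
    return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// Hypothetical helper: render the encoded integer as "major.minor".
inline std::string formatCudaVersion(int encoded) {
    const CudaVersion v = decodeCudaVersion(encoded);
    return std::to_string(v.majorPart) + "." + std::to_string(v.minorPart);
}

// Hypothetical helper: the installed driver has to be at least as new
// as the runtime the application was built against.
inline bool driverSupportsRuntime(int driverVersion, int runtimeVersion) {
    return driverVersion >= runtimeVersion;
}
```

In a real application you would pass in the values filled by the two CUDA calls and log formatCudaVersion() for each. If driverSupportsRuntime() returns false, a stale driver is the first thing to look at.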

The runtime API uses the driver API internally so seeing a failure in a driver API function doesn’t mean anything in particular, although the runtime API normally catches them so you don’t see them.

I have recompiled everything, and even made sure there were no files left over from the previous version anywhere on the drive.

The runtime API does catch this exception, and if I break only on unhandled exceptions, the program runs as expected.
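For anyone unfamiliar with the term: a first-chance exception is what the debugger reports at the moment an exception is raised, before any handler has run. If a handler inside the library catches it, the caller never sees it. The sketch below shows that pattern in plain C++. It is only an illustration, not what cudart actually does internally, and every name in it (Status, attachContext, initLibrary) is hypothetical.

```cpp
#include <stdexcept>

// Hypothetical status codes; these are NOT the real CUDA error enums.
enum class Status { Ok, InvalidContext };

// Stand-in for an internal call that signals failure by throwing.
inline void attachContext(bool contextExists) {
    if (!contextExists) {
        throw std::runtime_error("invalid context");
    }
}

// Stand-in for a public entry point. The exception is caught inside the
// library, so the caller only ever receives a status code. A debugger set
// to break on first-chance exceptions would still stop at the throw.
inline Status initLibrary(bool contextExists) {
    try {
        attachContext(contextExists);
        return Status::Ok;
    } catch (const std::runtime_error&) {
        return Status::InvalidContext;
    }
}
```

Calling initLibrary(false) returns Status::InvalidContext without the exception escaping, which matches the behaviour of a program that runs fine once the debugger is told to stop only on unhandled exceptions.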

Do you know if this exception is thrown on purpose? (Some applications use exceptions to communicate between modules; I know MATLAB does this…)

Naturally, if it is not thrown on purpose, I would very much like to understand why it is thrown and fix it.

Do you have any other ideas regarding the source of this exception? Can I provide any other info that would help?

thanks,

eldad.

Sorry, Windows is a platform I rarely use, so I have not had much call to go hacking into the Windows versions of the driver and runtime API libraries. The only other thing I can think of is a 32-bit versus 64-bit problem. It could be that the 32-bit version of the 3.2 runtime library won’t run on a 64-bit host platform. It might be worth going through the CUDA 3.1 and 3.2 Windows change logs and release notes with a fine-toothed comb to make sure there are no limitations listed there.

Thanks Avi!

Edit: actually, the SDK 3.2 fft2d demo gives the same exception…

Does anyone from NVIDIA have a suggestion?

eldad.

Hello, anybody?!

Are these first-chance exceptions by design?

thanks,
eldad.

I have the same problem as you mentioned but on a 32-bit platform.

see here.

I’ve narrowed the problem down to the use of textures in CUDA 3.2.

It seems that when no textures are used, the exception is not thrown.

Still no answer from NVIDIA as to whether this is by design…

eldad.

Hello everyone,

I am new to CUDA. I installed the CUDA toolkit and the SDK. I have Windows 7 and use Visual Studio 2008 Express. When I tried to run the sample project “release_vc90.sln” located in “NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\src”, I got the following error: ‘C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\VCProjectDefaults\NvCudaRuntimeApi.rules’ was not found or failed to load. I was unable to find any .rules files in the VCProjectDefaults folder. Please help.
Thanks