GPU Computing SDK 4.0 runtime failures: the SDK builds successfully, but every exe fails at run time

I installed the driver, the CUDA Toolkit 4.0, the GPU Computing SDK 4.0, and Nsight 2.0. Everything installed successfully. bandwidthTest built successfully with VS 2010, but it crashed when run, with the following VS log:

‘bandwidthTest.exe’: Loaded ‘C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.0\C\bin\win64\Debug\bandwidthTest.exe’, Symbols loaded.

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\ntdll.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\kernel32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\KernelBase.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\bin\cudart64_40_17.dll’, Binary was not built with debug information.

‘bandwidthTest.exe’: Loaded ‘C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.0\C\bin\win64\Debug\cutil64D.dll’, Symbols loaded.

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\msvcp100d.dll’, Symbols loaded.

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\msvcr100d.dll’, Symbols loaded.

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\nvcuda.dll’, Binary was not built with debug information.

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\user32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\gdi32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\lpk.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\usp10.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\msvcrt.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\setupapi.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\cfgmgr32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\rpcrt4.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\advapi32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\sechost.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\oleaut32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\ole32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\devobj.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\imm32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\msctf.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\dwmapi.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Unloaded ‘C:\Windows\System32\dwmapi.dll’

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\nvapi64.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\shlwapi.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\shell32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\version.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\wintrust.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\crypt32.dll’, Cannot find or open the PDB file

‘bandwidthTest.exe’: Loaded ‘C:\Windows\System32\msasn1.dll’, Cannot find or open the PDB file

First-chance exception at 0x0066acb6 in bandwidthTest.exe: 0xC0000005: Access violation reading location 0xffffffffffffffff.

Unhandled exception at 0x0066acb6 in bandwidthTest.exe: 0xC0000005: Access violation reading location 0xffffffffffffffff.

The program ‘[4344] bandwidthTest.exe: Native’ has exited with code -1073741819 (0xc0000005).

At the point of the crash, VS pointed to the following code:

float testHostToDeviceTransfer( …)

{

…

cutilCheckError( cutCreateTimer( &timer ) );

cutilSafeCall( cudaEventCreate( &start ) );

}

My system is Windows 7 64-bit with two Tesla M2070 GPUs installed.

Please advise.
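A note on reading the last line of the log above: the exit code -1073741819 and the exception code 0xC0000005 are the same 32-bit value, printed once as a signed decimal and once as unsigned hex. 0xC0000005 is the Windows access-violation status, so the exit code adds no information beyond the exception lines. A minimal C++ sketch of the conversion follows; the helper names `exit_code_to_ntstatus` and `to_hex` are made up for illustration and are not part of the SDK or of Visual Studio.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: reinterpret a signed 32-bit process exit code
// as the unsigned 32-bit status value that debuggers print in hex.
std::uint32_t exit_code_to_ntstatus(std::int32_t exit_code) {
    // Conversion to an unsigned type is modular, so the bit pattern is preserved.
    return static_cast<std::uint32_t>(exit_code);
}

// Hypothetical helper: format a 32-bit value as "0x" plus 8 uppercase hex digits.
std::string to_hex(std::uint32_t value) {
    char buf[11];  // "0x" + 8 digits + terminating NUL
    int written = std::snprintf(buf, sizeof buf, "0x%08X",
                                static_cast<unsigned>(value));
    assert(written == 10);
    return std::string(buf);
}
```

Calling `to_hex(exit_code_to_ntstatus(-1073741819))` gives the hex form that appears in the exception lines of the log.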

Did you build all the common libraries that get installed with the SDK? Go to the SDK installation dir\C\common, open the VS2010 project there, build it, and then try running your original code again. It may help. The CUDA 4.0 installation on Windows and its integration with VS2010 are very erratic.

Also, did you try running the deviceQuery example? Does it work correctly and show that the SDK and CUDA runtime are installed correctly?

Thanks for the reply. The deviceQuery example ran successfully; its output is included at the end of this post. The cutil and shrUtils projects were built along with bandwidthTest, but bandwidthTest still had the same runtime error. In fact, even the cppIntegration project showed an error. I also tried building with VS 2005, and the project showed the same error. I completely reinstalled the SDK, the driver, the toolkit, and Nsight; none of it helped.

Please advise further.

Thanks,

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 2 CUDA Capable device(s)

Device 0: “Tesla M2070”

CUDA Driver Version / Runtime Version 4.0 / 4.0

CUDA Capability Major/Minor version number: 2.0

Total amount of global memory: 5376 MBytes (5636816896 bytes)

(14) Multiprocessors x (32) CUDA Cores/MP: 448 CUDA Cores

GPU Clock Speed: 1.15 GHz

Memory Clock rate: 1566.00 Mhz

Memory Bus Width: 384-bit

L2 Cache Size: 786432 bytes

Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)

Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 32768

Warp size: 32

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Concurrent copy and execution: Yes with 2 copy engine(s)

Run time limit on kernels: No

Integrated GPU sharing Host Memory: No

Support host page-locked memory mapping: Yes

Concurrent kernel execution: Yes

Alignment requirement for Surfaces: Yes

Device has ECC support enabled: Yes

Device is using TCC driver mode: Yes

Device supports Unified Addressing (UVA): Yes

Device PCI Bus ID / PCI location ID: 2 / 0

Compute Mode:

< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: “Tesla M2070”

CUDA Driver Version / Runtime Version 4.0 / 4.0

CUDA Capability Major/Minor version number: 2.0

Total amount of global memory: 5376 MBytes (5636816896 bytes)

(14) Multiprocessors x (32) CUDA Cores/MP: 448 CUDA Cores

GPU Clock Speed: 1.15 GHz

Memory Clock rate: 1566.00 Mhz

Memory Bus Width: 384-bit

L2 Cache Size: 786432 bytes

Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)

Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes

Total number of registers available per block: 32768

Warp size: 32

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

Maximum memory pitch: 2147483647 bytes

Texture alignment: 512 bytes

Concurrent copy and execution: Yes with 2 copy engine(s)

Run time limit on kernels: No

Integrated GPU sharing Host Memory: No

Support host page-locked memory mapping: Yes

Concurrent kernel execution: Yes

Alignment requirement for Surfaces: Yes

Device has ECC support enabled: Yes

Device is using TCC driver mode: Yes

Device supports Unified Addressing (UVA): Yes

Device PCI Bus ID / PCI location ID: 3 / 0

Compute Mode:

< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 2, Device = Tesla M2070, Device = Tesla M2070

[deviceQuery.exe] test results…

PASSED

Press to exit…
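For anyone cross-checking the listing above, the headline figures are consistent with simple arithmetic: 14 multiprocessors times 32 cores each gives the 448 CUDA cores, and 5636816896 bytes is about 5375.7 MBytes, which matches the reported 5376 MBytes when rounded up. The C++ sketch below reproduces both numbers; the function names are hypothetical, and rounding up is an assumption about how the sample prints the value.

```cpp
#include <cstdint>

// Hypothetical helper: convert bytes to MBytes (1 MByte = 1048576 bytes),
// rounding up. The rounding mode is an assumption, not taken from the sample.
std::uint64_t bytes_to_mbytes_ceil(std::uint64_t bytes) {
    const std::uint64_t bytes_per_mbyte = 1048576ULL;
    return (bytes + bytes_per_mbyte - 1) / bytes_per_mbyte;
}

// Hypothetical helper: total core count from the two figures in the listing.
std::uint32_t total_cuda_cores(std::uint32_t multiprocessors,
                               std::uint32_t cores_per_mp) {
    return multiprocessors * cores_per_mp;
}
```

With the values from the listing, `bytes_to_mbytes_ceil(5636816896ULL)` and `total_cuda_cores(14, 32)` give the memory size and core count that deviceQuery reports for each Tesla M2070.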

The problem went away after I power-cycled the machine.

Thanks a lot for the reply.