deviceQuery passes but other demos fail

Dear Forums,

I am trying to run CUDA on a headless RHEL 5.2 server. It has a built-in ATI video card, and I have added a Quadro FX 4600 PCI-X board for CUDA. There is no X server running (and I prefer it that way).

I have downloaded and installed the driver, toolkit, and SDK that form the CUDA 2.0 release. When I run deviceQuery, things look good:

[codebox][root@ca3-1 release]# ./deviceQuery

There is 1 device supporting CUDA

Device 0: “Quadro FX 4600”

Major revision number: 1

Minor revision number: 0

Total amount of global memory: 805044224 bytes

Number of multiprocessors: 12

Number of cores: 96

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 8192

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.19 GHz

Concurrent copy and execution: No

Test PASSED

Press ENTER to exit…

[/codebox]

But when I run the other examples, they either fail or segfault, e.g.:

[codebox][root@ca3-1 release]# ./eigenvalues

Using device 0: Quadro FX 4600

Matrix size: 2048 x 2048

Precision: 0.000010

Iterations to be timed: 100

Result filename: ‘eigenvalues.dat’

Gerschgorin interval: -2.894310 / 2.923303

Average time step 1: 224.800690 ms

Average time step 2, one intervals: 224.823013 ms

Average time step 2, mult intervals: 112.424240 ms

Average time TOTAL: 786.905884 ms

Segmentation fault

[root@ca3-1 release]# ./BlackScholes

Using device 0: Quadro FX 4600

Initializing data…

…allocating CPU memory for options.

…allocating GPU memory for options.

…generating input data in CPU mem.

…copying input data to GPU mem.

Data init done.

Executing Black-Scholes GPU kernel (512 iterations)…

Options count : 8000000

BlackScholesGPU() time : 111.513420 msec

Effective memory bandwidth: 0.717402 GB/s

Gigaoptions per second : 0.071740

Reading back GPU results…

Checking the results…

…running CPU calculations.

Comparing the results…

L1 norm: 1.000000E+00

Max absolute error: 9.574021E+01

TEST FAILED

Shutting down…

…releasing GPU memory.

…releasing CPU memory.

Shutdown done.

Press ENTER to exit…

[/codebox]
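A note on reading the BlackScholes numbers above: an L1 norm of exactly 1.000000E+00 is what you would expect if the GPU result buffer came back as all zeros. The sketch below assumes the sample reports a relative L1 error of the form sum(|cpu - gpu|) / sum(|cpu|); that formula is an assumption about the SDK code, and the prices are made-up illustrative values, not data from this run. Under that assumption, the output is consistent with the GPU results never having been written, although it does not prove it.

```python
def relative_l1_error(reference, result):
    """Return sum(|ref - res|) / sum(|ref|), the relative L1 error."""
    sum_delta = sum(abs(r - g) for r, g in zip(reference, result))
    sum_ref = sum(abs(r) for r in reference)
    return sum_delta / sum_ref


# Hypothetical CPU reference prices (illustrative values, not from the log).
cpu_prices = [12.5, 0.75, 95.74, 3.0]

# Assumption: a GPU buffer that was never written reads back as zeros.
gpu_prices = [0.0] * len(cpu_prices)

# With an all-zero result, every delta equals |ref|, so the ratio is 1.0.
print(relative_l1_error(cpu_prices, gpu_prices))

# The max absolute error is then simply the largest reference value.
print(max(abs(r - g) for r, g in zip(cpu_prices, gpu_prices)))
```

If that reading is right, the maximum absolute error in the log (9.574021E+01) would just be the largest CPU-computed value.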

One thing that caught my eye is that when I run glxinfo, it reports the ATI OpenGL implementation as the active renderer. Shouldn’t it be NVIDIA?

[codebox][root@ca3-1 release]# glxinfo | egrep -e '(client|server|OpenGL)'

server glx vendor string: SGI

server glx version string: 1.2

server glx extensions:

client glx vendor string: NVIDIA Corporation

client glx version string: 1.4

client glx extensions:

OpenGL vendor string: ATI Technologies Inc.

OpenGL renderer string: ATI Radeon 9200 OpenGL Engine

OpenGL version string: 1.3 ATI-1.5.36

OpenGL extensions:[/codebox]

Any help debugging from here would be hugely appreciated.

Does this reproduce with the CUDA_2.1 release?

Please generate and attach an nvidia-bug-report.log.

Hi! Thanks for your fast reply.

I tried CUDA 2.1. Again, deviceQuery works but the other demos fail. This time, they fail in different ways from before:

[codebox][root@ca3-1 release]# ./deviceQuery

There is 1 device supporting CUDA

Device 0: “Quadro FX 4600”

Major revision number: 1

Minor revision number: 0

Total amount of global memory: 805044224 bytes

Number of multiprocessors: 12

Number of cores: 96

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 8192

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.19 GHz

Concurrent copy and execution: No

Test PASSED

Press ENTER to exit…

[root@ca3-1 release]# ./BlackScholes

Initializing data…

…allocating CPU memory for options.

…allocating GPU memory for options.

cudaSafeCall() Runtime API error in file <BlackScholes.cu>, line 155 : unspecified launch failure.

[root@ca3-1 release]# echo $LD_LIBRARY_PATH

/usr/local/cuda/lib

[root@ca3-1 release]# ./bandwidthTest

Running on…

  device 0:Quadro FX 4600

Quick Mode

Host to Device Bandwidth for Pageable memory

cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 657 : unspecified launch failure.

[root@ca3-1 release]# ./eigenvalues

Matrix size: 2048 x 2048

Precision: 0.000010

Iterations to be timed: 100

Result filename: ‘eigenvalues.dat’

cudaSafeCall() Runtime API error in file <main.cu>, line 125 : unspecified launch failure.

[/codebox]

I am attaching a bug report.

Many thanks!

Casimir

nvidia_bug_report.log.txt (181 KB)

180.06 is not the final CUDA_2.1 driver. Based on your bug report, it looks like you’re hitting a motherboard chipset bug which is worked around in the CUDA_2.1 final driver.

Thanks for your reply.

So the answer is that I should just wait for the final CUDA 2.1 driver to be released?

No, you should just download the final CUDA 2.1 driver ;)

http://forums.nvidia.com/index.php?showforum=63

Thanks for the pointer! I found the final drivers at http://forums.nvidia.com/index.php?showtopic=85832

I couldn’t find them before because I was looking on the main CUDA 2.1 download page at http://www.nvidia.com/object/cuda_get.html

Anyway, I installed them and I get the same problem as before: deviceQuery works, but the other demos throw errors when they hit their cudaMalloc calls:

[codebox][root@ca3-1 release]# ./deviceQuery

There is 1 device supporting CUDA

Device 0: “Quadro FX 4600”

Major revision number: 1

Minor revision number: 0

Total amount of global memory: 805044224 bytes

Number of multiprocessors: 12

Number of cores: 96

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 8192

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.19 GHz

Concurrent copy and execution: No

Test PASSED

Press ENTER to exit…

[root@ca3-1 release]# ./bandwidthTest

Running on…

  device 0:Quadro FX 4600

Quick Mode

Host to Device Bandwidth for Pageable memory

cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 657 : unspecified launch failure.

[root@ca3-1 release]# ./eigenvalues

Matrix size: 2048 x 2048

Precision: 0.000010

Iterations to be timed: 100

Result filename: ‘eigenvalues.dat’

cudaSafeCall() Runtime API error in file <main.cu>, line 125 : unspecified launch failure.

[root@ca3-1 release]# ./BlackScholes

Initializing data…

…allocating CPU memory for options.

…allocating GPU memory for options.

cudaSafeCall() Runtime API error in file <BlackScholes.cu>, line 155 : unspecified launch failure.[/codebox]

I have attached a new bug report file.

Thanks very much for your help!

Cas
nvidia_bug_report.log.txt (203 KB)

According to this bug report, X failed to start:

#########

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(II) NVIDIA(0): Initialized GPU GART.

(EE) NVIDIA(0): Error recovery failed.

(EE) NVIDIA(0): *** Aborting ***

(II) NVIDIA(0): Setting mode “nvidia-auto-select”

(EE) NVIDIA(0): WAIT: (E, 0, 0x507d, 0)

###########
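For anyone who wants to pull the same lines out of their own report: X server logs mark errors with (EE), warnings with (WW), and informational messages with (II). Below is a minimal sketch that lists the distinct (EE) messages in a log. The function name x_error_lines and the sample text are illustrative assumptions; the sample only mirrors the excerpt above and is not the full bug report.

```python
import re

# Matches X server log lines such as "(EE) NVIDIA(0): Error recovery failed."
ERROR_LINE = re.compile(r"^\s*\(EE\)\s*(.*)$")


def x_error_lines(log_text):
    """Return the message part of every (EE) line, in order, skipping repeats."""
    seen = set()
    messages = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            messages.append(match.group(1))
    return messages


# Sample text modeled on the excerpt above (hypothetical, shortened).
sample = """(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Initialized GPU GART.
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0): *** Aborting ***
(II) NVIDIA(0): Setting mode "nvidia-auto-select"
(EE) NVIDIA(0): WAIT: (E, 0, 0x507d, 0)
"""

for message in x_error_lines(sample):
    print(message)
```

To run it against a real report, read the file contents into a string and pass that to x_error_lines in place of sample.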

Your system appears to have more fundamental problems than just CUDA functionality. According to Tyan’s website, your motherboard does not support PCI-E graphics cards:

http://tyan.com/product_board_detail.aspx?pid=235