deviceQuery passes but other demos fail

wierzc · January 22, 2009, 7:40am

Dear Forums,

I am trying to run CUDA on a headless RHEL 5.2 server box. It has a built-in ATI video card, and I have added a Quadro FX-4600 PCI-X board for CUDA. There is no X server running (and I prefer it that way).

I have downloaded and installed the driver, toolkit, and SDK that form the CUDA 2.0 release. When I run deviceQuery, things look good:

[codebox][root@ca3-1 release]# ./deviceQuery
There is 1 device supporting CUDA
Device 0: "Quadro FX 4600"
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 805044224 bytes
Number of multiprocessors: 12
Number of cores: 96
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.19 GHz
Concurrent copy and execution: No
Test PASSED
Press ENTER to exit...
[/codebox]

But when I run the other examples, they either fail or segfault, e.g.:

[codebox][root@ca3-1 release]# ./eigenvalues
Using device 0: Quadro FX 4600
Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: 'eigenvalues.dat'
Gerschgorin interval: -2.894310 / 2.923303
Average time step 1: 224.800690 ms
Average time step 2, one intervals: 224.823013 ms
Average time step 2, mult intervals: 112.424240 ms
Average time TOTAL: 786.905884 ms
Segmentation fault
[root@ca3-1 release]# ./BlackScholes
Using device 0: Quadro FX 4600
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 111.513420 msec
Effective memory bandwidth: 0.717402 GB/s
Gigaoptions per second : 0.071740
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.000000E+00
Max absolute error: 9.574021E+01
TEST FAILED
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
Press ENTER to exit...
[/codebox]
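When several SDK samples misbehave in different ways, it can help to run them all in one pass and sort each result into passed, failed, or crashed. The sketch below is a hypothetical helper (`run_samples` and `classify_run` are illustrative names, not SDK tools). It assumes the binaries sit in the current directory, that a single newline on stdin gets past the final "Press ENTER to exit..." prompt, and that the strings "PASSED", "FAILED" and "Runtime API error" seen in the output above are reliable markers. An exit status above 128 is treated as death by a signal, so a segfault like the one in eigenvalues would show up as signal 11.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run CUDA SDK sample binaries non-interactively and
# classify each result. Assumes the binaries are in the current directory
# and that one newline on stdin answers the final "Press ENTER" prompt.

# classify_run OUTPUT STATUS: print PASSED, FAILED, CRASHED or UNKNOWN.
classify_run() {
  local out="$1" status="$2"
  if [ "$status" -gt 128 ]; then
    # The shell reports death by signal N as exit status 128 + N.
    echo "CRASHED (signal $((status - 128)))"
  elif printf '%s\n' "$out" | grep -qE 'FAILED|Runtime API error'; then
    echo "FAILED"
  elif printf '%s\n' "$out" | grep -q 'PASSED'; then
    echo "PASSED"
  else
    echo "UNKNOWN (exit $status)"
  fi
}

# run_samples NAME...: run ./NAME for each argument and print one line each.
run_samples() {
  local sample out status
  for sample in "$@"; do
    out=$(echo | "./$sample" 2>&1)
    status=$?
    printf '%-16s %s\n' "$sample" "$(classify_run "$out" "$status")"
  done
}
```

For example, `run_samples deviceQuery bandwidthTest BlackScholes eigenvalues` run from the `release` directory prints one summary line per sample.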

One thing that caught my eye: if I run glxinfo, it looks like ATI's OpenGL implementation is the one in use. Shouldn't it be NVIDIA's?

[codebox][root@ca3-1 release]# glxinfo | egrep -e '(client|server|OpenGL)'
server glx vendor string: SGI
server glx version string: 1.2
server glx extensions:
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
client glx extensions:
OpenGL vendor string: ATI Technologies Inc.
OpenGL renderer string: ATI Radeon 9200 OpenGL Engine
OpenGL version string: 1.3 ATI-1.5.36
OpenGL extensions:[/codebox]

Any help debugging from here would be hugely appreciated.
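A note on the glxinfo output: glxinfo reports the OpenGL implementation of whichever X server the `DISPLAY` variable points at, and the CUDA runtime does not need an X server or OpenGL to run compute kernels. Since no X server runs on this box, the ATI strings most likely describe some other display that `DISPLAY` refers to (an assumption; a forwarded SSH session is one common way this happens). The hypothetical helper below only pulls the two relevant strings out of glxinfo output so they can be read next to `$DISPLAY`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the OpenGL vendor and renderer strings from
# glxinfo output read on stdin, in the order glxinfo prints them.
gl_vendor_summary() {
  sed -n -e 's/^OpenGL vendor string: //p' \
         -e 's/^OpenGL renderer string: //p'
}

# Typical use; glxinfo talks to whatever X server $DISPLAY names:
#   echo "DISPLAY=${DISPLAY:-<unset>}"
#   glxinfo | gl_vendor_summary
```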

netllama · January 22, 2009, 2:45pm

Does this reproduce with the CUDA_2.1 release?

Please generate and attach an nvidia-bug-report.log.

wierzc · January 22, 2009, 6:31pm

Hi! Thanks for the fast reply.

I tried CUDA 2.1. Again, deviceQuery works but the other demos fail. This time, they fail in different ways from before:

[codebox][root@ca3-1 release]# ./deviceQuery
There is 1 device supporting CUDA
Device 0: "Quadro FX 4600"
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 805044224 bytes
Number of multiprocessors: 12
Number of cores: 96
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.19 GHz
Concurrent copy and execution: No
Test PASSED
Press ENTER to exit...
[root@ca3-1 release]# ./BlackScholes
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
cudaSafeCall() Runtime API error in file <BlackScholes.cu>, line 155 : unspecified launch failure.
[root@ca3-1 release]# echo $LD_LIBRARY_PATH
/usr/local/cuda/lib
[root@ca3-1 release]# ./bandwidthTest
Running on...
  device 0:Quadro FX 4600
Quick Mode
Host to Device Bandwidth for Pageable memory
cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 657 : unspecified launch failure.
[root@ca3-1 release]# ./eigenvalues
Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: 'eigenvalues.dat'
cudaSafeCall() Runtime API error in file <main.cu>, line 125 : unspecified launch failure.
[/codebox]

I am attaching a bug report.

Many thanks!

Casimir

nvidia_bug_report.log.txt (181 KB)
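Each `cudaSafeCall()` message above names the source file and line of the failing call, so a quick way to see which call is failing is to print that line from the sample's source. Keep in mind that CUDA errors are not always reported by the call that caused them: kernel launches are asynchronous, and the runtime typically creates its context on the first API call, so a first `cudaMalloc` that reports "unspecified launch failure" can reflect a driver or hardware problem rather than a problem with the allocation itself. The function below is a hypothetical sketch; it assumes the message format shown in this thread and that you pass the directory holding the sample's `.cu` files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given one "cudaSafeCall() Runtime API error in file
# <FILE>, line N : ..." message, print FILE:N: followed by that source line.
# $2 is the directory holding the sample's .cu files (default: current dir).
show_failing_line() {
  local msg="$1" srcdir="${2:-.}" file line
  file=$(printf '%s\n' "$msg" | sed -n 's/.*in file <\([^>]*\)>.*/\1/p')
  line=$(printf '%s\n' "$msg" | sed -n 's/.*, line \([0-9][0-9]*\) :.*/\1/p')
  if [ -z "$file" ] || [ -z "$line" ]; then
    echo "could not parse: $msg" >&2
    return 1
  fi
  printf '%s:%s: ' "$file" "$line"
  # Print only line number $line of the source file.
  sed -n "${line}p" "$srcdir/$file"
}
```

For example, from the BlackScholes project directory: `show_failing_line 'cudaSafeCall() Runtime API error in file <BlackScholes.cu>, line 155 : unspecified launch failure.'`.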

netllama · January 22, 2009, 6:37pm

180.06 is not the final CUDA_2.1 driver. Based on your bug report, it looks like you’re hitting a motherboard chipset bug which is worked around in the CUDA_2.1 final driver.
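To confirm which driver is actually loaded after an upgrade, the NVIDIA kernel module reports its version in `/proc/driver/nvidia/version`. The sketch below assumes the first line of that file has the usual `NVRM version: ... Kernel Module <version> ...` layout; `nvidia_driver_version` and `check_driver` are illustrative names, and the 180.06 check simply encodes the advice given in this reply.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read the loaded NVIDIA kernel-module version.
# Assumes a first line like "NVRM version: ... Kernel Module <version> ...".

# nvidia_driver_version [FILE]: print the version number, e.g. 180.06.
nvidia_driver_version() {
  local f="${1:-/proc/driver/nvidia/version}"
  sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9]*\.[0-9][0-9.]*\).*/\1/p' "$f"
}

# check_driver [FILE]: report the version and warn about the pre-release 180.06.
check_driver() {
  local v
  v=$(nvidia_driver_version "$@") || return 1
  if [ -z "$v" ]; then
    echo "NVIDIA kernel module version not found" >&2
    return 1
  fi
  echo "driver $v"
  if [ "$v" = "180.06" ]; then
    echo "warning: 180.06 is not the final CUDA 2.1 driver" >&2
  fi
}
```

Run `check_driver` with no arguments to read the live `/proc` file.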

wierzc · January 22, 2009, 7:11pm

Thanks for your reply.

So the answer is that I should just wait for the final CUDA 2.1 driver to be released?

E.D_Riedijk · January 22, 2009, 8:21pm

No, you should just download the final CUDA 2.1 driver ;)

http://forums.nvidia.com/index.php?showforum=63

wierzc · January 22, 2009, 11:52pm

Thanks for the pointer! I found the final drivers at http://forums.nvidia.com/index.php?showtopic=85832

I couldn’t find them before because I was looking on the main CUDA 2.1 download page at http://www.nvidia.com/object/cuda_get.html

Anyway, I installed them and I get the same problem as before: deviceQuery works, but the other demos throw errors as soon as they reach their cudaMalloc calls:

[codebox][root@ca3-1 release]# ./deviceQuery
There is 1 device supporting CUDA
Device 0: "Quadro FX 4600"
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 805044224 bytes
Number of multiprocessors: 12
Number of cores: 96
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.19 GHz
Concurrent copy and execution: No
Test PASSED
Press ENTER to exit...
[root@ca3-1 release]# ./bandwidthTest
Running on...
  device 0:Quadro FX 4600
Quick Mode
Host to Device Bandwidth for Pageable memory
cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 657 : unspecified launch failure.
[root@ca3-1 release]# ./eigenvalues
Matrix size: 2048 x 2048
Precision: 0.000010
Iterations to be timed: 100
Result filename: 'eigenvalues.dat'
cudaSafeCall() Runtime API error in file <main.cu>, line 125 : unspecified launch failure.
[root@ca3-1 release]# ./BlackScholes
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
cudaSafeCall() Runtime API error in file <BlackScholes.cu>, line 155 : unspecified launch failure.[/codebox]

I have attached a new bug report file.

Thanks very much for your help!

Cas
nvidia_bug_report.log.txt (203 KB)

netllama · January 22, 2009, 11:55pm

According to this bug report, X failed to start:

[codebox](II) NVIDIA(0): Initialized GPU GART.
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0): *** Aborting ***
(II) NVIDIA(0): Setting mode "nvidia-auto-select"
(EE) NVIDIA(0): WAIT: (E, 0, 0x507d, 0)
[/codebox]
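When reading a long X server log like the one inside the bug report, filtering for error lines makes excerpts like this one easier to find. This is a small sketch; it assumes the default log path `/var/log/Xorg.0.log` and that error lines begin with `(EE)`, optionally preceded by a bracketed timestamp as in newer X servers. Anchoring the match at the start of the line skips the marker legend near the top of the log, which also contains the text "(EE)".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the error lines of an X server log, with
# line numbers. Matches "(EE)" at the start of a line, optionally after a
# bracketed timestamp such as "[    12.345] ".
x_errors() {
  grep -nE '^(\[[^]]*\] )?\(EE\)' "${1:-/var/log/Xorg.0.log}"
}
```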

Your system appears to have more fundamental problems than just CUDA functionality. According to Tyan’s website, your motherboard does not support PCI-E graphics cards:

http://tyan.com/product_board_detail.aspx?pid=235
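A first check of where the card actually sits is to look it up on the PCI bus by NVIDIA's PCI vendor ID, 0x10de, and then dump the verbose details for that slot. This is a sketch; `nvidia_pci_slots` is a hypothetical helper, and it assumes `lspci -n` prints each device's `vendor:device` pair as `vvvv:dddd` after the class field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `lspci -n` output on stdin, print the bus address
# of every device whose vendor ID is 10de (NVIDIA).
nvidia_pci_slots() {
  grep -iE '[[:space:]]10de:[0-9a-f]{4}' | cut -d' ' -f1
}

# Typical use:
#   lspci -tv                      # tree view: which bridge each slot sits behind
#   for slot in $(lspci -n | nvidia_pci_slots); do
#     lspci -vv -s "$slot"         # verbose details for the NVIDIA device
#   done
```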
