Bad Cuda Card?

I’m hoping someone can help me.

I have Gentoo 3.0.0 running dual-boot with Windows 7 on my home PC, with a GT 545 card in it. My employer has a 1U 19" rack-mount (headless) machine running Gentoo 2.6.37, and he put a GT 315 in it. Both machines are running CUDA 4.0 and a 290.xx driver.

I can query the card and read attributes such as the model number, but if I run even an extremely simple program (the vector-addition sample from the SDK), the kernel call just returns with no error. The problem is that the kernel did nothing; the result vector contains garbage, as if the CUDA kernel were a no-op. This exact same code runs under Windows 7 / Visual Studio and on my own Gentoo 3.0.0 system here at home. I can cudaMemcpy an array of values to the card and then back again and the values are still good, so memory access over PCIe is okay, but that doesn't exercise the CUDA cores themselves.
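For reference, the kind of test I'm describing looks roughly like this (a minimal sketch; the kernel, names, and array size are illustrative, not the actual SDK sample):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: c[i] = a[i] + b[i]
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 256;
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2 * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    // Check both the launch itself and the kernel's execution.
    printf("launch:  %s\n", cudaGetErrorString(cudaGetLastError()));
    printf("execute: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));

    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hc[10] = %f (expect 30)\n", hc[10]);
    return 0;
}
```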

Is there any way to programmatically tell whether this card is really bad, other than swapping it? (I'm in San Jose, CA, and the box is in L.A.)

Any Suggestions?

Happy New Year,

Check the ownership of the /dev/nvidia* files and be sure to follow the instructions in the release notes for running headless.

dev # ls -alt nvidia*
crw-rw---- 1 root video 195, 0 Dec 29 06:55 nvidia0
crw-rw---- 1 root video 195, 255 Dec 29 06:55 nvidiactl

I'm having trouble finding anything about a 'headless' setup, and there's nothing about it in the CUDA 4.0 release notes.

Can you point me at the right documentation for this?

And thank you very much for getting back so quickly.


Change the permissions on the /dev/nvidia* files to 666.

This is the script in the release notes that loads the nvidia module and creates the proper /dev files with the right permissions.

  • In order to run CUDA applications, the CUDA module must be
    loaded and the entries in /dev created. This may be achieved
    by initializing X Windows, or by creating a script to load the
    kernel module and create the entries.

    An example script (to be run at boot time):


    #!/bin/bash

    /sbin/modprobe nvidia

    if [ "$?" -eq 0 ]; then

        # Count the number of NVIDIA controllers found.
        N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
        NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

        N=`expr $N3D + $NVGA - 1`
        for i in `seq 0 $N`; do
            mknod -m 666 /dev/nvidia$i c 195 $i
        done

        mknod -m 666 /dev/nvidiactl c 195 255

    else
        exit 1
    fi


That didn't seem to help. I had already changed the /dev/nvidia* permissions to 666, because I did find that suggestion in another forum.

I ran the script you gave me as root (there are no other accounts anyway), but not at boot time. Would that make a difference?
Either way, it didn't help. The CUDA kernel still seems to be a no-op.

I rebooted after running your script, and it still returns garbage.

Anything else I can try?

I really appreciate your help.


What is the output of “cat /proc/driver/nvidia/version”?
I would use the 270.41.xx driver.

NVRM version: NVIDIA UNIX x86_64 Kernel Module 290.10 Wed Nov 16 17:39:29 PST 2011
GCC version: gcc version 4.3.2 (Gentoo 4.3.2-r3 p1.6, pie-10.1.5)

I installed and tried driver 270.41.19. Still not working. I can run the deviceQuery application and it returns everything correctly. My extremely simple application returns successfully from the kernel call (I used cudaDeviceSynchronize and it returned cudaSuccess). I know deviceQuery is just using the driver to read the on-board EPROM or something, but how can I call a kernel and have it return cudaSuccess when it actually doesn't do anything (at least not correctly)?

Is there any other kind of testing I can do to help narrow down this problem?

Happy New Year!


I found on a Gentoo forum that some people running 2.6.37 (like we are) were running CUDA 3.2 with a 270 driver, so I installed those versions on the headless box.

Here is some system info:

ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jan  1 10:37 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan  1 10:37 /dev/nvidiactl


Module                  Size  Used by
dvbm                   97073  0
asi                    13893  1 dvbm
nvidia              10491656  0

Is it okay that ‘nvidia’ is ‘Used by’ ZERO modules?

Output of deviceQuery:

CUDA Device Query (Runtime API) version (CUDART static linking)

Found 1 CUDA Capable device(s)

Device 0: "GeForce GT 220"
  CUDA Driver Version / Runtime Version:         3.2 / 4.0
  CUDA Capability Major/Minor version number:    1.2
  Total amount of global memory:                 1024 MBytes (1073545216 bytes)
  ( 6) Multiprocessors x ( 8) CUDA Cores/MP:     48 CUDA Cores
  GPU Clock Speed:                               1.36 GHz
  Memory Clock rate:                             400.00 Mhz
  Memory Bus Width:                              128-bit
  Max Texture Dimension Size (x,y,z):            1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers:       1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   No
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           3 / 0
  Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = GeForce GT 220

The problem was that the cards I had at home, a GTS 420 and a GT 545, are compute capability 2.0 and 2.1, while the card they installed in the machine in L.A. is a GT 315 (which is actually a rebranded GT 220) with compute capability 1.2. I had my compile flags set for arch=sm_20, so when I ran my application, the binary contained no code compatible with the card's compute capability and the kernel never executed. Embarrassing!
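For anyone hitting the same thing: one way to avoid it is to build a fat binary carrying code for every target architecture and then verify what actually got embedded (a sketch; `vectorAdd.cu` is a placeholder name, and the exact flags assume the nvcc from this CUDA generation):

```shell
# Embed machine code for both sm_12 (GT 220-class) and sm_20 (Fermi)
# so the same executable runs on either card.
nvcc -gencode arch=compute_12,code=sm_12 \
     -gencode arch=compute_20,code=sm_20 \
     -o vectorAdd vectorAdd.cu

# Inspect which architectures actually ended up in the binary.
cuobjdump --list-elf vectorAdd
```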

Thanks for everybody's help.

Are you checking the return codes of all CUDA calls? In the case you describe, the calls won't just fail silently; they will return proper error codes.
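A common pattern for this (a sketch, not code from the thread) is to wrap every runtime call and every kernel launch in a check macro; with an arch mismatch like the one above, the post-launch check typically reports something like "invalid device function" instead of silently doing nothing:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line context on any CUDA runtime error.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err = (call);                                  \
        if (err != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err));                      \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

__global__ void kernel() {}

int main()
{
    CUDA_CHECK(cudaSetDevice(0));
    kernel<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       // launch errors (bad config, no device code image)
    CUDA_CHECK(cudaDeviceSynchronize());  // asynchronous execution errors
    return 0;
}
```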