I have Gentoo (kernel 3.0.0) dual-booting with Windows 7 on my home PC, with a GT 545 card in it. My employer has a headless 1U 19" rack-mount machine running Gentoo (kernel 2.6.37), and he put a GT 315 in it. Both machines are running CUDA 4.0 and a 290.xx driver.
I can query the card and read attributes like the model number, but if I run even an extremely simple program (the vector-addition example from the SDK), the kernel call just returns with no error. The problem is that the kernel did nothing; it is as if the CUDA kernel were a no-op, and the result vector has garbage in it. This exact same code runs under Windows 7 with Visual Studio and on my own Gentoo system here at home. I can cudaMemcpy an array of values to the card and then back again and the values are still good, so memory access over PCIe is okay, but that path doesn't exercise the GPU's compute cores.
Is there any way to programmatically tell if this card is really bad, other than swapping it? (I'm in San Jose, CA, and the box is in L.A.)
Change the permissions on the /dev/nvidia* files to 666.
The release notes describe a script that loads the nvidia module and creates the proper /dev files with the right permissions:
In order to run CUDA applications, the CUDA module must be
loaded and the entries in /dev created. This may be achieved
by initializing X Windows, or by creating a script to load the
kernel module and create the entries.
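The script from the Linux release notes looks roughly like the sketch below (run as root; device-node numbers are as NVIDIA documents them, but check the release notes shipped with your driver version):

```shell
#!/bin/bash
# Load the NVIDIA kernel module and create the /dev entries.
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  # Count the NVIDIA controllers present on the PCI bus.
  N3D=$(lspci | grep -i NVIDIA | grep -c "3D controller")
  NVGA=$(lspci | grep -i NVIDIA | grep -c "VGA compatible controller")
  N=$(expr $N3D + $NVGA - 1)
  for i in $(seq 0 $N); do
    mknod -m 666 /dev/nvidia$i c 195 $i   # one node per GPU
  done
  mknod -m 666 /dev/nvidiactl c 195 255   # control device
else
  exit 1
fi
```

Running this at boot (e.g. from a local init script) avoids needing X to initialize the devices on a headless box.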
That didn't seem to help. I had already changed the /dev/nvidia* permissions to 666 because I found that suggestion in another forum.
I ran the script you gave me as root (there are no other accounts anyway), but not at boot-up. Would that make a difference?
Anyway, it didn't help. The CUDA kernel still seems to be a no-op.
I rebooted after running your script, and it still returns garbage.
I installed and tried driver 270.41.19. Still not working. I can run the deviceQuery application and it returns everything correctly. My extremely simple application returns successfully from the CUDA kernel call (I used cudaDeviceSynchronize and it returned cudaSuccess). I know deviceQuery is just using the driver to read on-board EPROM or something, but how can a kernel call return cudaSuccess when the kernel actually doesn't do anything (at least not correctly)?
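One thing worth checking: cudaDeviceSynchronize only reports errors that occur during execution, while some launch-time failures are only visible through cudaGetLastError called immediately after the launch. A minimal sketch of checking both (the kernel and sizes here are made up for illustration, not your actual code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative trivial kernel: element-wise vector addition.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Print a readable message if a CUDA call failed.
static void check(cudaError_t e, const char *what)
{
    if (e != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(e));
}

int main()
{
    const int n = 1024;
    float *a, *b, *c;
    check(cudaMalloc(&a, n * sizeof(float)), "cudaMalloc a");
    check(cudaMalloc(&b, n * sizeof(float)), "cudaMalloc b");
    check(cudaMalloc(&c, n * sizeof(float)), "cudaMalloc c");

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    check(cudaGetLastError(), "kernel launch");         // launch-time errors
    check(cudaDeviceSynchronize(), "kernel execution"); // execution-time errors

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

If the binary contains no code compatible with the card, the error you would expect to see from one of these checks is "invalid device function".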
Is there any other kind of testing I can do to help narrow down this problem?
The problem was that the cards I had at home were a GTS 420 and a GT 545, which are compute capability 2.0 and 2.1. The card installed on the machine in L.A. was a GT 315 (which is actually a GT220), compute capability 1.2. My compile flags were set for arch=sm_20, so when I ran my application, the driver couldn't find any code compiled for a compute capability less than or equal to that of the card. Embarrassing!
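For anyone hitting the same thing: nvcc can embed code for several architectures in one fat binary, so a single build covers both machines. Roughly (the source file name is illustrative):

```shell
# Embed sm_12 code for the GT 315 (compute capability 1.2)
# alongside sm_20 code for the Fermi-class cards.
nvcc -gencode arch=compute_12,code=sm_12 \
     -gencode arch=compute_20,code=sm_20 \
     -o vecadd vecadd.cu
```

At launch time the driver picks the best matching code for the installed card, so the same binary would have run on both the GT 545 and the GT 315.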