CUDA 2.2 + module 185.18.08 queries affirmatively but fails to work

In short, deviceQuery tells me “There is 1 device supporting CUDA” but matrixMul says “no CUDA-capable device is available.”

The CUDA SDK binaries are freshly compiled. I have no conflicting older kernel drivers installed. Detailed specs are below.

I was able to run older versions of the kernel driver with my GeForce 8600M GT. I suspect the newer driver/CUDA combination is incompatible with my rather ancient hardware, but the “Getting Started” document says CUDA 2.2 supports GeForce 8-series hardware.

I would like to develop code that uses features available only in CUDA 2.2.

Any suggestions?
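For what it's worth, the failure doesn't need the SDK samples at all. Here is a minimal probe (just a sketch; the allocation size and messages are arbitrary) that separates device enumeration from context creation — enumeration can succeed even when no context can be created, which would match deviceQuery passing while matrixMul fails:

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    /* Step 1: enumeration only -- this does not create a context. */
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    printf("cudaGetDeviceCount: %s (count = %d)\n",
           cudaGetErrorString(err), count);

    /* Step 2: force context creation with a trivial allocation. */
    void *p = NULL;
    err = cudaMalloc(&p, 16);
    printf("cudaMalloc: %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess)
        cudaFree(p);
    return 0;
}
```

Compiled with `nvcc probe.cu -o probe`, the second line should reveal whether the "no CUDA-capable device" error appears exactly when the first context is created.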

Thanks,

Jeff

=========================================

cat /proc/version
Linux version 2.6.27.21-0.1-default (geeko@buildhost) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-03-31 14:50:44 +0200

=========================================

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 185.18.08 Thu Apr 30 15:48:49 PDT 2009
GCC version: gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux)

=========================================

…/…/bin/linux/release/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: “GeForce 8600M GT”
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 268107776 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.95 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit…

=========================================

…/…/bin/linux/release/deviceQueryDrv
CUDA Device Query (Driver API) statically linked version
There is 1 device supporting CUDA

Device 0: “GeForce 8600M GT”
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 268107776 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.95 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit…

=========================================

…/…/bin/linux/release/matrixMul
cudaSafeCall() Runtime API error in file <matrixMul.cu>, line 108 : no CUDA-capable device is available.

=========================================

…/…/bin/linux/release/matrixMulDrv
Using device 0: GeForce 8600M GT
cuSafeCallNoSync() Driver API error = 0002 from file <matrixMulDrv.cpp>, line 96.

=========================================

…/…/bin/linux/release/simpleCUBLAS
simpleCUBLAS test running…
!!! CUBLAS initialization error

Hi! I’m getting the same error. I’m running Ubuntu 9.04 x86, 8400M GS card, 185.85 driver version.
I develop my own code for matrix multiplication; it works sometimes, but other times cudaGetErrorString() returns “no CUDA-capable device is available”.
After a reboot I can run the program again, but a while later the same error comes back. Wow!
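To see exactly which call fails first, I now wrap every runtime call with a check like this (a sketch — CUDA_CHECK is my own macro, not something from the SDK):

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Abort with file/line and the CUDA error string as soon as any call fails. */
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t e_ = (call);                                   \
        if (e_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(e_));                       \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

int main(void)
{
    float *d = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d, 1024 * sizeof(float)));
    CUDA_CHECK(cudaFree(d));
    puts("ok");
    return 0;
}
```

With every call checked, the intermittent failure at least points at a specific line instead of surfacing later in an unrelated call.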

After two months, I still haven’t received any response on this query. It would appear that David Kirk’s faith in the many-to-one support model via the NVIDIA bulletin board is completely unfounded.

Seriously, NVIDIA, if you want people from high-end HPC to use your product, you’ve got to figure out a support model that does not suck.

How am I supposed to justify spending large portions of my time developing HPC cluster applications on top of CUDA when you can’t be bothered to help me debug a trivial problem on my laptop?

It’s becoming more clear that CUDA needs a paid one-on-one developer support model, like the enterprise support contracts offered for various server applications and operating systems. The forums cannot handle requests from people with serious deadlines and critical problems in a timely fashion. (The forums can barely keep up with the questions from casual developers as it is. The CUDA developer base seems to be growing very fast these days.)