Hardware problem with Tesla card?

I coded a small piece of the application I want to write using CUDA, basing it on the matrixMul example. When I ran it, it failed at the first CUDA call, cudaMalloc(), with error 10999, “Unspecified driver error”. I am calling CUT_DEVICE_INIT() prior to the cudaMalloc() call.

I couldn’t figure out what was causing this, so I went back to some of the SDK examples I had first tried after installing the Tesla card. They failed today, even though they had worked weeks ago. For example, the matrixMul program showed that the GPU results were all 0.0, and the MersenneTwister program failed, with a Samples/sec figure about a quarter of what I recall from the earlier run.

I then tried the bandwidthTest program. All transfer speeds are much slower than the first time I ran it: host-to-device was 2.0 GB/s, while my recollection is that it ran at 16 GB/s earlier, and device-to-device was 4 GB/s, while I think it used to be much higher (65 GB/s).
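For reference, a minimal standalone check along these lines (not the poster's actual code; buffer size and variable names are illustrative) will surface the runtime's error string for a failing first call, which is often more useful than the bare error number:

```cuda
// Minimal sketch: exercise the first CUDA runtime call in isolation and
// print the human-readable error string if it fails.
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        // cudaGetErrorString() maps the error code to a readable message.
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(d_buf);
    printf("cudaMalloc succeeded\n");
    return 0;
}
```

Running this on its own (outside the SDK build environment) also helps separate "my project is misconfigured" from "the runtime/driver is broken".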

Any ideas about what’s going on, or suggestions about what to try? I’m using a Tesla C870 in a quad-core Pentium machine running Windows XP, with driver version 169.21. I power-cycled the PC, which didn’t fix the problem. It looks like a hardware issue to me. Is there anything I can do that would help confirm it’s a hardware problem?

What graphics device are you using for display? And can you post the output of running the deviceQuery SDK sample?

The display is a GeForce 8600 GTS. Here’s the output from deviceQuery:

There are 2 devices supporting CUDA

Device 0: “Tesla C870”
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 1610350592 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1350000 kilohertz

Device 1: “GeForce 8600 GTS”
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 268107776 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1458000 kilohertz

Test PASSED

Press ENTER to exit…

You are running on the 8600 GTS. Try calling cudaSetDevice(0) so that it runs on the C870.
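Rather than hard-coding an index, a sketch like the following (assuming the CUDA runtime API; the "Tesla" substring match is just an illustrative heuristic) selects the compute card by name, which avoids surprises if the device numbering differs between machines:

```cuda
// Sketch: enumerate CUDA devices and select the Tesla by name rather than
// by a hard-coded index, since numbering can vary between setups.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s\n", i, prop.name);
        if (strstr(prop.name, "Tesla") != NULL) {
            cudaSetDevice(i);  // subsequent runtime calls use this device
        }
    }
    return 0;
}
```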

Nope. It’s using device 0. I had replaced the CUT_DEVICE_INIT() macro with the code it expands to and stepped through it with the debugger: it sets the variable dev to 0 and calls cudaSetDevice(0). I then replaced that code with the single line cudaSetDevice(0); and the code still failed with the same error on the first cudaMalloc() call.
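For readers unfamiliar with the cutil headers, CUT_DEVICE_INIT() boils down to roughly the following (a simplified sketch; the real macro also honors a -device=N command-line argument and checks the compute capability):

```cuda
// Simplified sketch of what CUT_DEVICE_INIT() expands to, matching the
// behavior described above: dev ends up 0 and cudaSetDevice(0) is called.
int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);
if (deviceCount == 0) {
    fprintf(stderr, "There is no device supporting CUDA\n");
    exit(EXIT_FAILURE);
}
int dev = 0;  // default: first device (overridable on the command line)
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, dev);
printf("Using device %d: %s\n", dev, prop.name);
cudaSetDevice(dev);
```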

Also, this is the same PC I tried the examples on when I first got the Tesla card. The examples worked then, but they fail today.

Could you try cudaSetDevice(1)?

Tried it. Same error, 10999, at the first CUDA call (cudaMalloc). And the call to cudaSetDevice() succeeds. This seems to indicate that it’s not a hardware problem.

Seems like it might be worth taking a step back and running the unmodified SDK Release binaries. If you’ve changed code and rebuilt some of the Release binaries, maybe install a clean SDK and run some other samples.

When I say I’ve run the examples, I mean the unmodified executables in the Release directory. The only one I’ve rebuilt is the matrixMul example, and even there I copied all the files to another directory and rebuilt the copy.

I just tried two more examples, convolutionFFT2D and convolutionSeparable. Both failed. The one common component is the driver, so I’ll try reinstalling it tomorrow and see what happens.

I reinstalled the driver and the problem wasn’t fixed. Then I reinstalled the SDK, and the problem is fixed! All the examples work, and my code no longer returns errors on the CUDA calls.

I have no idea why this worked. I’m glad to be back where I want to be, debugging my code :)