Hardware problem with Tesla card?

I coded a small piece of the application I want to write using CUDA, basing it on the matrixMul example. When I ran it, it failed at the first CUDA call, cudaMalloc(), with error 10999, “Unspecified driver error”. I am calling CUT_DEVICE_INIT() prior to the cudaMalloc() call.

I couldn’t figure out what was causing this, so I went back to some of the SDK examples I had first tried after installing the Tesla card. They failed today, even though they had worked weeks ago. For example, the matrixMul program showed that the GPU results were all 0.0, and the MersenneTwister program failed, with a Samples/sec figure about a quarter of what I recall from the earlier run.

I then tried the bandwidthTest program. All transfer speeds are much slower than the first time I ran it: host-to-device was 2.0 GB/s, while my recollection is that it ran at 16 GB/s earlier, and device-to-device was 4 GB/s, while I think it used to be much higher (65 GB/s).
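For reference, a minimal standalone check along these lines (not the poster's actual code; buffer size and variable names are illustrative) will surface the runtime's error string for a failing first call, which is often more useful than the bare error number:

```cuda
// Minimal sketch: exercise the first CUDA runtime call in isolation and
// print the human-readable error string if it fails.
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    if (err != cudaSuccess) {
        // cudaGetErrorString() maps the error code to a readable message.
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(d_buf);
    printf("cudaMalloc succeeded\n");
    return 0;
}
```

Running this on its own (outside the SDK build environment) also helps separate "my project is misconfigured" from "the runtime/driver is broken".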

Any ideas about what’s going on, or suggestions about what to try? I’m using a Tesla C870 in a quad-core Pentium machine running Windows XP, with driver version 169.21. I power-cycled the PC, which didn’t fix the problem. It looks like a hardware issue to me. Is there anything I can do that would help confirm it’s a hardware problem?

What graphics device are you using for display? And can you post the output of running the deviceQuery SDK sample?

The display is a GeForce 8600 GTS. Here’s the output from deviceQuery:

There are 2 devices supporting CUDA

Device 0: “Tesla C870”
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 1610350592 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1350000 kilohertz

Device 1: “GeForce 8600 GTS”
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 268107776 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1458000 kilohertz

Test PASSED

Press ENTER to exit…

You are running on the 8600 GTS. Try calling cudaSetDevice(0) so that it runs on the C870.
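Rather than hard-coding an index, a sketch like the following (assuming the CUDA runtime API; the "Tesla" substring match is just an illustrative heuristic) selects the compute card by name, which avoids surprises if the device numbering differs between machines:

```cuda
// Sketch: enumerate CUDA devices and select the Tesla by name rather than
// by a hard-coded index, since numbering can vary between setups.
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s\n", i, prop.name);
        if (strstr(prop.name, "Tesla") != NULL) {
            cudaSetDevice(i);  // subsequent runtime calls use this device
        }
    }
    return 0;
}
```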

Nope. It’s using device 0. I had replaced the CUT_DEVICE_INIT() macro with the code it expands to and stepped through it with the debugger: it sets the variable dev to 0 and calls cudaSetDevice(0). I then replaced that code with the single line cudaSetDevice(0); and the code still failed with the same error on the first cudaMalloc() call.
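For readers unfamiliar with the cutil headers, CUT_DEVICE_INIT() boils down to roughly the following (a simplified sketch; the real macro also honors a -device=N command-line argument and checks the compute capability):

```cuda
// Simplified sketch of what CUT_DEVICE_INIT() expands to, matching the
// behavior described above: dev ends up 0 and cudaSetDevice(0) is called.
int deviceCount = 0;
cudaGetDeviceCount(&deviceCount);
if (deviceCount == 0) {
    fprintf(stderr, "There is no device supporting CUDA\n");
    exit(EXIT_FAILURE);
}
int dev = 0;  // default: first device (overridable on the command line)
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, dev);
printf("Using device %d: %s\n", dev, prop.name);
cudaSetDevice(dev);
```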

Also, this is the same PC I tried the examples on when I first got the Tesla card. The examples worked then, but they fail today.

Could you try cudaSetDevice(1)?

Tried it. Same error, 10999, at the first CUDA call (cudaMalloc). And the call to cudaSetDevice() succeeds. This seems to indicate that it’s not a hardware problem.

Seems like it might be worth taking a step back and running the unmodified SDK Release binaries. If you’ve changed code and rebuilt some of the Release binaries, maybe install a clean SDK and run some other samples.

When I say I’ve run the examples, I mean the unmodified executables in the Release directory. The only one I’ve rebuilt is the matrixMul example, and even there I copied all the files to another directory and rebuilt the copy.

I just tried two more examples, convolutionFFT2D and convolutionSeparable. Both failed. The one common component is the driver, so I’ll try reinstalling it tomorrow and see what happens.

I reinstalled the driver and the problem wasn’t fixed. Then I reinstalled the SDK, and the problem is fixed! All the examples work, and my code no longer returns errors on the CUDA calls.

I have no idea why this worked. I’m glad to be back where I want to be, debugging my code :)