Is my Tesla card broken?

I have been using a Tesla C1060 card for a few months (it had been in use for 2 years before I got it). Recently I changed my system to Ubuntu 11.10, then installed the CUDA driver, the toolkit, and the GPU Computing SDK, following the instructions on the NVIDIA site.

I compiled my code without any problem and ran the binary. However, the computation stops when it tries to allocate the arrays for the kernels, with a message of either “failed to launch too many resources” or “out of memory”.

Then I went to C/bin/linux/release and ran “./deviceQuery”. The result shows only one device: device 0, Quadro 1700, which is probably the card that came with the original computer.

So, at this point, can I conclude that the Tesla card is broken? Or might there be some other problem with the settings, or an incompatibility with the OS?

Thanks a lot!

Highly unlikely that the Tesla is broken. More likely that it is not connected properly, powered properly, or properly configured in the software. We see all of those problems very frequently, but we don’t see GPUs just fail from overuse or from being old.
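One quick way to separate a connection/power problem from a software problem is to check whether the card shows up on the PCI bus at all, independent of CUDA. Here is a rough sketch in Python, assuming a Linux box with `lspci` (from pciutils) installed. The helper names `nvidia_pci_devices` and `read_lspci` are made up for illustration and are not part of any NVIDIA tool.

```python
import re
import subprocess


def nvidia_pci_devices(lspci_output):
    """Return the lines of lspci output that mention an NVIDIA device."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if re.search(r"nvidia", line, re.IGNORECASE)
    ]


def read_lspci():
    """Run lspci and return its text output (assumes pciutils is installed)."""
    result = subprocess.run(
        ["lspci"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Typical use on the affected machine:
#     for line in nvidia_pci_devices(read_lspci()):
#         print(line)
```

If the Tesla is listed here but deviceQuery still reports only one device, the problem is most likely on the driver/software side. If it is not listed at all, I would recheck the seating and the power connectors first.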

Thanks a lot!

Could you give me some hints on the possible issues with configuring the card properly in the software? I am pretty sure that the card is powered properly and that the slot for the card is working properly.

I always start with re-installing the driver (performing as clean an install as possible). You might also google around for others who have had issues getting devices detected in the same Ubuntu flavor. I’m not an Ubuntu guy myself, so I can’t really help much there.

You’re not alone in this though. Frequently people have to wiggle the software a bit to get all the devices detected properly, especially when multiple devices are installed.