Tesla M2070 + CentOS 5.6: A strange issue regarding using 2 Teslas

I’ve tried searching these forums and haven’t found an answer, so here’s the problem:
CUDA code doesn’t seem to work on the 2nd GPU. The deviceQuery program in the SDK samples gives information which conflicts with nvidia-smi. Is this a sign of a broken GPU?

I’ve attached the outputs for both queries.
The deviceQuery output seems to indicate that everything is fine. However, the nvidia-smi output gives some disturbing information.
On the 2nd GPU, both the GPU and Memory Utilization fields give Unknown Errors. There are also some ECC memory errors. The clock speeds on the 2nd GPU also don’t seem to match factory specs.

Is this due to a software misconfiguration or is there something more going on here along the lines of a hardware failure?

Thank you!
nvidia-smi-q.txt (4.95 KB)
deviceQuery.txt (4.09 KB)

I also have two M2070s, and nvidia-smi reports exactly the same for me on RHEL 6.1.

When you say device code does not work on the second GPU, what are you seeing exactly?
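
If it helps narrow things down, here is a minimal sketch of a per-device smoke test. It selects each GPU in turn, allocates a small buffer, runs a trivial kernel, copies the result back, and prints which call (if any) failed on which device. The kernel name `fill`, the helper `ok`, and the buffer size are made up for illustration, and I'm assuming CUDA 4.0 or later so that one host thread can switch devices with cudaSetDevice. Treat it as a starting point rather than a definitive diagnostic.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel (hypothetical name): each thread writes its global index.
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical helper: report which call failed on which device.
static bool ok(cudaError_t err, const char *what, int dev) {
    if (err == cudaSuccess) return true;
    printf("device %d: %s failed: %s\n", dev, what, cudaGetErrorString(err));
    return false;
}

int main() {
    int count = 0;
    if (!ok(cudaGetDeviceCount(&count), "cudaGetDeviceCount", -1)) return 1;

    const int n = 256;  // arbitrary small test size
    int failures = 0;

    for (int dev = 0; dev < count; ++dev) {
        int host[n];
        int *d = NULL;

        // Select the device, then try a small allocation on it.
        bool pass = ok(cudaSetDevice(dev), "cudaSetDevice", dev) &&
                    ok(cudaMalloc((void **)&d, n * sizeof(int)), "cudaMalloc", dev);

        if (pass) {
            fill<<<(n + 63) / 64, 64>>>(d, n);
            // cudaGetLastError catches launch errors; the blocking cudaMemcpy
            // surfaces errors that occur while the kernel is executing.
            pass = ok(cudaGetLastError(), "kernel launch", dev) &&
                   ok(cudaMemcpy(host, d, n * sizeof(int),
                                 cudaMemcpyDeviceToHost), "cudaMemcpy", dev);
        }

        // Verify the data that came back from the device.
        for (int i = 0; pass && i < n; ++i) {
            if (host[i] != i) {
                printf("device %d: wrong value at index %d\n", dev, i);
                pass = false;
            }
        }

        if (d) cudaFree(d);
        printf("device %d: %s\n", dev, pass ? "OK" : "FAILED");
        if (!pass) ++failures;
    }
    return failures ? 1 : 0;
}
```

Something like `nvcc -o gputest gputest.cu` should build it (the file name is arbitrary). If device 0 passes and device 1 fails at a particular call, posting that error string would tell us a lot more than "doesn't work".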

Are you seeing any weird issues with which devices you can cudaMalloc on? I’m having some oddness there which is described at:

Pete