Double-precision scientific computing code on GeForce GTX580

I have two GTX580 cards installed in a Linux box. I have a CUDA/C++ code involving double-precision math. On one card the entire code runs without a glitch. On the second card, when I get to launching a kernel of dimensions 578x434, roughly one time out of two the kernel fails to launch and I have to manually exit the program (^C). Note that on the card with no problems, the actual duration of the kernel is a small fraction of a second, so launch timeout is not an issue. To emphasize: this is the exact same binary that runs smoothly on one card but hangs on another card of identical specs, and the two cards are sitting in the same box.
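
For illustration, the launch looks roughly like the sketch below; myKernel, d_data, the 16x16 block, and the reading of 578x434 as the problem size are placeholders, not my actual code. The checks after the launch are just to show where the failure could be inspected:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void myKernel(double *data, int nx, int ny)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < nx && y < ny)
            data[y * nx + x] *= 2.0;   // stand-in for the real computation
    }

    int main()
    {
        const int nx = 578, ny = 434;
        double *d_data;
        cudaMalloc((void **)&d_data, nx * ny * sizeof(double));

        dim3 block(16, 16);
        dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
        myKernel<<<grid, block>>>(d_data, nx, ny);

        // Distinguish a failed launch from a failure during execution.
        cudaError_t launchErr = cudaGetLastError();
        cudaError_t syncErr   = cudaDeviceSynchronize();
        printf("launch: %s, sync: %s\n",
               cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));

        cudaFree(d_data);
        return 0;
    }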

My questions:

1- I understand that double-precision math on GeForce is more than twice as slow as single-precision math, which is why Tesla is recommended for fast double-precision work. However, have there been any well-documented cases of GeForce cards being inherently ‘unstable’ (e.g. leading to launch failures such as mine) when executing double-precision code? In other words, can I expect that migrating my code from double to single precision (see the sketch after my second question) will solve the stability problem? (The primary reason for choosing double precision is easier code migration between the host-only, host-device, and device-only versions. Double-precision math runs more than twice as fast on the host, at least on my machine, so my starting point logically has to be double precision.)

2- On a related note, what could cause one GTX580 card to behave normally and the other to act up, given that the driver and the whole environment are the same?
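
Regarding the double-to-single migration mentioned in question 1, one way to keep that switch cheap is to route all floating-point math through a single typedef, so trying single precision becomes a one-line change. The names below (real, scaleKernel, USE_SINGLE_PRECISION) are purely illustrative, not from my actual code:

    #ifdef USE_SINGLE_PRECISION
    typedef float  real;
    #else
    typedef double real;
    #endif

    // Every kernel and host routine uses "real" instead of a hard-coded type.
    __global__ void scaleKernel(real *data, real factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }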

Thank you!

Double precision is 8 times slower than single precision on a GTX card, and about 4 times slower than double precision on a Tesla. If your code does not get 4 times slower when you switch from Tesla to GTX, your code is not instruction bound. Tesla cards have more memory available, have ECC (memory error correction), offer better double-precision performance, and are built to run longer than GTX cards. That said, if GTX cards can survive heavy gaming they should be able to survive heavy computation as well.
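
If you want to measure that slowdown yourself, you can time the same kernel on both cards (or in both precisions) with CUDA events. A minimal, self-contained sketch; dummyKernel and the problem size are placeholders for your real workload:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Placeholder kernel; substitute your real one to measure the slowdown.
    __global__ void dummyKernel(double *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] = data[i] * 1.000001 + 0.5;
    }

    int main()
    {
        const int n = 1 << 20;
        double *d_data;
        cudaMalloc((void **)&d_data, n * sizeof(double));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("kernel time: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(d_data);
        return 0;
    }
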
If the code runs on one card without problems and fails on the other, the card on which you get problems is likely defective and you should do some hardware checks. The easiest one is a VRAM check.
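
A very crude VRAM check can be done from CUDA itself: fill a large device buffer with a known pattern, read it back, and count mismatches. A dedicated GPU memory tester is more thorough; this is only a quick sanity check, and the buffer size and pattern below are arbitrary:

    #include <cstdio>
    #include <cstdint>
    #include <vector>
    #include <cuda_runtime.h>

    int main()
    {
        const size_t bytes = 256ull << 20;            // test 256 MiB; size is arbitrary
        const size_t n = bytes / sizeof(uint32_t);

        uint32_t *d_buf;
        if (cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) {
            printf("cudaMalloc failed\n");
            return 1;
        }

        // Write a known pattern, read it back, and count mismatched words.
        std::vector<uint32_t> pattern(n, 0xA5A5A5A5u), readback(n);
        cudaMemcpy(d_buf, pattern.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(readback.data(), d_buf, bytes, cudaMemcpyDeviceToHost);

        size_t errors = 0;
        for (size_t i = 0; i < n; ++i)
            if (readback[i] != pattern[i])
                ++errors;

        printf("%zu mismatched words\n", errors);
        cudaFree(d_buf);
        return 0;
    }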