I have two GTX 580 cards installed in a Linux box, and a CUDA/C++ code involving double-precision math. On one card the entire code runs without a glitch. On the second card, when I launch a kernel of dimensions 578x434, roughly one out of two times the kernel fails to launch and I have to exit the program manually (^C). Note that on the card with no problem, the actual duration of the kernel is a small fraction of a second, so a launch timeout is not the issue. To emphasize: this is the exact same binary that runs smoothly on one card but hangs on the other card of identical specs. The two cards are sitting in the same box.
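For concreteness, the launch pattern looks roughly like the sketch below, with explicit error checking added. This is an illustration only: `myKernel`, its argument, and the assumption that 578x434 is the overall 2-D problem size (not the grid size) are placeholders, not my actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; the real one does double-precision math.
__global__ void myKernel(double *data)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 578 && y < 434)
        data[y * 578 + x] *= 2.0;  // stand-in for the real work
}

int main()
{
    double *d_data = NULL;
    cudaMalloc((void **)&d_data, 578 * 434 * sizeof(double));

    dim3 block(16, 16);
    dim3 grid((578 + block.x - 1) / block.x,
              (434 + block.y - 1) / block.y);
    myKernel<<<grid, block>>>(d_data);

    // Launch-configuration errors are reported immediately here...
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch error: %s\n", cudaGetErrorString(err));

    // ...while execution errors (and the hang I see) surface here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("execution error: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}
```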

My questions:

1- I understand that double-precision math on GeForce is several times slower than single-precision math, which is why Tesla is recommended for fast double-precision work. However, have there been any well-documented cases of GeForce being inherently 'unstable' (e.g. leading to launch failures such as mine) when executing double-precision code? In other words, can I expect that migrating my code from double to single precision will solve the stability problem? (My primary reason for choosing double precision is easier code migration between host-only, host-device, and device-only versions. Double-precision math runs more than twice as fast on the host, at least on my machine, so my starting point logically has to be double precision.)

2- On a related note, what could lead to one GTX 580 card behaving normally and another one acting up, given that the driver and the whole environment are the same?
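To be sure which physical card the binary runs on, I pin the device explicitly before testing. A minimal sketch (the device indices are whatever the runtime enumerates on my box; they are not meaningful beyond that):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    // Pin to one card; rerun with the other index to compare behavior.
    cudaSetDevice(0);
    return 0;
}
```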

Thank you!