I have two GTX 580 cards installed in a Linux box, and a CUDA/C++ code involving double-precision math. On one card the entire code runs without a glitch. On the second card, when I launch a kernel of dimensions 578x434, roughly one out of two times the kernel fails to launch and I have to exit the program manually (^C). Note that on the card with no problem, the actual duration of the kernel is a small fraction of a second, so a launch timeout is not the issue. To emphasize: this is the exact same binary that runs smoothly on one card but hangs on the other card of identical specs. The two cards are sitting in the same box.
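For concreteness, the launch pattern looks roughly like the sketch below, with explicit error checking added. This is an illustration only: `myKernel`, its argument, and the assumption that 578x434 is the overall 2-D problem size (not the grid size) are placeholders, not my actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; the real one does double-precision math.
__global__ void myKernel(double *data)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 578 && y < 434)
        data[y * 578 + x] *= 2.0;  // stand-in for the real work
}

int main()
{
    double *d_data = NULL;
    cudaMalloc((void **)&d_data, 578 * 434 * sizeof(double));

    dim3 block(16, 16);
    dim3 grid((578 + block.x - 1) / block.x,
              (434 + block.y - 1) / block.y);
    myKernel<<<grid, block>>>(d_data);

    // Launch-configuration errors are reported immediately here...
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch error: %s\n", cudaGetErrorString(err));

    // ...while execution errors (and the hang I see) surface here.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("execution error: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return 0;
}
```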

My questions:

1- I understand that double-precision math on GeForce is several times slower than single-precision math, which is why Tesla is recommended for fast double-precision work. However, have there been any well-documented cases of GeForce being inherently 'unstable' (e.g. leading to launch failures such as mine) when executing double-precision code? In other words, can I expect that migrating my code from double to single precision will solve the stability problem? (My primary reason for choosing double precision is easier code migration between host-only, host-device, and device-only versions. Double-precision math runs more than twice as fast on the host, at least on my machine, so my starting point logically has to be double precision.)

2- On a related note, what could lead to one GTX 580 card behaving normally and another one acting up, given that the driver and the whole environment are the same?
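To be sure which physical card the binary runs on, I pin the device explicitly before testing. A minimal sketch (the device indices are whatever the runtime enumerates on my box; they are not meaningful beyond that):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    // Pin to one card; rerun with the other index to compare behavior.
    cudaSetDevice(0);
    return 0;
}
```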

Thank you!