I have a question concerning the precision/stability of desktop GPGPU.
I have a GTX 560 Ti and use CUDA to train multi-layer perceptrons.
Typically single precision is fine for me. I know that GPU float computation
can be negligibly less precise than the CPU one, but this has never been a
problem, as long as the precision stays the same over time.
To test the stability of the computation, I ran the matrix multiplication
example "matrixMul" from the 4.1.28 SDK in a loop. This sample cross-checks
the GPU result against a CPU implementation, so I can detect if the GPU
starts to compute something strange.
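Roughly, my test loop is structured like the sketch below (the names, matrix
size, and naive kernel are simplifications of mine; the actual matrixMul
sample uses a tiled shared-memory kernel, but the CPU cross-check is the
same idea):

// Simplified sketch of the stress test: fixed inputs, one CPU reference
// result, then the same GPU multiply repeated forever and compared.
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

#define N   256       // square matrix dimension (illustrative)
#define TOL 1e-5f     // same order as the sample's difference threshold

// naive GPU matrix multiply: C = A * B
__global__ void matMul(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// CPU reference for the cross-check
static void matMulCPU(const float* A, const float* B, float* C, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

int main() {
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes);
    float *hC = (float*)malloc(bytes), *ref = (float*)malloc(bytes);
    for (int i = 0; i < N * N; ++i) {
        hA[i] = rand() / (float)RAND_MAX;
        hB[i] = rand() / (float)RAND_MAX;
    }
    matMulCPU(hA, hB, ref, N);  // reference computed once; inputs never change

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);

    for (unsigned long iter = 0;; ++iter) {  // run until something drifts
        matMul<<<grid, block>>>(dA, dB, dC, N);
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
        // since inputs are constant, the GPU result should be bit-identical
        // every iteration; report any element that drifts past the tolerance
        for (int i = 0; i < N * N; ++i) {
            float diff = fabsf(hC[i] - ref[i]);
            if (diff > TOL)
                printf("iter %lu Loc(%d,%d) CPU=%f GPU=%f Diff=%f\n",
                       iter, i / N, i % N, ref[i], hC[i], diff);
        }
    }
}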
So I ran the test, and after 8 days of full load, the precision started to decrease:
Listing first 100 Differences > 0.000010…
Loc(0,0) CPU=163.09207 GPU=163.09212 Diff=0.000046
Loc(2,0) CPU=168.34337 GPU=168.34328 Diff=0.000092
Loc(3,0) CPU=156.45810 GPU=156.45802 Diff=0.000076
Loc(4,0) CPU=162.84628 GPU=162.84631 Diff=0.000031
Loc(5,0) CPU=161.11246 GPU=161.11253 Diff=0.000076
Loc(6,0) CPU=164.38638 GPU=164.38628 Diff=0.000107
My question is: is this normal? Or is it a flaw that can be used as a reason
to replace the card with another one? Also, can I be sure that this kind of
problem does not occur if I use Tesla GPUs? Is there some optional automatic
mechanism that will ensure that the output of a GPGPU computation is
accurate? (It is a bit scary not to be able to trust and reproduce the
results due to the time-varying precision of the GPU.)
Thank you in advance for any comments/suggestions,