low cublas single precision accuracy cublasDgemv <-> cublasSgemv

Hello everybody!
First post, first problem:

To get started with CUDA I ran some performance tests of functions I need frequently, such as sgemv.

I got perfectly accurate results using cublasDgemv (compared to a simple CPU implementation), but cublasSgemv creates huge errors:

This simple 3x3 example yields 435462304956416 for the second component on the GPU and 435462338510848 on the CPU. --> abs Error = 33554432(!)

(17517372  8222629 16327114)     (11549916)
(16646960 19007260  4118818)  X  ( 9953420)
( 6989178 16017092  5791423)     (13111553)

This cannot possibly have been produced by the FMAD’s truncation?! (programming guide, p. 81)

Has anybody else observed this behaviour? (I used the forum search, didn’t find comparable results.)
If it’s because of the FMADs, is there a way to make cublas use __fadd_rn()/__fmul_rn()?

Any input will be appreciated.
Thanks in advance!

Isn’t single precision only accurate to a relative error of about 10^-7, i.e. roughly 7 significant figures? I could be wrong, but I thought this was the case. Your two results agree in their first 7 digits (4354623...), so the figures are fine for this precision. It may seem like a lot for an absolute error, but relative to the magnitude of the result it seems to be about right.

Oops, you’re absolutely right. I must have gotten distracted by the large number…

Thanks!

No problem. This is something that would definitely confuse me too if I were absorbed in the problem.