First post, first problem:
To get started with CUDA, I ran some performance tests of functions I need frequently, such as matrix-vector multiplication.
I got perfectly accurate results using cublasDgemv (compared to a simple CPU implementation), but cublasSgemv produces huge errors:
This simple 3x3 example yields 435462304956416 for the second component on the GPU and 435462338510848 on the CPU, an absolute error of 33554432 (!):
(17517372  8222629 16327114)   (11549916)
(16646960 19007260  4118818) X ( 9953420)
( 6989178 16017092  5791423)   (13111553)
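For scale, here is a minimal CPU-side sketch (plain C, not the cuBLAS code path) that computes the product above in single and in double precision and measures the gap between adjacent floats near the result. The names gemv3_float, gemv3_double and float_spacing are made up for this illustration, and the spacing helper assumes IEEE 754 single-precision floats.

```c
#include <stdint.h>
#include <string.h>

/* Matrix and vector from the post (row-major). */
const double A[3][3] = {
    {17517372.0,  8222629.0, 16327114.0},
    {16646960.0, 19007260.0,  4118818.0},
    { 6989178.0, 16017092.0,  5791423.0}
};
const double X[3] = {11549916.0, 9953420.0, 13111553.0};

/* y = A * x with every multiply and add written on floats.
   The compiler may fuse or widen these operations, so the last
   bit of the result can differ between machines and flags. */
void gemv3_float(float y[3]) {
    for (int i = 0; i < 3; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < 3; ++j)
            acc += (float)A[i][j] * (float)X[j];
        y[i] = acc;
    }
}

/* Same product in double precision, used as the reference. */
void gemv3_double(double y[3]) {
    for (int i = 0; i < 3; ++i) {
        double acc = 0.0;
        for (int j = 0; j < 3; ++j)
            acc += A[i][j] * X[j];
        y[i] = acc;
    }
}

/* Distance from a positive, finite float (below FLT_MAX) to the
   next larger float. Assumes the IEEE 754 binary32 layout. */
float float_spacing(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    bits += 1;                      /* bit pattern of the next float */
    float next;
    memcpy(&next, &bits, sizeof next);
    return next - v;
}
```

gemv3_double gives 435462331988914 for the second component (every product and sum stays below 2^53, so double precision is exact here), and float_spacing(435462338510848.0f) returns 33554432, i.e. 2^25. The two values reported above are therefore adjacent floats, with the exact result lying between them.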
Surely an error of this size cannot have been produced by the FMAD’s truncation (programming guide, p. 81)?!
Has anybody else observed this behaviour? (I used the forum search, didn’t find comparable results.)
If it’s because of the FMADs, is there a way to make cuBLAS use __fadd_rn()/__fmul_rn()?
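One way to probe the FMAD question without a GPU is to emulate both behaviours on the CPU. The sketch below is plain C, not device code, and rests on two assumptions: that the FMAD chops the product to 24 significant bits and then rounds the sum to nearest, as the programming guide describes it, and that the row is accumulated left to right, which cuBLAS may well not do. All function names are invented for illustration; mul_rn and add_rn only mimic what __fmul_rn()/__fadd_rn() are documented to do.

```c
#include <stdint.h>
#include <string.h>

/* Product of two floats, rounded to nearest even.
   The double product is exact (24 + 24 bits fit in 53). */
float mul_rn(float a, float b) {
    return (float)((double)a * (double)b);
}

/* Product of two floats with the significand chopped to 24 bits
   instead of rounded (assumed FMAD behaviour, normal range only). */
float mul_trunc(float a, float b) {
    double p = (double)a * (double)b;
    uint64_t bits;
    memcpy(&bits, &p, sizeof bits);
    bits &= ~((UINT64_C(1) << 29) - 1);   /* drop the low 53 - 24 = 29 bits */
    memcpy(&p, &bits, sizeof p);
    return (float)p;                      /* now exactly representable */
}

/* Sum of two floats, rounded to nearest even. */
float add_rn(float a, float b) {
    return (float)((double)a + (double)b);
}

/* Dot product with separately rounded multiplies and adds. */
float dot3_rounded(const float a[3], const float x[3]) {
    float acc = 0.0f;
    for (int j = 0; j < 3; ++j)
        acc = add_rn(acc, mul_rn(a[j], x[j]));
    return acc;
}

/* Dot product with truncated products (emulated FMAD). */
float dot3_truncated(const float a[3], const float x[3]) {
    float acc = 0.0f;
    for (int j = 0; j < 3; ++j)
        acc = add_rn(acc, mul_trunc(a[j], x[j]));
    return acc;
}
```

For the second row of the example, dot3_rounded returns 435462338510848 and dot3_truncated returns 435462304956416, which are the CPU and GPU values quoted above. That is suggestive rather than conclusive, since the operation order inside the real kernel is unknown.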
Any input will be appreciated.
Thanks in advance!