I am trying to compare the performance of CPU vs. GPU for a sparse matrix-vector multiplication code, but the results of the two computations differ. When I run the GPU code in device emulation mode, the results match the CPU's.
If anyone knows how to overcome this problem, please let me know.
(P.S. I have heard that by default the FPU in the CPU is set to 80 bits, so all operations are performed internally at 80 bits and the result is then converted back to 64 bits, whereas GPU operations are performed at 64 bits. Is this right? If so, how can I overcome this problem? Is it possible to perform operations on the CPU at 64 bits instead of 80 bits, assuming what I mentioned above is true?)
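To check the 80-bit theory on the CPU side, here is a minimal sketch in C of what I have in mind. The helper names (`eval_method`, `row_dot_rounded`, `nearly_equal`) are my own and hypothetical, not from any library, and the CSR-style row layout (`vals`, `cols`, `x`) is only an assumption about how the matrix is stored. `FLT_EVAL_METHOD` from `<float.h>` (C99) reports whether the compiler evaluates `double` expressions in a wider format: 2 means `long double` (the 80-bit x87 case), 0 means plain `double`. The `volatile` stores force each intermediate to be rounded to a 64-bit double in memory, which should approximate, though not exactly reproduce, pure 64-bit arithmetic.

```c
#include <assert.h>
#include <float.h>   /* FLT_EVAL_METHOD (C99) */
#include <stddef.h>  /* size_t */

/* Report how the compiler evaluates floating-point expressions:
 *   0  -> in the operand's own type (double stays 64-bit)
 *   2  -> in long double (80-bit extended on x87)
 *  -1  -> indeterminable */
int eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* Dot product of one sparse row with the dense vector x.
 * Assumed layout (hypothetical): vals[k] is the k-th nonzero of the row,
 * cols[k] is its column index. Each product and partial sum is stored
 * through a volatile double so it is rounded to 64 bits before reuse. */
double row_dot_rounded(const double *vals, const int *cols,
                       const double *x, size_t nnz)
{
    assert(vals != NULL && cols != NULL && x != NULL);
    volatile double sum = 0.0;
    for (size_t k = 0; k < nnz; ++k) {
        volatile double prod = vals[k] * x[cols[k]];
        sum = sum + prod;
    }
    return sum;
}

/* Compare two results with a relative tolerance instead of ==.
 * Returns 1 if |a - b| <= rel_tol * max(|a|, |b|), else 0. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff  = (a > b) ? (a - b) : (b - a);
    double abs_a = (a < 0.0) ? -a : a;
    double abs_b = (b < 0.0) ? -b : b;
    double scale = (abs_a > abs_b) ? abs_a : abs_b;
    return diff <= rel_tol * scale;
}
```

If `eval_method()` returns 0, the CPU code is already computing in 64 bits and the 80-bit explanation would not apply; with GCC, `-mfpmath=sse -msse2` is the usual way to request that on x86. Either way, comparing the CPU and GPU results with `nearly_equal` and a small relative tolerance seems safer than expecting bit-identical output.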
By the way, I am using a 64-bit AMD machine with Fedora 10 and a Tesla C1060 GPU.