I recently switched from a Quadro FX 4600 to a Quadro 4000. My application is an iterative tomographic reconstruction algorithm that makes heavy use of texture lookups. After the switch, I get slightly different results on the new GPU. The differences are not confined to the LSB; they reach up to 1.5% of the final value. I would have expected small differences, but not such large ones. All my computations use single-precision floating-point arithmetic.
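To make a figure like "1.5% of the final value" reproducible, the two result buffers can be compared with a relative-difference metric. The following is a minimal sketch in host-side C++; the function name `max_relative_diff` and the `floor` guard against division by near-zero values are hypothetical and not taken from the actual application.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest relative difference between two result buffers.
// Each element's difference is scaled by the larger magnitude of the pair;
// `floor` (an assumed guard value) avoids dividing by values close to zero.
float max_relative_diff(const std::vector<float>& a,
                        const std::vector<float>& b,
                        float floor = 1e-6f) {
    float worst = 0.0f;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const float scale =
            std::max(std::max(std::fabs(a[i]), std::fabs(b[i])), floor);
        worst = std::max(worst, std::fabs(a[i] - b[i]) / scale);
    }
    return worst;
}
```

Calling this on the float results of both cards after each iteration, before the conversion to 16-bit integers, would show whether the gap appears in a single step or builds up gradually over the iterations.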
I’m using CUDA v3.1. I have tried several compiler switches (different -arch and -code options, the "-ftz=true -prec-div=false -prec-sqrt=false" options, and different optimization settings). Most leave the results on the Quadro 4000 unchanged or change, at most, the LSB of the result (after saving it as a 16-bit unsigned integer).
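One architectural difference that might be worth ruling out: according to the CUDA Programming Guide, devices of compute capability 1.x (such as the Quadro FX 4600) merge a single-precision multiply and add into a MAD instruction that truncates the intermediate product, while 2.x devices (such as the Quadro 4000) use a fused multiply-add with a single rounding. Whether this accounts for the differences here is only an assumption. The host-side C++ sketch below merely illustrates why rounding the product separately can change a result; it does not emulate either GPU exactly, and the function names are hypothetical.

```cpp
#include <cmath>

// Multiply, round the product to float, then add: two roundings in total.
float mul_then_add(float a, float b, float c) {
    // volatile forces the rounded product to be stored, so the compiler
    // cannot contract the expression into a fused multiply-add.
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, the first function returns 0 while the second returns 2^-24, because the low bit of the product is lost when it is rounded on its own. In device code, the `__fmul_rn()` and `__fadd_rn()` intrinsics are documented as never being merged into a single multiply-add instruction, so rewriting a few critical expressions with them may be one way to test this hypothesis on both cards.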
Therefore my question: Does anyone know of any changes in the way texture lookups are implemented between the two cards, such as changes in coordinate computation, round-off, clamping, …? Any changes to math functions? Any other known changes that could cause the differences in results? Any compiler switches or code changes I could try to make the results of the two cards more similar? Unfortunately, it would be very difficult to provide a compilable example that reproduces the problem.
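One way to isolate the texture unit from the rest of the arithmetic would be to compare hardware-filtered lookups against a software interpolation. The CUDA Programming Guide states that linear filtering stores the interpolation weight in a 9-bit fixed-point format with 8 fractional bits. The sketch below, in host-side C++, mimics that for a 1D lookup with unnormalized coordinates. The function name `lerp1d`, the clamp addressing, and the round-to-nearest quantization of the weight are assumptions for illustration; the guide does not say how the weight is rounded, and nothing here is known to differ between the two cards.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical reference for a linearly filtered 1D lookup with unnormalized
// coordinates: tex(x) = (1 - alpha) * T[i] + alpha * T[i + 1], where
// xB = x - 0.5, i = floor(xB) and alpha = frac(xB).
// Indices outside the table are clamped to the edge (assumed addressing mode).
float lerp1d(const std::vector<float>& T, float x, bool quantize_weight) {
    const float xb = x - 0.5f;
    const float fi = std::floor(xb);
    float alpha = xb - fi;
    if (quantize_weight) {
        // Assumption: round the weight to the nearest multiple of 1/256
        // (8 fractional bits); the hardware rounding mode is not documented.
        alpha = std::floor(alpha * 256.0f + 0.5f) / 256.0f;
    }
    const int last = static_cast<int>(T.size()) - 1;
    const int i0 = std::min(std::max(static_cast<int>(fi), 0), last);
    const int i1 = std::min(std::max(static_cast<int>(fi) + 1, 0), last);
    return (1.0f - alpha) * T[i0] + alpha * T[i1];
}
```

For a table `{0, 1}` and `x = 0.8`, the full-precision weight gives about 0.3 while the quantized weight gives 77/256 = 0.30078125. Porting the full-precision variant to a kernel that reads the data with unfiltered (point-sampled) fetches and running it on both GPUs could show whether the results still diverge once hardware filtering is out of the picture.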
Apologies for the vague problem description, and thanks for any help.