Different results for RTX and non-RTX hardware?

I have a very simple case that shoots just one ray at a single triangle. The results vary ever so slightly depending on the compute capability of the hardware on which the application runs. Both machines have identical OptiX 7.0 libraries installed and run the same executable, compiled only for compute_60,sm_60. The output below lists the GPU, the three triangle vertices, the ray origin and direction, and the hit info. There is a slight difference in the hit distance, t:

GeForce RTX 2080 Ti, compute capability = 75
v0 1.99999988e-01 -1.99999988e-01 1.99999988e-01
v1 0.00000000e+00 0.00000000e+00 1.99999988e-01
v2 0.00000000e+00 0.00000000e+00 -1.99999988e-01
origin: 2.09999993e-01 -9.99999940e-02 9.99999940e-02
dir: -1.00000000e+00 0.00000000e+00 0.00000000e+00
t = 1.09999992e-01 <---- DIFFERENCE HERE
u = 2.49999985e-01
v = 2.49999985e-01

GeForce GTX 1060, compute capability = 61
v0 1.99999988e-01 -1.99999988e-01 1.99999988e-01
v1 0.00000000e+00 0.00000000e+00 1.99999988e-01
v2 0.00000000e+00 0.00000000e+00 -1.99999988e-01
origin: 2.09999993e-01 -9.99999940e-02 9.99999940e-02
dir: -1.00000000e+00 0.00000000e+00 0.00000000e+00
t = 1.09999999e-01 <---- DIFFERENCE HERE
u = 2.49999985e-01
v = 2.49999985e-01

When I shoot a denser set of rays, then there can be slight differences observed in the resulting t, u, and/or v values – or no differences at all. It seems as if there is a variation in the least significant bit of the result.

By any chance, is there a way to make these ray trace results identical? Thanks.

Hi there,

The main difference here is the use of RTX hardware, not really the compute capability. The triangle intersection on the 2080 Ti is done in hardware, and on the GTX 1060 it is done in software. This means the OptiX built-in triangle intersection can never execute identical instructions on these two GPUs, and there is no way to request identical execution unless you avoid the built-in OptiX primitives. Because the software and hardware implementations differ, the order of operations, intermediate results, and rounding are all expected to differ, so a 1 ULP discrepancy in the result is not surprising.
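If bit-identical output is not strictly required, one option is to compare results from the two GPUs with a small ULP tolerance rather than bitwise. Below is a minimal host-side C++ sketch, assuming IEEE 754 single-precision floats. The helper names `ordered_bits`, `ulp_distance`, and `nearly_equal` are made up for illustration and are not part of OptiX.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE 754 floats");

// Map a float's bit pattern onto an ordered integer line so that
// adjacent representable floats differ by exactly 1.
static std::int64_t ordered_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);  // well-defined way to read the bits
    // Floats are sign-magnitude: negate the magnitude for negative values
    // so the mapping is monotonic (+0 and -0 both map to 0).
    return (u & 0x80000000u) ? -static_cast<std::int64_t>(u & 0x7fffffffu)
                             : static_cast<std::int64_t>(u);
}

// Number of representable-float steps from a to b (0 means equal).
std::int64_t ulp_distance(float a, float b) {
    assert(!std::isnan(a) && !std::isnan(b));  // NaN has no meaningful distance
    const std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

// Tolerant comparison; max_ulps = 2 is an arbitrary illustrative default.
bool nearly_equal(float a, float b, std::int64_t max_ulps = 2) {
    return ulp_distance(a, b) <= max_ulps;
}
```

For the values posted above, `ulp_distance(1.09999992e-01f, 1.09999999e-01f)` is 1: the two t values are adjacent single-precision floats, which is consistent with a difference in the least significant bit.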

If you wish to prioritize matching results between RTX and non-RTX GPUs over performance, the way to guarantee matching results is to avoid the hardware triangle intersector by writing your own triangle intersection program. This way you will have complete control over the execution on any GPU. But be aware that you will sacrifice a lot of performance compared to the RTX hardware triangle intersection.
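As a rough illustration of the arithmetic such a program might contain, here is a plain C++ sketch of the Möller-Trumbore ray/triangle test. This is one common algorithm, chosen here as an assumption; it is not a description of what OptiX or the RTX hardware does internally. In an OptiX 7 pipeline the same math would go in an intersection program over custom primitives and report the hit with `optixReportIntersection`; that plumbing is omitted. The names `Vec3`, `Hit`, and `intersect_triangle` are hypothetical.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

struct Hit { bool valid; float t, u, v; };

// Moller-Trumbore test. (u, v) are the barycentric weights of v1 and v2,
// so the hit point is (1 - u - v) * v0 + u * v1 + v * v2.
Hit intersect_triangle(Vec3 origin, Vec3 dir,
                       Vec3 v0, Vec3 v1, Vec3 v2,
                       float tmin, float tmax) {
    const Hit miss{false, 0.0f, 0.0f, 0.0f};
    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (det == 0.0f) return miss;           // ray parallel to triangle plane
    const float inv_det = 1.0f / det;
    const Vec3 s = sub(origin, v0);
    const float u = dot(s, p) * inv_det;
    if (u < 0.0f || u > 1.0f) return miss;
    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * inv_det;
    if (v < 0.0f || u + v > 1.0f) return miss;
    const float t = dot(e2, q) * inv_det;
    if (t <= tmin || t >= tmax) return miss; // outside the ray interval
    return {true, t, u, v};
}
```

With the triangle and ray from the question, this gives t close to 0.11 and u = v close to 0.25. Note that matching results across GPUs would still depend on the compiler emitting the same floating-point operations for both targets (for example, the same fused multiply-add contraction), which is worth verifying rather than assuming.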


David.