I’m trying to solve a large non-linear system on the NVIDIA Jetson TX2, and for this I need to reduce (sum) 6x6 matrices of doubles.
Using thrust::reduce_by_key on a PC produces correct results. However, when I run the same code on the TX2, the results differ significantly.
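For anyone who wants to check which of the two outputs is off, a plain CPU reference for the same keyed sum could serve as a baseline. The following is only a minimal sketch, not the code from my repo: the names Mat6, add, reduce_by_key_cpu and max_abs_diff are hypothetical, the row-major layout is an assumption, and it only mirrors the documented behaviour of thrust::reduce_by_key, which sums runs of consecutive equal keys.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type: a 6x6 matrix of doubles, stored row-major (assumed layout).
using Mat6 = std::array<double, 36>;

// Element-wise sum of two 6x6 matrices.
Mat6 add(const Mat6& a, const Mat6& b) {
    Mat6 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// Sequential reference for a keyed reduction: each run of consecutive
// equal keys is summed into a single (key, matrix) pair.
std::vector<std::pair<int, Mat6>> reduce_by_key_cpu(
    const std::vector<int>& keys, const std::vector<Mat6>& values) {
    assert(keys.size() == values.size());
    std::vector<std::pair<int, Mat6>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!out.empty() && out.back().first == keys[i]) {
            // Same key as the previous entry: accumulate into the open run.
            out.back().second = add(out.back().second, values[i]);
        } else {
            // New key: start a new run.
            out.emplace_back(keys[i], values[i]);
        }
    }
    return out;
}

// Largest absolute element-wise difference between two matrices.
double max_abs_diff(const Mat6& a, const Mat6& b) {
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

The idea would be to copy the GPU output back to the host on both machines and call max_abs_diff against this reference for each reduced matrix. A sequential sum adds the terms in a different order than a parallel reduction, so tiny differences are expected on both platforms. Only a large difference on one of them would point to a real problem.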
PC specs: Ubuntu 16.04, GeForce GTX 1060, CUDA 8.0.61
TX2 specs: Ubuntu 16.04, CUDA V9.0.252, L4T 28.2.1
I have put together a simple repo, including data, that reproduces the problem: https://github.com/tatito0/app_thrust_reduce_by_key_debug
Are there any compile flags that should be added on the TX2, or does the TX2 use less precise double operations?