CUDA Thrust reduce_by_key returns different values on Nvidia Jetson TX2 vs PC

I’m trying to solve a large non-linear system using the Nvidia TX2 and for this I need to reduce (sum) 6x6 matrices of doubles.

Using thrust::reduce_by_key on a PC produces correct results; however, when I run the same code on the TX2, the results differ significantly.

PC specs: Ubuntu 16.04, GeForce GTX 1060, CUDA 8.0.61

TX2 specs: Ubuntu 16.04, CUDA V9.0.252, L4T 28.2.1

I have put together a simple repo, including data, to reproduce the issue:

Are there any compile flags that should be added on the TX2, or is the TX2 using less precise double operations?
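
To tell rounding-level differences apart from genuinely wrong sums, one option is to compare the GPU output against a plain sequential reduction on the host. The following is only a minimal sketch under assumptions: Mat6, reduceByKeyHost and maxRelDiff are hypothetical names, the keys are simplified to int rather than IdxPair, and the matrix is assumed to be 36 contiguous doubles, since the real PBA_Mat6 layout is not shown here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the 6x6 matrix of doubles (assumed layout: 36 contiguous values).
using Mat6 = std::array<double, 36>;

// Sequential reduce-by-key on the host. Like thrust::reduce_by_key, it sums
// runs of consecutive elements that share the same key; keys are int here for brevity.
std::vector<std::pair<int, Mat6>> reduceByKeyHost(const std::vector<int>& keys,
                                                  const std::vector<Mat6>& mats) {
    assert(keys.size() == mats.size());
    std::vector<std::pair<int, Mat6>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (out.empty() || out.back().first != keys[i]) {
            // A new run of keys starts: open a new accumulator.
            out.emplace_back(keys[i], mats[i]);
        } else {
            // Same key as the previous element: add element-wise.
            Mat6& acc = out.back().second;
            for (std::size_t k = 0; k < acc.size(); ++k) {
                acc[k] += mats[i][k];
            }
        }
    }
    return out;
}

// Largest element-wise difference between two matrices, relative to the
// larger magnitude (with a floor of 1.0 to avoid dividing by tiny values).
double maxRelDiff(const Mat6& a, const Mat6& b) {
    double worst = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const double scale = std::max({std::fabs(a[k]), std::fabs(b[k]), 1.0});
        worst = std::max(worst, std::fabs(a[k] - b[k]) / scale);
    }
    return worst;
}
```

The idea would be to copy the reduce_by_key output back to the host on both machines and check maxRelDiff against this reference. Differences close to double rounding error would suggest a different summation order, while much larger ones would point to an actual bug rather than reduced precision.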


I suppose you need to copy the input array values from the CPU to the GPU.
Your code appears to contain only an array pointer copy:

thrust::device_vector<IdxPair> d_pairArr = h_pairArr;
thrust::device_vector<PBA_Mat6> d_matArr = h_matArr;

Please also note that CPU and GPU memory addresses are different.
You cannot assign one to the other directly through a pointer. A copy of each value is required.


I believe the above statements are valid copies thanks to Thrust; they are not pointer copies. Can you expand on your description, and rewrite it correctly if possible (perhaps I’m missing something)?

Also note this works well on a PC.

It appears to be a bug in CUDA 9. I downgraded to CUDA 8.0 via JetPack 3.1 and it works fine there. Is there a way to report a bug?


Sorry for the late reply.

We can reproduce this issue internally and have already passed it on to our internal team.
We will update here once we have further information.