We are experiencing some odd issues with our TX2s. At times, the output of two of them is not deterministic. We run the DGEMM code (the example in the CUDA suite), and while most executions produce the same output, at times 2 out of 4 TX2s produce unexpected output. Looking online, we found that this could be caused by memory problems. We found this code for testing GPU memory: https://github.com/ihaque/memtestG80. We ran it on our TX2s and found that only the 2 that were giving non-deterministic results show errors in the DDR.
Any advice on how to solve this? Is RMA the only solution?
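For anyone who wants to reproduce the determinism check, here is a minimal sketch of one way to do it. It is not the exact procedure we used. It assumes the test program writes only its results (for example the output matrix or a checksum) to stdout, with no timings or other values that vary between runs. The function names are made up for illustration.

```python
import hashlib
import subprocess
from collections import Counter


def output_digests(cmd, runs=20):
    """Run `cmd` `runs` times and count the distinct SHA-256 digests of its stdout."""
    digests = Counter()
    for _ in range(runs):
        # check=False: a crashing run is still recorded through whatever it printed
        result = subprocess.run(cmd, capture_output=True, check=False)
        digests[hashlib.sha256(result.stdout).hexdigest()] += 1
    return digests


def is_deterministic(cmd, runs=20):
    """Return True if every run of `cmd` produced byte-identical stdout."""
    return len(output_digests(cmd, runs)) == 1
```

Calling is_deterministic(["./dgemm_example"], runs=50) on each board (the binary name here is a placeholder) should return True on a healthy unit. More than one digest in output_digests means at least one run produced different output.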
From your description, only those 2 units give different results when running the same application, and it is always the same 2.
That points to a hardware issue on those units, so please go through the RMA process.
Thanks for your quick reply!
I suspected as much. No worries, I will go ahead and request an RMA!