CPU Errors on TX-2

We have recently been seeing problems with one of our TX-2 developer kits. /var/log/kern.log shows lots of messages like this:

Jan 11 19:08:25 x kernel: [ 1382.665868] CPU4: SError detected, daif=140, spsr=0x20000000, mpidr=80000102, esr=bf000000
Jan 11 19:08:25 x kernel: [ 1382.666625] CPU5: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000103, esr=bf40c000
Jan 11 19:08:25 x kernel: [ 1382.667799] ROC:IOB Machine Check Error:
Jan 11 19:08:25 x kernel: [ 1382.667800]   Address Type = Secure DRAM
Jan 11 19:08:25 x kernel: [ 1382.667805]   Address = 0x0 (Unknown Device)

We also see some messages like this:

Jan 11 19:40:51 x kernel: [ 1790.856868] **************************************
Jan 11 19:40:51 x kernel: [ 1790.856868] Machine check error in DCC:1:
Jan 11 19:40:51 x kernel: [ 1790.856869]   Status = 0xf400000100000405
Jan 11 19:40:51 x kernel: [ 1790.856869]   Bank does not have any known errors
Jan 11 19:40:51 x kernel: [ 1790.856869]   Overflow (there may be more errors)
Jan 11 19:40:51 x kernel: [ 1790.856870]   Uncorrected (this is fatal)
Jan 11 19:40:51 x kernel: [ 1790.856870]   Error reporting enabled when error arrived
Jan 11 19:40:51 x kernel: [ 1790.856871]   ADDR = 0xbb
Jan 11 19:40:51 x kernel: [ 1802.274905] **************************************

We had been seeing these issues for a few days on JetPack 3.1. To rule out a software problem, we loaded JetPack 3.2, and we are seeing the same issues.

Has anyone seen these issues before? It looks like the kit is going bad…
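In case it helps when comparing this kit against our other boards, here is a minimal sketch of how the messages above could be tallied. It assumes the log lines look exactly like the ones pasted in this post; the function name `summarize` and both regular expressions are my own illustration, not anything provided by the kernel or JetPack, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions based only on the kern.log lines quoted in this post.
SERROR_RE = re.compile(r"(CPU\d+): SError detected")
# Matches e.g. "ROC:IOB Machine Check Error:" (source name before the phrase).
MCE_PREFIX_RE = re.compile(r"(\S+) Machine Check Error:")
# Matches e.g. "Machine check error in DCC:1:" (source name after the phrase).
MCE_SUFFIX_RE = re.compile(r"Machine check error in (\S+):\s*$")


def summarize(lines):
    """Count SError reports per CPU and machine check reports per source."""
    serrors = Counter()
    mces = Counter()
    for line in lines:
        m = SERROR_RE.search(line)
        if m:
            serrors[m.group(1)] += 1
            continue
        m = MCE_PREFIX_RE.search(line) or MCE_SUFFIX_RE.search(line)
        if m:
            mces[m.group(1)] += 1
    return serrors, mces
```

To use it, open the log with `open("/var/log/kern.log", errors="replace")` and pass the file object to `summarize`. Running the same script on a known-good board gives a baseline to compare the counts against.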

Hi kgdad,

If this happens on only one TX2 module, still happens when you move that module to another board, and can’t be reproduced on other TX2 modules, then you could consider an RMA: https://developer.nvidia.com/embedded/faq#rma-process

  1. Go to http://nvidia.custhelp.com/app/home
  2. Select “Live Chat” option to chat online with one of our customer care agents.
  3. Enter your contact information.
  4. Select “Jetson” from the product drop-down list.
  5. Submit the request.

Thanks

We haven’t seen this on any of our other boards, so I think it might be bad hardware. I will follow the steps provided.

Thanks!