Machine Check Error (SError) at startup and during runtime

Hi,

We have a customer project based on the TX2 and L4T R28.1.

At startup we always get a Machine Check Error (SError). Usually the system continues to boot, but in rare cases it stops responding. The SError at boot looks like this:

[    8.533383] tegra_net_perf_init: cannot get wifi sclk
[    8.538565] found wifi platform device bcmdhd_wlan
[    8.544508] gpio tegra-gpio-aon wake69 for gpio=59(FF:3)
[    8.549865] wifi_platform_get_country_code_map: could not get country_code_map
[    8.557099] wifi_plat_dev_drv_probe:platform country code map is not available
[    8.564346] Power-up adapter 'DHD generic adapter'
[    8.569168] wifi_platform_set_power = 1
[    8.603664] CPU3: SError detected, daif=140, spsr=0x80000045, mpidr=80000101, esr=bf000002
[    8.603667] CPU0: SError detected, daif=140, spsr=0x80000000, mpidr=80000100, esr=bf40c000
[    8.603669] CPU4: SError detected, daif=140, spsr=0x20000000, mpidr=80000102, esr=bf40c000
[    8.603671] CPU5: SError detected, daif=1c0, spsr=0x600000c5, mpidr=80000103, esr=bf40c000
[    8.603679] CPU2: SError detected, daif=140, spsr=0x60000000, mpidr=80000001, esr=be000000
[    8.603685] CPU1: SError detected, daif=140, spsr=0x60000000, mpidr=80000000, esr=be000000
[    8.603793] ROC:CCE Machine Check Error:
[    8.603797]  Address Type = Secure DRAM
[    8.603824]  Address = 0x0 (Unknown Device)
[    8.603927] ROC:IOB Machine Check Error:
[    8.603928]  Address Type = Secure DRAM
[    8.603931]  Address = 0x0 (Unknown Device)
[    8.611938] CPU3: SError detected, daif=140, spsr=0x40000145, mpidr=80000101, esr=bf40c000

or in rare cases like this:

[    6.617151] tegra_net_perf_init: cannot get wifi sclk
[    6.622309] found wifi platform device bcmdhd_wlan
[    6.628026] gpio tegra-gpio-aon wake69 for gpio=59(FF:3)
[    6.633440] wifi_platform_get_country_code_map: could not get country_code_map
[    6.640670] wifi_plat_dev_drv_probe:platform country code map is not available
[    6.647913] Power-up adapter 'DHD generic adapter'
[    6.652729] wifi_platform_set_power = 1
[    6.740351] CPU0: SError detected, daif=140, spsr=0x80000045, mpidr=80000100, esr=bf000002
[    6.740353] CPU4: SError detected, daif=1c0, spsr=0x800000c5, mpidr=80000102, esr=bf40c000
[    6.740356] CPU5: SError detected, daif=140, spsr=0x60000045, mpidr=80000103, esr=bf40c000
[    6.740358] CPU3: SError detected, daif=140, spsr=0x60000045, mpidr=80000101, esr=bf40c000
[    6.740367] CPU1: SError detected, daif=140, spsr=0x20000045, mpidr=80000000, esr=be000000
[    6.740400] CPU2: SError detected, daif=140, spsr=0x80000000, mpidr=80000001, esr=be000000
[    6.740482] ROC:CCE Machine Check Error:
[    6.740485] 	Address Type = Secure DRAM
[    6.740513] 	Address = 0x0 (Unknown Device)
[    6.740617] ROC:IOB Machine Check Error:
[    6.740619] 	Address Type = Non-Secure MMIO
[    6.740621] 	Address = 0xc2f1d00 -- gpio + 0xd00
[    6.748647] CPU0: SError detected, daif=140, spsr=0x40000145, mpidr=80000100, esr=bf40c000

According to the TRM, an SError indicates a hardware issue. The SErrors only occur with our custom setup and not on the Jetson, so the question is: how can we track this down and find the root cause?

Thanks in advance.

hawky,

If this is a hardware design issue, please check your design against the OEM Design Guide (DG) first.

This issue gave us a lot of headaches, but in the end we found the problem.
The call to

gpio_set_debounce(priv->gpio, 50);

causes the machine check error. We removed the call; this shouldn’t be critical in our case, since the GPIO isn’t connected to a button.