TX1 box reboot error

More than 1,000 TX1 boxes are online now, and about 30 of them have rebooted abnormally. The OS seems to hang during reboot. An HDMI plug event is still detected, and some error information is printed: "falling back to user helper, cannot find firmware … retry after 1s".
Please review it ASAP, thanks.

Please provide more detailed information so that we can analyze the problem.

More information is at https://share.weiyun.com/5Hc4rXO. The picture shows the HDMI output; the output stays fixed on that screen.

@kaichengshi
We can’t access the link.

What’s your phone number? I’d like to call you directly. My phone number is 13813825700.

We can see the file now.
Could you give more information about your device and BSP version?

BSP Version R28.2.1(Kernel-4.4)

Do the failed devices always fail, or do they fail intermittently?
Did you design your own carrier board?

Not always, just sometimes, at a very low frequency. We use three ODM products with different carrier boards, but all of these products have the same issue.
So we believe it’s an issue with your platform. Please give this high priority.

Could you verify the failed TX1 module on a devkit?

We have no TX1 devkit on hand right now. Have you released any patch after this kernel version that fixes system stability at power-on?

Please check the log and HDMI picture carefully.

Could you report this issue to your ODM vendor and let them try it on the devkit?
We don’t know whether the new release r28.3 fixes your problem. Please give it a try.

What changed from R28.2.1 to R28.3? Is any of it related to this boot issue?

Any response?

It looks like there is no direct fix for the boot issue in r28.3.
Could you report this issue to your ODM vendor? They can access the NV bug system to get more support.

I don’t understand what you mean. Do ODMs have a special communication path with NV?

Your text log does not match your image, so we don’t know what the real problem is here.

If this is “falling back to user helper, cannot find firmware … retry after 1s”, then it has nothing to do with HDMI. This error is due to missing USB controller firmware.

Hi kaichengshi,

Have you managed to get the correct logs from the failing devices?
Any update from your partner side?

Thanks

Why are you so sure that our logs are not correct? We have no updated logs so far. Please review the log again; we have not received any useful suggestion from your side yet.

Hi,

We said your log is incorrect because it does not match your image. Two log lines are visible in your image:

tegradc tegradc.1: nominal-pclk:148500000 …
and
tegradc tegradc:1 hdmi: plugged

but if I search for the keyword “hdmi” in your text log, there is only

Apr  2 02:00:21 tegra-ubuntu kernel: [    0.555632] hdmi: couldn't get regulator vdd_hdmi_5v0: -517

This line is common on custom carrier boards and should not cause a problem.
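For anyone who wants to repeat this check on their own syslog, the keyword search can be sketched as below. This is only an illustration: `grep_log` is a hypothetical helper name, and the two sample lines are copied from this thread rather than read from a real log file.

```python
import re

def grep_log(lines, keyword):
    """Return (line_number, text) pairs for lines containing keyword, ignoring case."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]

# Sample lines quoted in this thread; a real run would read the syslog file instead.
sample = [
    "Apr  2 02:00:21 tegra-ubuntu kernel: [    0.555632] hdmi: couldn't get regulator vdd_hdmi_5v0: -517",
    "Apr  1 20:01:02 tegra-ubuntu kernel: [   48.332648] tegradc tegradc.1: unblank",
]

hits = grep_log(sample, "hdmi")
for number, text in hits:
    print(number, text)
```

If the lines seen on the HDMI screen really came from the same boot as the text log, a search like this would be expected to find them as well.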

Also, your log is very long, and it seems to come from many unrelated boots.
For example, please take a look at the timestamps below.

This is the first line in the log; I think this should be the first boot.

Apr  1 20:00:21 tegra-ubuntu kernel: [    0.000000] Booting Linux on physical CPU 0x0
Apr  1 20:00:21 tegra-ubuntu kernel: [    0.000000] Initializing cgroup subsys cpuset
Apr  1 20:00:21 tegra-ubuntu kernel: [    0.000000] Initializing cgroup subsys cpu
Apr  1 20:00:21 tegra-ubuntu kernel: [    0.000000] Initializing cgroup subsys cpuacct

and following

Apr  1 20:00:38 tegra-ubuntu kernel: [   25.215190] Disabling IRQ #263   # looks like boot ends here
Apr  1 20:01:02 tegra-ubuntu kernel: [   48.320904] tegradc tegradc.1: blank - powerdown  # almost 30 seconds passed; this display log is not a fatal error.
Apr  1 20:01:02 tegra-ubuntu kernel: [   48.332648] tegradc tegradc.1: unblank
Apr  1 20:11:03 tegra-ubuntu kernel: [  650.209626] tegradc tegradc.1: blank - normal  
Apr  1 21:00:21 tegra-ubuntu kernel: [    0.000000] Booting Linux on physical CPU 0x0  # System reboots here, but almost one hour later.
Apr  1 21:00:21 tegra-ubuntu kernel: [    0.000000] Initializing cgroup subsys cpuset
Apr  1 21:00:21 tegra-ubuntu kernel: [    0.000000] Initializing cgroup subsys cpu
Apr  1 21:00:21 tegra-ubuntu kernel: [    0.000000] Initializing cgroup subsys cpuacct
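The boot boundaries in a long, concatenated log like this can be found mechanically. The sketch below assumes that every boot starts with a "Booting Linux on physical CPU" line, as in the excerpts above, and that each line starts with a 15-character syslog timestamp. `split_boots` and `summarize` are hypothetical helper names, and the sample lines are taken from this thread.

```python
import re

BOOT_MARKER = "Booting Linux on physical CPU"
# Kernel uptime field such as "[   25.215190]".
UPTIME = re.compile(r"\[\s*(\d+\.\d+)\]")

def split_boots(lines):
    """Group log lines into one list per boot, starting a new group at each boot marker."""
    boots = []
    for line in lines:
        if BOOT_MARKER in line or not boots:
            boots.append([])
        boots[-1].append(line.rstrip("\n"))
    return boots

def summarize(boots):
    """For each boot, return (syslog timestamp of first line, last kernel uptime in seconds)."""
    summary = []
    for boot in boots:
        stamp = boot[0][:15]  # assumes the "Apr  1 20:00:21" fixed-width syslog prefix
        last_uptime = None
        for line in boot:
            match = UPTIME.search(line)
            if match:
                last_uptime = float(match.group(1))
        summary.append((stamp, last_uptime))
    return summary

# Sample lines quoted in this thread; a real run would read the full syslog file.
sample = [
    "Apr  1 20:00:21 tegra-ubuntu kernel: [    0.000000] Booting Linux on physical CPU 0x0",
    "Apr  1 20:00:38 tegra-ubuntu kernel: [   25.215190] Disabling IRQ #263",
    "Apr  1 20:11:03 tegra-ubuntu kernel: [  650.209626] tegradc tegradc.1: blank - normal",
    "Apr  1 21:00:21 tegra-ubuntu kernel: [    0.000000] Booting Linux on physical CPU 0x0",
    "Apr  1 21:00:21 tegra-ubuntu kernel: [    0.000000] Initializing cgroup subsys cpuset",
]

boots = split_boots(sample)
report = summarize(boots)
for stamp, uptime in report:
    print(stamp, uptime)
```

Comparing each boot's start time with the last uptime of the previous boot shows whether a reboot followed directly after the last logged activity or, as here, much later.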

One more thing to mention: in some of the boot logs your devices start to give out lots of kernel errors, which also do not match your image.

Could you just point out where the error is in the text log?