TX2 R28.2 crash

Hi all,

I am using a TX2 with BSP version R28.2. The system sometimes crashes when shutting down; the log is below:

[ 31.212867] Disabling non-boot CPUs …
[ 31.221483] CPU1: shutdown
[ 31.226565] psci: CPU1 killed.
[ 52.253451] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 52.267763] (detected by 4, t=5255 jiffies, g=2268, c=2267, q=59)
[ 52.282619] All QSes seen, last rcu_preempt kthread activity 5258 (4294905367-4294900109), jiffies_till_next_fqs=1, root ->qsmask 0x0
[ 52.303729] swapper/4 R running task 0 0 1 0x00000000
[ 52.320024] Call trace:
[ 52.331542] [] dump_backtrace+0x0/0x100
[ 52.346143] [] show_stack+0x14/0x1c
[ 52.360373] [] sched_show_task+0xa8/0xfc
[ 52.374987] [] rcu_check_callbacks+0xa9c/0xaa0
[ 52.390092] [] update_process_times+0x3c/0x6c
[ 52.405085] [] tick_sched_handle.isra.16+0x20/0x78
[ 52.420524] [] tick_sched_timer+0x44/0x7c
[ 52.435148] [] __hrtimer_run_queues+0x140/0x350
[ 52.450329] [] hrtimer_interrupt+0x9c/0x1e0
[ 52.465172] [] tegra186_timer_isr+0x24/0x30
[ 52.480002] [] handle_irq_event_percpu+0x84/0x290
[ 52.495386] [] handle_irq_event+0x44/0x74
[ 52.510015] [] handle_fasteoi_irq+0xb4/0x188
[ 52.524947] [] generic_handle_irq+0x24/0x38
[ 52.539728] [] __handle_domain_irq+0x60/0xb4
[ 52.554595] [] gic_handle_irq+0x5c/0xb4
[ 52.569044] [] el1_irq+0x80/0xf8
[ 52.582835] [] cpuidle_enter+0x18/0x20
[ 52.597185] [] call_cpuidle+0x28/0x50
[ 52.611439] [] cpu_startup_entry+0x17c/0x340
[ 52.626322] [] secondary_start_kernel+0x12c/0x164
[ 52.641660] [<0000000080081acc>] 0x80081acc
[ 52.654978] rcu_preempt kthread starved for 5351 jiffies! g2268 c2267 f0x2 s3 ->state=0x0
[ 56.029459] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]

How can I fix this bug? thanks.

kenny_cz,

Are you working on the NVIDIA devkit with the full BSP from JetPack?

Hi WayneWWW,

I use the NVIDIA devkit with the full BSP and a 10G network card connected via PCIe. I found that once I insmod the network card's driver module, the system sometimes crashes when shutting down.

The network card I use is https://www.startech.com/Networking-IO/Adapter-Cards/pci-express-10g-sfp-network-adapter-card~PEX10000SFP
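Since the crash only shows up after the driver module has been inserted, it may help to confirm whether the module is actually loaded before each shutdown test. The following is only a sketch: the helper name `module_loaded` is made up for illustration, and the module name `tn40xx` is an assumption based on the driver package name mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module name appears in a module list.
# The list defaults to /proc/modules, where each line starts with the
# module name followed by a space.
module_loaded() {
    mod="$1"
    list="${2:-/proc/modules}"
    grep -q "^${mod} " "$list"
}

# Example use (module name "tn40xx" is an assumption, not confirmed here):
#   if module_loaded tn40xx; then echo "driver is loaded"; fi
```

Comparing shutdowns with the module loaded against shutdowns after `sudo rmmod` of that module could help narrow down whether the driver is involved.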

Hi WayneWWW,

I think there is a bug in the PCIe driver: both TX1 and TX2 crash when shutting down with BSP R28.2, while TX1 with BSP R24.2.1 works well.

[ 56.029459] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
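To compare BSP versions more systematically, one option is to save the serial console output of each shutdown to a file and count the two signatures that appear in the log above. This is a rough sketch; the helper name `count_lockup_lines` and the idea of saving logs to files are assumptions for illustration, not something from the BSP.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a saved kernel log that contain
# either the RCU stall message or the soft lockup message seen above.
count_lockup_lines() {
    # grep exits non-zero when nothing matches but still prints 0,
    # so the exit status is ignored here.
    grep -c -E 'rcu_preempt detected stalls|BUG: soft lockup' "$1" || true
}

# Example use (file name is a placeholder):
#   count_lockup_lines shutdown-r28.2.log
```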

kenny_cz,

This looks like a known issue: the driver in the upstream 4.4 kernel does not support this card.

Please try the following:

  1. Download the vendor driver from https://sgcdn.startech.com/005329/media/sets/Tehuti_TN4010_Drivers/Tehuti_TN4010.zip into the L4T home directory
  2. Unzip the driver and go to the Linux directory
  3. Run the following commands to install the driver
    sudo make -C /usr/src/linux-headers-$(uname -r)/ modules_prepare
    sudo make all
    sudo make install
    sudo reboot

Sorry, I don’t have this card, so you will need to try it yourself.
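The first build command above points make at the kernel headers tree for the running kernel, so it may be worth checking that this directory exists before building. A minimal sketch, with hypothetical helper names `headers_dir` and `check_headers`:

```shell
#!/bin/sh
# Print the kernel headers directory for a given kernel release.
# Defaults to the running kernel, as reported by `uname -r`.
headers_dir() {
    release="${1:-$(uname -r)}"
    printf '/usr/src/linux-headers-%s\n' "$release"
}

# Hypothetical helper: succeed only if that headers directory exists.
check_headers() {
    dir="$(headers_dir "$1")"
    if [ -d "$dir" ]; then
        echo "found: $dir"
    else
        echo "missing: $dir" >&2
        return 1
    fi
}

# Example use before building the driver:
#   check_headers && sudo make -C "$(headers_dir)/" modules_prepare
```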

Hi WayneWWW,

Your version is old. I use Tehuti tn40xx-0.3.6.16.1.tgz; this driver release supports Linux kernel versions 2.6.32 to 4.15.
You can download the driver from http://www.tehutinetworks.net/?t=drivers&L1=8&L2=12&L3=26

A Tehuti Networks engineer told me the following:
The closest thing we have here is a QNAP NAS using Amazon’s (previously Annapurna Labs) AL-314 which is a Quad Core A15 CPU. Neither we nor any of QNAP’s customers have ever reported an issue with our driver on this CPU.

kenny_cz,

Could you elaborate on this error? Does it only happen when the system is shutting down?

Not sure if this thread can help you.

https://devtalk.nvidia.com/default/topic/965204/

Hi WayneWWW,

Yes, the system crashes only when shutting down, and it is more likely to happen while another computer is pinging it.

Hi, I am using a custom board and am getting the “soft lockup - CPU#0 stuck for” error. The behavior is odd: if I keep the custom board outside its enclosure the issue does not occur, but once it is placed inside the enclosure the issue occurs. Has anyone faced a similar issue? What could be the probable cause?

I guess you should file a new topic for your issue.