High MTU causes Kernel Panic

It seems the error is similar to the one in "Jetson Xavier AGX reboots when downloading docker image".

Please share the steps to reproduce the issue for now.

Steps on dev-kit:

  1. Flash the Jetson with JetPack 4.6 (rev3) and the SDK components. Set up Ubuntu.
  2. Update and upgrade.
  3. Install Aravis (GitHub: AravisProject/aravis, a vision library for GenICam-based cameras).
  4. Set up the Ethernet port with nmcli using the config file attached below, and place the file in /etc/NetworkManager/system-connections/
  5. Set root ownership and permissions on the file:
    sudo chmod -R 600 /etc/NetworkManager/system-connections/cam1
    sudo chown -R root:root /etc/NetworkManager/system-connections/cam1
  6. Connect the IP camera (LUCID Vision) over Ethernet.
  7. Launch the camera with GStreamer:
    sudo gst-launch-1.0 aravissrc camera-name="10.42.0.46" ! fakesink

The above works when the MTU in the config file is set to 1500. It breaks if the MTU is 9000. We need to support jumbo frames to reduce CPU usage and data loss.
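The per-packet arithmetic behind that motivation can be sketched in C. This is a minimal sketch under stated assumptions: a GigE Vision stream carried over IPv4 (20-byte header, no options) and UDP (8 bytes), with the 8-byte standard GVSP header. Leader/trailer packets and extended-ID mode are ignored, and the frame size in the usage note is hypothetical, not taken from the LUCID camera in this thread.

```c
#include <stdint.h>

/* Assumed per-packet overhead: IPv4 without options, UDP, and the
 * standard (non-extended-ID) GVSP header. */
#define IPV4_HEADER_BYTES 20u
#define UDP_HEADER_BYTES   8u
#define GVSP_HEADER_BYTES  8u

/* Image bytes carried by one stream packet at the given MTU.
 * Returns 0 if the MTU cannot even hold the headers. */
uint32_t gvsp_payload_bytes(uint32_t mtu)
{
	const uint32_t overhead =
		IPV4_HEADER_BYTES + UDP_HEADER_BYTES + GVSP_HEADER_BYTES;
	return mtu > overhead ? mtu - overhead : 0;
}

/* Number of payload packets needed for one frame (leader and trailer
 * packets are not counted). Returns 0 for an unusable MTU. */
uint64_t packets_per_frame(uint64_t frame_bytes, uint32_t mtu)
{
	const uint32_t payload = gvsp_payload_bytes(mtu);
	if (payload == 0)
		return 0;
	/* Ceiling division: a partially filled last packet still counts. */
	return (frame_bytes + payload - 1) / payload;
}
```

For a hypothetical 2448 x 2048 Mono8 frame (5,013,504 bytes), `packets_per_frame` gives 3425 packets at MTU 1500 and 560 packets at MTU 9000, so roughly six times fewer packets to receive and process per frame, which is why jumbo frames help with CPU load.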

Attached config file: cam1 (313 bytes)

Hi,

It is unlikely that we can use your use case to reproduce the issue. Is there a simpler way to reproduce it?

We only have the devkit here.

I am curious about something: is this wired Ethernet? I ask because the description suggests it is wired, but the kernel failure appears to involve WiFi. If so, then perhaps the MTU is being set on the wrong device.

@WayneWWW I've retried the patch described here; it works, but with heavy data loss. Do you have an Ethernet device that supports jumbo packets? You could connect it to a devkit and collect data from it to emulate the crash.

@linuxdev Yes, it is wired Ethernet to the camera (please refer to the steps above), and the MTU is being set correctly.

BTW, jetson_clocks is on, nvpmodel is MAXN, the board is powered from a wall adapter, and the data loss was confirmed by displaying the frames on screen.
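To rule out the MTU landing on the wrong device, the value can be read back from the kernel for a specific interface instead of trusting the connection file. A minimal sketch, assuming Linux with glibc and the standard SIOCGIFMTU interface ioctl; the interface name you pass (for example "eth0") is an assumption and is not taken from the logs in this thread:

```c
#define _DEFAULT_SOURCE /* expose struct ifreq / ifr_mtu in strict modes */
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the current MTU of interface `ifname`, or -1 on error
 * (name too long, no such interface, or socket failure). */
int get_interface_mtu(const char *ifname)
{
	struct ifreq ifr;
	int fd;
	int mtu = -1;

	if (strlen(ifname) >= IFNAMSIZ)
		return -1;

	/* Any socket can serve as a handle for interface ioctls. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

	/* SIOCGIFMTU fills ifr.ifr_mtu for the named interface. */
	if (ioctl(fd, SIOCGIFMTU, &ifr) == 0)
		mtu = ifr.ifr_mtu;

	close(fd);
	return mtu;
}
```

With jumbo frames applied to the camera port, `get_interface_mtu` on that port should return 9000; this is the same value `ip link show <interface>` reports as `mtu`.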

Please test with the latest release. We won't investigate issues on older releases.

Right now I need to reset the router of my private network (the network my embedded devices use), including a firmware reset. I haven't gotten around to that yet, so at the moment I can't try this.

It is not certain that WiFi is involved, but I strongly suspect the failure to find a MAC address for the hardware just before the kernel dump:

[0012.644] I> Upda646] W> WARN: Fail to override "console=none" in command[0012.655] E> tegrabl_linuxboot_add_disp_param, du 0 faivalid slot number is found in scratch register
[0012.677] I> Linux Cmdline: console=ttyTCU0,115200 video=tegraoot.slot_suffix= boot.ratchetvalues=0.4.2 vpr_resize sdhed to get WIFI MAC address
[0012.722] W> MAC addr inval012.736] W> "plugin-manager" doesn't exist, creating
[0ugin-manager/chip-id
[0012.753] W> "configs" doesn't exAdding /chosen/plugin-manager/ids
[0012.771] W> "odm-dang
[0012.784] I> [0] START: 0x80000000, END: 0xac00000am_block larger than 80000000
[0012.802] I> [3] START: c200000, size:0x44800000] to /memory
[0012.819] I> addeed to get display params for du=0
[0012.835] W> "reset" MPIDR: 0x80000000
[0012.849] I> NVG: Logical CPU: 1; M0012.863] I> NVG: Logical CPU: 4; MPIDR: 0x80000200
[00ical CPU: 7; MPIDR: 0x80000301
[0012.883] W> "misc-data] I> Add storage-sdmmc to plugin-manager/misc-data
[00fragment-pcie-c5-rp matches
[0012.932] I> node /plugin-.
[0012.960] I> Kernel EP: 0x80080000, DTB: 0x90000000
000] Boot CPU: AArch64 Processor [4e0f0040]
[    0.000c200000 ,  44800000
[    0.000000] OF: fdt: - 100000000[tegra_comb_uart0] enabled
i 0003:01:00 0003:01:00.0: Falling back to user helper
:00.0: Falling back to user helper
[    8.450827] iwlwiing back to user helper
c8
[   92.396665] ---[ end trace a168f2b47d3b59c1 ]---
7 desc_alloc_skb.isra.6+0x13c/0x1c8
[   92.399618] ---[ end trace a168f2b47d3b59c2 ]---
[   92.401758] kernel BUG at /dvs/git/dirty/git-master_linux/kernel/kernel-4.9/mm/slub.c:3919!
[   92.401923] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

Notice the WiFi MAC address failure. However, there have been many patches over time, and as @WayneWWW mentions, you should probably try the next release. You can clone first if you want to save the content. If the error persists, or you are unable to use a newer release, then the device tree may be involved, since this is related to MAC address retrieval. The device tree is up to the vendor of the carrier board, so you would have to ask Auvidea either for an update or to check whether the error is device tree related.

Hi,

The exact same thing happens on the devkit; please refer to the attached log files and connection config.

…but I don’t see any kernel panic in your log.

I don't think a kernel panic will be logged in dmesg. Please refer to this site, for example:

First, we need to know what called the panic function at the moment of the kernel panic.

Can you see it in your kernel console?

This user is sharing the UART log, not only dmesg.

The UART log will record a kernel panic.

Hi,


Maybe you mean the following message:

kernel BUG at /dvs/git/dirty/git-master_linux/kernel/kernel-4.9/mm/slub.c:3919!

So, what do you suggest next? kernel-4.9/mm/slub.c:3919 is probably here:

void kfree(const void *x)
{
    struct page *page;
    void *object = (void *)x;

    trace_kfree(_RET_IP_, x);

    if (unlikely(ZERO_OR_NULL_PTR(x)))
        return;

    page = virt_to_head_page(x);
    if (unlikely(!PageSlab(page))) {
        BUG_ON(!PageCompound(page));    /* <-- probably slub.c:3919 */
        kfree_hook(x);
        __free_pages(page, compound_order(page));
        return;
    }
    slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
}
EXPORT_SYMBOL(kfree);
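To make the failing branch explicit: kfree() only reaches BUG_ON when the head page of the pointer is neither a slab page nor a compound page. The following is a simplified user-space model of that decision, not kernel code; the page flags are passed in as booleans instead of being read from struct page, and the names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcome names for the branches of kfree() in mm/slub.c. */
enum kfree_action {
	KFREE_NOOP,       /* NULL or ZERO_SIZE_PTR: nothing to free        */
	KFREE_SLAB_FREE,  /* object lives in a slab cache -> slab_free()   */
	KFREE_FREE_PAGES, /* large kmalloc on a compound page -> __free_pages() */
	KFREE_BUG         /* neither slab nor compound: BUG_ON() fires     */
};

/* Mirrors the order of checks in kfree(): null check first, then the
 * !PageSlab branch, whose BUG_ON(!PageCompound(page)) is the crash. */
enum kfree_action kfree_model(bool null_or_zero, bool page_slab,
			      bool page_compound)
{
	if (null_or_zero)
		return KFREE_NOOP;
	if (!page_slab) {
		if (!page_compound)
			return KFREE_BUG;
		return KFREE_FREE_PAGES;
	}
	return KFREE_SLAB_FREE;
}
```

In the call trace posted in this thread, kfree() is reached from skb_free_head(), which calls kfree() on skb->head when the head is not flagged as a page fragment. One plausible reading, consistent with the suspicion of an Ethernet driver bug, is that the receive path handed up an skb whose head was not a slab/kmalloc allocation; this is an inference, not a confirmed root cause.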

Hi, Mr. lizardperson

It looks like you are using Linux kernel version 4.9.253-tegra on your Auvidea X221 board:

Linux version 4.9.253-tegra (buildbrain@mobile-u64-5497-d3000) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05) ) #1 SMP PREEMPT Mon Jul 26 12:19:28 PDT 2021

I'm using Linux kernel version 4.9.140-tegra on my Xavier AGX developer kit:

Linux version 4.9.140-tegra (buildbrain@mobile-u64-2390-d3000) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05) ) #2 SMP PREEMPT Mon Dec 16 13:32:15 PST 2019

I set the MTU to 9000:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.226.68.179 netmask 255.255.255.0 broadcast 10.226.68.255

and I have not had any problems.

Later I will try the same thing on L4T 32.6.1, if possible. But I think this panic may be caused by a bug in the Ethernet driver.

[ 92.401758] kernel BUG at /dvs/git/dirty/git-master_linux/kernel/kernel-4.9/mm/slub.c:3919!

[ 92.546886] Call trace:
[ 92.549252] [] kfree+0x254/0x2a8
[ 92.554058] [] skb_free_head+0x28/0x48
[ 92.558876] [] skb_release_data+0x100/0x130
[ 92.564215] [] skb_release_all+0x30/0x40
[ 92.569545] [<008dacc18>] __netif_receive_skb_core+0x3b8/0xad8
[ 92.585298] [] __netif_receive_skb+0x28/0x78
[ 92.590638] [] netif_receive_skb_internal+0x2c/0xb0
[ 92.596764] [] napi_gro_receive+0x15c/0x188
[ 92.602104] [] eqos_napi_poll_rx+0x368/0x4f8
[ 92.608308] [] net_rx_action+0xf4/0x358
[ 92.613475] []boot_thread_fn+0x160/0x248
[ 92.630181] [<ffffff80080d3b59c3 ]—
[ 92.654705] Kernel panic - not syncing:ision : A02P

Hi All,

Thank you for your responses. It appears the fix from "Kernel panic with jumbo frames in L4T 32.5.1 / TX2 4GB" resolves the bug on both the devkit and the Auvidea board.

One observation: the initial Ethernet connection on boot has a 100% error rate, and the connection must be restarted to get the correct behaviour.

What happened here? At first you said the patch didn't work, and now you say it works.

The patch didn't work when it was built and installed on an already flashed Jetson following these instructions. I had to recompile the kernel and reflash the Jetson via SDK Manager.

On boot, the Ethernet connection errors out (screenshot attached in my previous response) and must be restarted, e.g.:

sudo nmcli con down cam
sleep 10
sudo nmcli con up cam

I think what you describe as "these instructions do not work" is due to something else. Those instructions should work. If they don't take effect, it means some other bug was hit.

Were you testing on rel-32.7.2?

Hello, Mr. Wayne WWW

You don't have to pursue it anymore. Recompiling has solved everything. The cause was probably an inconsistency between the kernel and the driver.

I just want to clarify two things:

  1. If you only replace /boot/Image in rel-32.7.2, it won't take effect due to a known bug.
  2. There is nothing wrong with the instructions in the document. Please do not mislead other users.

Hello, Mr. Wayne WWW
If Mr. lizardperson recompiled it and everything is working fine, then the sources and documentation should be correct. Apparently a kernel panic happened on his board, but there’s no way to know what caused it anymore.
