Tx2 r32.3.1 bpmp problem serial console output

[ 8174.891106] bpmp_wait_ack() returned -110 (ch 22 mrq 3 data <0x83 0x00 0x00 0x06>)
[ 8174.898719] ------------[ cut here ]------------
[ 8174.903343] WARNING: CPU: 0 PID: 15038 at /home/tbuckley/tmp/seattle_tx2_kernel/sources/kernel/nvidia/drivers/firmware/tegra/mail.c:323 bpmp_send_receive_atomic+0x248/0x268
[ 8174.918870] ---[ end trace 7c23e2321705006e ]---

Hi,

Please learn to describe your problem instead of just pasting a log without saying anything.
How could other people help you without any information?

Is this a custom carrier board? Could you share how you hit this error?

My syslog is filling my disk with messages like the ones below. This just started after running apt update, apt upgrade, and apt dist-upgrade yesterday.

[ 1931.978120] syncpt_thresh_cpu0_int_status(15) = 0x00000000
[ 1931.978123] syncpt_thresh_cpu0_int_status(16) = 0x00000000
[ 1931.978127] syncpt_thresh_cpu0_int_status(17) = 0x00000000
[ 1934.471072] bpmp_wait_ack() returned -110 (ch 22 mrq 3 data <0x83 0x00 0x00 0x06>)
[ 1934.478690] ------------[ cut here ]------------
[ 1934.483314] WARNING: CPU: 0 PID: 14969 at /home/tbuckley/tmp/seattle_tx2_kernel/sources/kernel/nvidia/drivers/firmware/tegra/mail.c:323 bpmp_send_receive_atomic+0x248/0x268
[ 1934.498672] Modules linked in: 88x2bu(O) cfg80211 btusb btrtl btbcm btintel bnep fuse zram overlay binfmt_misc spidev stalkerButton(O) nvgpu bluedroid_pm lt81x_mipi(O) ip_tables x_tables

[ 1934.498724] CPU: 0 PID: 14969 Comm: NVMVidEncInputT Tainted: G O 4.9.140-tegra #86
[ 1934.498727] Hardware name: lightning (DT)
[ 1934.498731] task: ffffffc07a4e0000 task.stack: ffffffc057a94000
[ 1934.498736] PC is at bpmp_send_receive_atomic+0x248/0x268
[ 1934.498740] LR is at bpmp_send_receive_atomic+0x248/0x268
[ 1934.498744] pc : [] lr : [] pstate: 004001c5
[ 1934.498746] sp : ffffffc057a978d0
[ 1934.498749] x29: ffffffc057a978d0 x28: 0000000000000004
[ 1934.498756] x27: ffffffc057a97c18 x26: 0000000000000016
[ 1934.498762] x25: ffffffc057a97c1c x24: 00000000000f4240
[ 1934.498769] x23: 000001c172dafd4d x22: ffffff800a189c08
[ 1934.498775] x21: ffffff8009fbb458 x20: 20c49ba5e353f7cf
[ 1934.498781] x19: 0000000000000003 x18: 0000000000000010
[ 1934.498787] x17: 0000000000000000 x16: 0000000000000000
[ 1934.498793] x15: ffffffffffffffff x14: 3078302033387830
[ 1934.498799] x13: 3c20617461642033 x12: 2071726d20323220
[ 1934.498804] x11: 686328203031312d x10: 0000000000000746
[ 1934.498810] x9 : 722029286b63615f x8 : ffffff80083d47b0
[ 1934.498816] x7 : ffffff8009e94358 x6 : ffffffc0f7053bf0
[ 1934.498822] x5 : ffffffc0f7053bf0 x4 : 0000000000000000
[ 1934.498828] x3 : ffffffc0f70597f8 x2 : ffffffc0f7053bf0
[ 1934.498834] x1 : ffffffc07a4e0000 x0 : 0000000000000046

[ 1934.498842] ---[ end trace 9902c18510fafb71 ]---
[ 1934.503454] Call trace:
[ 1934.503461] [] bpmp_send_receive_atomic+0x248/0x268
[ 1934.503466] [] tegra_bpmp_send_receive_atomic+0xc4/0x170
[ 1934.503474] [] clk_bpmp_is_enabled+0x44/0x68
[ 1934.503480] [] __clk_is_enabled+0x38/0x60
[ 1934.503487] [] nvhost_module_get_rate+0x80/0xf0
[ 1934.503493] [] nvhost_channelctl+0xa20/0xf28
[ 1934.503499] [] do_vfs_ioctl+0xb0/0x8d8
[ 1934.503503] [] SyS_ioctl+0x8c/0xa8
[ 1934.503509] [] el0_svc_naked+0x34/0x38
[ 1934.504388] bpmp: mrq 22 took 3560000 us
[ 1934.508444] bpmp: mrq 19 took 3944000 us
^C

Is there an FAQ telling me what information is required in a post? I got the initial post from the serial console, and now after lunch my system is fubar’d. My gstreamer task is dead also. Again, it appears to have started after apt update.

tx2 r32.3.1

Terry

One thing I’d recommend for this case (not necessarily for all reports) is the output from a full serial console boot log. If the initrd is involved as part of any package update/upgrade, then any update/upgrade to this which is not from NVIDIA would very likely cause serious problems, and would require logs prior to Linux ever loading. Also, on more recent releases capable of dist-upgrade, you would also need to post the output from what is shown below:

  • uname -a
    (normally only “uname -r” would be needed, but initrd involvement implies use “-a”)
  • dpkg -l | egrep -i '(initrd|nvidia-l4t-)'
  • dpkg -S /boot/initrd
  • cat /proc/extlinux/extlinux.conf
    (or just copy the file and upload as attachment)
  • cat /proc/cmdline

Some of the above is actually also part of the full serial console boot log but having it all in one place makes things easier. I have to wonder if perhaps the dist-upgrade replaced something it should not touch (only an NVIDIA initrd should be used).
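As a rough illustration of having it all in one place, the commands listed above could be gathered into a single text report with a short Python script. This is only a sketch: `collect_report` and `run_shell` are hypothetical helper names, `grep -E` stands in for `egrep`, and the extlinux.conf path used is `/boot/extlinux/extlinux.conf` (the `/proc` path in the list is corrected later in this thread).

```python
import subprocess

# Commands from the list above. The extlinux.conf path is the /boot one
# (the /proc path in the list is corrected later in the thread).
COMMANDS = [
    "uname -a",
    "dpkg -l | grep -E -i '(initrd|nvidia-l4t-)'",
    "dpkg -S /boot/initrd",
    "cat /boot/extlinux/extlinux.conf",
    "cat /proc/cmdline",
]


def run_shell(command):
    """Run one command through the shell and return stdout plus stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True
    )
    return result.stdout + result.stderr


def collect_report(commands=COMMANDS, run=run_shell):
    """Return one text report with each command followed by its output."""
    sections = []
    for command in commands:
        output = run(command).rstrip("\n")
        sections.append(f"$ {command}\n{output}\n")
    return "\n".join(sections)
```

Calling `print(collect_report())` on the Jetson and redirecting the output to a file would give a single attachment for the post. The `run` parameter exists only so the function can be exercised without executing real commands.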


~$ uname -a
Linux BaseSystem_0_1 4.9.140-tegra #86 SMP PREEMPT Thu Sep 17 15:19:17 CDT 2020 aarch64 aarch64 aarch64 GNU/Linux

dpkg -l | egrep -i '(initrd|nvidia-l4t-)'
ii nvidia-l4t-3d-core 32.3.1-20191209230245 arm64 NVIDIA GL EGL Package
ii nvidia-l4t-apt-source 32.3.1-20191209230245 arm64 NVIDIA L4T apt source list debian package
ii nvidia-l4t-bootloader 32.3.1-20191209230245 arm64 NVIDIA Bootloader Package
ii nvidia-l4t-camera 32.3.1-20191209230245 arm64 NVIDIA Camera Package
ii nvidia-l4t-ccp-t186ref 32.3.1-20191209230245 arm64 NVIDIA Compatibility Checking Package
ii nvidia-l4t-configs 32.3.1-20191209230245 arm64 NVIDIA configs debian package
ii nvidia-l4t-core 32.3.1-20191209230245 arm64 NVIDIA Core Package
ii nvidia-l4t-cuda 32.3.1-20191209230245 arm64 NVIDIA CUDA Package
ii nvidia-l4t-firmware 32.3.1-20191209230245 arm64 NVIDIA Firmware Package
ii nvidia-l4t-graphics-demos 32.3.1-20191209230245 arm64 NVIDIA graphics demo applications
ii nvidia-l4t-gstreamer 32.3.1-20191209230245 arm64 NVIDIA GST Application files
ii nvidia-l4t-init 32.3.1-20191209230245 arm64 NVIDIA Init debian package
ii nvidia-l4t-initrd 32.3.1-20191209230245 arm64 NVIDIA initrd debian package
ii nvidia-l4t-jetson-io 32.3.1-20200115121627 arm64 NVIDIA Jetson.IO debian package
ii nvidia-l4t-kernel 4.9.140-tegra-32.3.1-20191209230245 arm64 NVIDIA Kernel Package
ii nvidia-l4t-kernel-dtbs 4.9.140-tegra-32.3.1-20191209230245 arm64 NVIDIA Kernel DTB Package
ii nvidia-l4t-kernel-headers 4.9.140-tegra-32.3.1-20191209230245 arm64 NVIDIA Linux Tegra Kernel Headers Package
ii nvidia-l4t-multimedia 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-multimedia-utils 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-oem-config 32.3.1-20191209230245 arm64 NVIDIA OEM-Config Package
ii nvidia-l4t-tools 32.3.1-20191209230245 arm64 NVIDIA Public Test Tools Package
ii nvidia-l4t-wayland 32.3.1-20191209230245 arm64 NVIDIA Wayland Package
ii nvidia-l4t-weston 32.3.1-20191209230245 arm64 NVIDIA Weston Package
ii nvidia-l4t-x11 32.3.1-20191209230245 arm64 NVIDIA X11 Package
ii nvidia-l4t-xusb-firmware 32.3.1-20191209230245 arm64 NVIDIA USB Firmware Package

dpkg -S /boot/initrd
nvidia-l4t-initrd: /boot/initrd

cat /proc/extlinux/extlinux.conf “does not exist.”

cat /proc/cmdline
root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 video=tegrafb no_console_suspend=1 earlycon=uart8250,mmio32,0x3100000 nvdumper_reserved=0x1772e0000 gpt usbcore.old_scheme_first=1 tegraid=18.1.2.0.0 maxcpus=6 boot.slot_suffix= boot.ratchetvalues=0.2031647.1 bl_prof_dataptr=0x10000@0x175840000 sdhci_tegra.en_boot_part_access=1 quiet “Should be standard might of added the quiet”

System appears to crash after an hour or so while running my camera and gstreamer app.

The above is my mistake. It should be:
cat /boot/extlinux/extlinux.conf

Does it really say this from “cat /proc/cmdline”?

The “Should be standard might of added the quiet”, if it really is part of that command, is content which should not exist.

Btw, I’d recommend removing “quiet”. You’d remove this from “/boot/extlinux/extlinux.conf” in the “APPEND” key/value pair. This causes logging to be reduced, and there may be a lot going on which is missing.
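A minimal Python sketch of that edit, assuming the file uses the usual `APPEND` lines: it drops the `quiet` token from active `APPEND` lines and leaves commented-out entries alone. The function name `strip_quiet` is hypothetical, and the sketch only transforms text; writing the result back to `/boot/extlinux/extlinux.conf` would need root privileges and is best done after making a backup copy.

```python
def strip_quiet(conf_text):
    """Return extlinux.conf text with 'quiet' removed from APPEND lines."""
    out_lines = []
    for line in conf_text.splitlines():
        tokens = line.split()
        # Only touch active APPEND lines; commented-out entries start with '#'.
        if tokens and tokens[0] == "APPEND":
            indent = line[: len(line) - len(line.lstrip())]
            kept = [t for t in tokens if t != "quiet"]
            # Note: this also collapses repeated spaces inside the line.
            line = indent + " ".join(kept)
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

For example, an input line of `APPEND ${cbootargs} quiet` comes back as `APPEND ${cbootargs}`, with its leading indentation preserved.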

Do you have a full boot log up through crash? Serial console from start to system failure would be very useful (more useful if “quiet” is not involved).

cat /boot/extlinux/extlinux.conf

TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
MENU LABEL primary kernel
LINUX /boot/Image
INITRD /boot/initrd
APPEND ${cbootargs} quiet

# When testing a custom kernel, it is recommended that you create a backup of
# the original kernel and add a new entry to this file so that the device can
# fallback to the original kernel. To do this:
#
# 1, Make a backup of the original kernel
#      sudo cp /boot/Image /boot/Image.backup
#
# 2, Copy your custom kernel into /boot/Image
#
# 3, Uncomment below menu setting lines for the original kernel
#
# 4, Reboot

# LABEL backup
#      MENU LABEL backup kernel
#      LINUX /boot/Image.backup
#      INITRD /boot/initrd
#      APPEND ${cbootargs}

I added the comment at the end of the /proc/cmdline, trying to indicate that I only remember maybe adding the “quiet”

I will remove quiet from the extlinux.conf, reboot, and try to get it to fail again.

Terry

PLEASE REMEMBER I did not have this problem until I did the apt update, apt upgrade, and apt dist-upgrade either yesterday or the day before…

It’s like something is upsetting bpmp now. I was having nvargus-daemon crashes, but not as often as this bpmp problem. The nvargus-daemon crash did not fill syslog and cause a message about no disk space.

Terry

The APPEND entry of extlinux.conf seems suspicious. I would remove the quiet, but this isn’t why it is suspicious. The reason is that I would expect a couple of extra key words after “${cbootargs}”. Since I do not have a system running R32.3.1 I cannot actually say what should occur there, but I would guess something similar to this (but perhaps a different “root=”):

APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4

If more is needed, then perhaps extlinux.conf itself was not upgraded correctly. What is the timestamp on this file?
ls -l /boot/extlinux/extlinux.conf
(I forgot to mention, skip this if you manually edited since dist-upgrade…the timestamp would no longer have meaning)

This is not provided by a package so far as I know, but it would be interesting if this were suddenly provided by someone who is not NVIDIA. Do you see anything owning this file from packages (I’m guessing not)?
dpkg -S /boot/extlinux/extlinux.conf

I think some of the stuff comes from the device tree now.

So it just crashed while I was watching it, and really the only thing on the serial console is the same as the first entry in this post:

[ 919.514467] bpmp_wait_ack() returned -110 (ch 22 mrq
[ 919.522081] ------------[ cut here ]------------
[ 919.526703] WARNING: CPU: 0 PID: 13684 at /home/tbuckley/tmp/seattle_tx2_kernel/sources/kern268
[ 919.542230] ---[ end trace cedf2d7758f96369 ]---
[ 919.547994] bpmp: mrq 19 took 3996000 us
[ 919.553932] bpmp: mrq 19 took 3996000 us
[ 919.554577] bpmp: mrq 19 took 3996000 us

So everything on the console looks good during the boot process and loading, I do not see anything that is making me worried.

And yes my gstreamer app is now dead kind of like what happened before when nvargus-daemon would crash without any errors/indications. I could restart nvargus-daemon and my gstreamer app and it would start working again.

Now I have to sudo rm /var/log/syslog and reboot because of no disk space.

I changed /etc/logrotate.d/rsyslog and now at least this error does not use all my disk space.

So I restarted my gstreamer app, and it is displaying and doing what is required.

Terry

I do not know enough about the internal workings of bpmp to make a suggestion from that output, but I do believe this is a Cortex-R series, and is probably running as its own hard realtime system. This would be rather interesting if it was being hit with IRQs it could not handle in time. However, someone from NVIDIA can probably use that output and look up mrq 19 now and see what this actually links to.
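To make that output easier to hand over, a small Python sketch could pull the key fields out of the log. It parses the `bpmp_wait_ack() returned ...` lines and counts the `bpmp: mrq N took ... us` lines per mrq number. The names `summarize_bpmp`, `ACK_RE`, and `SLOW_RE` are hypothetical, and the only decoding attempted is the return code: on Linux, errno 110 is ETIMEDOUT. What each mrq number refers to is not stated in this thread, so the sketch only tallies them.

```python
import re
from collections import Counter

# Linux errno value seen in the log; 110 is ETIMEDOUT in asm-generic/errno.h.
ERRNO_NAMES = {110: "ETIMEDOUT"}

ACK_RE = re.compile(
    r"bpmp_wait_ack\(\) returned (-?\d+) \(ch (\d+) mrq (\d+) data <([^>]*)>\)"
)
SLOW_RE = re.compile(r"bpmp: mrq (\d+) took (\d+) us")


def summarize_bpmp(log_text):
    """Collect failed acks and slow-request counts from dmesg text."""
    failures = []
    slow = Counter()
    for line in log_text.splitlines():
        ack = ACK_RE.search(line)
        if ack:
            code = int(ack.group(1))
            failures.append({
                "error": ERRNO_NAMES.get(-code, str(code)),
                "channel": int(ack.group(2)),
                "mrq": int(ack.group(3)),
                "data": [int(b, 16) for b in ack.group(4).split()],
            })
            continue
        took = SLOW_RE.search(line)
        if took:
            # Count how many slow-request messages each mrq number produced.
            slow[int(took.group(1))] += 1
    return failures, slow
```

Lines that the serial console truncated (for example, one ending at `mrq`) do not match the pattern and are skipped, so a full dmesg capture gives a more complete summary.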

Most of this does come from device tree, but this is what the “${cbootargs}” is for (device tree inheritance). So far as I know the default after this usually does include some rootfs content, but it seems out of place that what was appended was only “quiet”. If you removed serial console, then that would be gone from the APPEND. Even if not technically an error, I am quite surprised that the rootfs setup is not part of the APPEND content (after “${cbootargs}”). This is unusual.
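One way to check which rootfs arguments actually reach the kernel, regardless of what APPEND contains, is to parse the running `/proc/cmdline`. The Python sketch below is only an illustration: `parse_cmdline` and `rootfs_args` are hypothetical helper names, the default key list is taken from the APPEND guess earlier in this thread, and the parsing assumes whitespace-separated tokens with no quoting.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None.

    If a key repeats (such as console=), the last value wins.
    """
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else None
    return args


def rootfs_args(cmdline, keys=("root", "rw", "rootwait", "rootfstype")):
    """Return the rootfs-related arguments that are present in cmdline."""
    args = parse_cmdline(cmdline)
    return {k: args[k] for k in keys if k in args}
```

Passing the contents of `/proc/cmdline` to `rootfs_args` shows whether `root=`, `rw`, `rootwait`, and `rootfstype=` are present at runtime. The `/proc/cmdline` posted earlier in this thread does contain all four, even though the APPEND line holds only `${cbootargs} quiet`.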

This is the newest stuff from dmesg during this crash

[ 919.514467] bpmp_wait_ack() returned -110 (ch 22 mrq 3 data <0x83 0x00 0x00 0x06>)
[ 919.522081] ------------[ cut here ]------------
[ 919.526703] WARNING: CPU: 0 PID: 13684 at /home/tbuckley/tmp/seattle_tx2_kernel/sources/kernel/nvidia/drivers/firmware/tegra/mail.c:323 bpmp_send_receive_atomic+0x248/0x268
[ 919.542062] Modules linked in: fuse zram bnep overlay 88x2bu(O) cfg80211 binfmt_misc btusb btrtl btbcm btintel spidev stalkerButton(O) nvgpu bluedroid_pm lt81x_mipi(O) ip_tables x_tables

[ 919.542115] CPU: 0 PID: 13684 Comm: NVMVidEncInputT Tainted: G O 4.9.140-tegra #86
[ 919.542118] Hardware name: lightning (DT)
[ 919.542122] task: ffffffc0682b8e00 task.stack: ffffffc06e484000
[ 919.542126] PC is at bpmp_send_receive_atomic+0x248/0x268
[ 919.542130] LR is at bpmp_send_receive_atomic+0x248/0x268
[ 919.542134] pc : [] lr : [] pstate: 004001c5
[ 919.542136] sp : ffffffc06e4878d0
[ 919.542139] x29: ffffffc06e4878d0 x28: 0000000000000004
[ 919.542146] x27: ffffffc06e487c18 x26: 0000000000000016
[ 919.542152] x25: ffffffc06e487c1c x24: 00000000000f4240
[ 919.542159] x23: 000000d51ac0077a x22: ffffff800a189c08
[ 919.542164] x21: ffffff8009fbb458 x20: 20c49ba5e353f7cf
[ 919.542170] x19: 0000000000000003 x18: 0000000000000010
[ 919.542176] x17: 0000000000000000 x16: 0000000000000000
[ 919.542182] x15: ffffffffffffffff x14: 3078302033387830
[ 919.542188] x13: 3c20617461642033 x12: 2071726d20323220
[ 919.542193] x11: 686328203031312d x10: 00000000000004f2
[ 919.542199] x9 : 722029286b63615f x8 : ffffff80083d47b0
[ 919.542205] x7 : ffffff8009e94358 x6 : ffffffc0f7053bf0
[ 919.542211] x5 : ffffffc0f7053bf0 x4 : 0000000000000000
[ 919.542217] x3 : ffffffc0f70597f8 x2 : ffffffc0f7053bf0
[ 919.542222] x1 : ffffffc0682b8e00 x0 : 0000000000000046

[ 919.542230] ---[ end trace cedf2d7758f96369 ]---
[ 919.546843] Call trace:
[ 919.546849] [] bpmp_send_receive_atomic+0x248/0x268
[ 919.546854] [] tegra_bpmp_send_receive_atomic+0xc4/0x170
[ 919.546861] [] clk_bpmp_is_enabled+0x44/0x68
[ 919.546867] [] __clk_is_enabled+0x38/0x60
[ 919.546873] [] nvhost_module_get_rate+0x80/0xf0
[ 919.546880] [] nvhost_channelctl+0xa20/0xf28
[ 919.546885] [] do_vfs_ioctl+0xb0/0x8d8
[ 919.546890] [] SyS_ioctl+0x8c/0xa8
[ 919.546895] [] el0_svc_naked+0x34/0x38
[ 919.547994] bpmp: mrq 19 took 3996000 us
[ 919.553932] bpmp: mrq 19 took 3996000 us
[ 919.554577] bpmp: mrq 19 took 3996000 us

Hope this can help the nvidia person…

Hi Terry,

Is there an FAQ telling me what information is required in a post? I got the initial post from the serial console, and now after lunch my system is fubar’d. My gstreamer task is dead also. Again, it appears to have started after apt update.

I think the basic concept here is to share the steps that show us how to reproduce the issue on a devkit.

Thus, what we need to know are:

  1. Is this on custom carrier board or nvidia devkit?
  2. What peripherals are connected on the board?
  3. What commands are needed to hit this issue?
  4. What applications are needed to hit this issue?
  5. Could you move to the latest release and see if the issue still occurs?

Please reply to all of the questions above.

  1. this is a custom carrier board.

  2. has a mipi dsi display and usb-c

  3. The problem showed up during normal testing, which is based on a gstreamer app that displays video and creates a pre-buffer for it, using an 8M luminera camera. This problem showed up after apt update/upgrade and changes for usb-c vbus. Today I will back out the usb-c vbus changes.

  4. No, the camera only has a driver for r32.3.1.

  5. Can I get some idea of what the messages mean? They are cryptic.

Thanks,
Terry

I removed what I thought was the problem and it still failed. I am attaching dmesg, which I think has the info in it.

If you need other info, please ask.

Terry
crashDmesg.gz (29.0 KB)

Will you hit this issue if you revert the system back to the original one that does not have apt update/ upgrade?

I mean, do you have a base system package that you can always flash back to?

I was having problems with my video app stopping, and it never gave any indication of why. Now dmesg is filled with messages when the system stops.

Yes, I can back up, but the product really needs to be on the most up-to-date release of software. I was directed by my manager to do the apt update/upgrade as part of my testing, system checkout, and system building. We had never seen any real problems with apt update/upgrade before, and we have been using r32.3.1 for at least 8 months.

Terry

So something changed, and now the nvargus-daemon is telling us why it is dying. How are we going to fix it? Also, it seems to be using a lot of CPU time to run my video app, which uses only NVIDIA filters.

Thanks,
Terry

Why go back? It looks like there is some information about what is happening, and I need the problem fixed. If I go back, I will still have video hangs.

Terry