Mc-err: (255) csr_vicsrd: EMEM address decode error

Hi NV team,

Our self-driving vehicles had been running for several days when they
encountered the following intermittent kernel panic.
Please help analyze it.

[   83.896591] bwmgr API not supported
[   87.232821] bwmgr API not supported
[   87.295281] bwmgr API not supported
[   90.639344] bwmgr API not supported
[   90.695657] bwmgr API not supported
[   94.046704] bwmgr API not supported
[   94.096283] bwmgr API not supported
[  131.644299] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[  136.649090] Bridge firewalling registered
[  212.589715] falcon 154c0000.nvenc: Direct firmware load for nvhost_nvenc080.fw failed with error -2
[  212.599338] falcon 154c0000.nvenc: Falling back to sysfs fallback for: nvhost_nvenc080.fw
[  212.609822] falcon 154c0000.nvenc: looking for firmware in subdirectory
[  213.861137] nvmap_alloc_handle: PID 19320: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
[  213.886040] bwmgr API not supported
[  213.898227] bwmgr API not supported
[  213.903984] bwmgr API not supported
[18900.608633] bwmgr API not supported
[18900.615563] bwmgr API not supported
[18900.623313] bwmgr API not supported
[19051.824081] nvmap_alloc_handle: PID 2205493: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
[19051.838195] bwmgr API not supported
[19051.855313] bwmgr API not supported
[19051.864730] bwmgr API not supported
[90289.034251] bwmgr API not supported
[90289.042360] bwmgr API not supported
[90289.051196] bwmgr API not supported
[90441.105111] nvmap_alloc_handle: PID 3090044: gst-launch-1.0: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
[90441.123203] bwmgr API not supported
[90441.136496] bwmgr API not supported
[90441.146238] bwmgr API not supported
[106158.357389] iommu_context_dev 13e40000.host1x:niso1_ctx4: pin_array_ids: could not get buf err=-22
[106158.367011] falcon 15340000.vic: nvhost_ioctl_channel_submit: failed with err -22
[106158.564956] arm-smmu 8000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x7ffbf00e00, fsynr=0x760003, cbfrsynra=0x39, cb=12
[106158.565010] arm-smmu 8000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x7ffbf01d00, fsynr=0x330003, cbfrsynra=0x39, cb=12
[106158.577587] arm-smmu 8000000.iommu: Unhandled context fault: fsr=0x80000402, iova=0x7ffbf01d00, fsynr=0x330003, cbfrsynra=0x39, cb=12
[106158.594019] mc-err: (255) csr_vicsrd: EMEM address decode error
[106158.608920] mc-err:   status = 0x2006406c; hi_addr_reg = 0x000000ff addr = 0xffffffff00
[106158.617378] mc-err:   secure: yes, access-type: read
[106158.622641] Unable to handle kernel paging request at virtual address ffff800011210002
[106158.630972] Mem abort info:
[106158.633962]   ESR = 0x96000021
[106158.637229]   EC = 0x25: DABT (current EL), IL = 32 bits
[106158.642840]   SET = 0, FnV = 0
[106158.646102]   EA = 0, S1PTW = 0
[106158.649500] Data abort info:
[106158.652590]   ISV = 0, ISS = 0x00000021
[106158.656663]   CM = 0, WnR = 0
[106158.659842] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000023f752000
[106158.666888] [ffff800011210002] pgd=00000001000b3003, p4d=00000001000b3003, pud=00000001000b4003, pmd=000000010129d003, pte=0068000002bb0f17
[106158.680005] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[106158.685875] Modules linked in: xt_multiport ip6table_filter ip6_tables veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter br_netfilter fuse overlay lzo_rle lzo_compress zram ramoops reed_solomon 8021q loop garp mrp bonding sch_fq_codel nvgpu aes_ce_blk crypto_simd snd_soc_tegra186_asrc snd_soc_tegra186_dspk cryptd snd_soc_tegra210_ope snd_soc_tegra210_iqc snd_soc_tegra186_arad snd_soc_tegra210_mvc snd_soc_tegra210_afc snd_hda_codec_hdmi aes_ce_cipher snd_soc_tegra210_dmic snd_soc_tegra210_adx snd_soc_tegra210_amx snd_soc_tegra210_mixer ghash_ce nvidia_drm(OE) snd_soc_tegra210_i2s snd_soc_tegra210_admaif sha2_ce snd_soc_tegra210_sfc nvidia_modeset(OE) snd_soc_tegra_pcm snd_soc_tegra210_adsp ch9344 binfmt_misc snd_hda_tegra mcp25xxfd_can snd_soc_tegra_machine_driver sha256_arm64 spidev sha1_ce snd_soc_tegra_utils snd_hda_codec nvadsp snd_soc_simple_card_utils
[106158.685945]  pwm_fan snd_soc_spdif_tx tegra_bpmp_thermal igc(E) snd_hda_core snd_soc_rt5640 snd_soc_tegra210_ahub userspace_alert nct1008 ina3221 tegra210_adma nvidia(OE) snd_soc_rl6231 miivii_eeprom spi_tegra114 nvmap mttcan can_dev can_raw can ip_tables x_tables [last unloaded: mtd]
[106158.805094] CPU: 0 PID: 57 Comm: irq/27-mc_statu Tainted: G        W  OE     5.10.104-tegra #1
[106158.814135] Hardware name: Unknown Jetson AGX Orin/Jetson AGX Orin, BIOS 3.1-32827747 03/19/2023
[106158.823349] pstate: 00c00009 (nzcv daif +PAN +UAO -TCO BTYPE=--)
[106158.829680] pc : log_fault+0x98/0x640
[106158.833575] lr : log_fault+0x4c/0x640
[106158.837466] sp : ffff80001110bc90
[106158.840991] x29: ffff80001110bca0 x28: ffff7a83803f3a00 
[106158.846592] x27: ffff7a83803f3a00 x26: ffffd45585942480 
[106158.852196] x25: 0000000000000001 x24: ffffd45585942560 
[106158.857795] x23: ffff800011210000 x22: ffff800011210002 
[106158.863397] x21: ffffd4558799d000 x20: 0000000000000007 
[106158.868995] x19: ffffd45586bf1d98 x18: 0000000000000010 
[106158.874597] x17: 0000000000000000 x16: ffffd45585853210 
[106158.880197] x15: ffff7a83803f3f70 x14: ffffffffffffffff 
[106158.885796] x13: ffff80009110b977 x12: ffff80001110b980 
[106158.891397] x11: 000000000000000b x10: 0101010101010101 
[106158.896998] x9 : fffffffffffffffb x8 : 7f7f7f7f7f7f7f7f 
[106158.902598] x7 : fefefeff646c606d x6 : 00170401e9e1acf4 
[106158.908202] x5 : 742c616901041700 x4 : 8080808000000000 
[106158.913803] x3 : b34b234b0963a000 x2 : ffffd455859ee170 
[106158.919401] x1 : 0000000000000010 x0 : ffffd4558799d110 
[106158.925002] Call trace:
[106158.927633]  log_fault+0x98/0x640
[106158.931159]  log_mcerr_fault+0x2dc/0x620
[106158.935320]  tegra_mcerr_thread+0xd4/0x120
[106158.939656]  irq_thread_fn+0x34/0xa0
[106158.943453]  irq_thread+0x158/0x250
[106158.947162]  kthread+0x148/0x170
[106158.950600]  ret_from_fork+0x10/0x24
[106158.954397] Code: f0009980 91044000 f874d817 8b1602f6 (b94002d7) 
[106158.960817] ---[ end trace 695d6c0a12022ecb ]---
[106158.970763] Kernel panic - not syncing: Oops: Fatal exception
[106158.976824] SMP: stopping secondary CPUs
[106158.980992] Kernel Offset: 0x545575830000 from 0xffff800010000000
[106158.987402] PHYS_OFFSET: 0xffff857d80000000
[106158.991837] CPU features: 0x0040006,4a80aa38
[106158.996356] Memory Limit: none
[106159.004622] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

[0000.061] I> MB1 (version: 0.32.0.0-t234-54845784-57325615)
[0000.067] I> t234-A01-0-Silicon (0x12347) Prod
[0000.071] I> Boot-mode : Coldboot
[0000.074] I> Emulation: 
[0000.076] I> Entry timestamp: 0x00000000
[0000.080] I> last_boot_error: 0x0
[0000.083] I> BR-BCT: preprod_dev_sign: 0
[0000.087] I> rst_source: 0x2, rst_level: 0x1
[0000.091] I> Task: Bootchain select WAR set (0x5000ba65)
[0000.096] I> Task: Enable SLCG (0x5000bab1)
[0000.100] I> Task: CRC check (0x5001ea19)
[0000.104] I> Skip FUSE records CRC check as records_integrity fuse is not burned
[0000.111] I> Task: Initialize MB2 params (0x5000cb51) 

Platform:Orin
Release:R35.3.1

UART log (Please search for the keyword “panic”):
2024-06-27_01-07-03.txt (195.0 KB)

Thanks!
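
For readers following the "Mem abort info" section of the log above, the ESR value can be split into its fields by hand. The sketch below does this with POSIX shell arithmetic, using the Armv8-A ESR_ELx layout (EC in bits 31:26, IL in bit 25, and for data aborts WnR in bit 6 and DFSC in bits 5:0). The function names are invented for this illustration and are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Split an AArch64 ESR_ELx value into the fields the kernel prints.
# Field positions follow the Armv8-A data abort encoding.
decode_esr() {
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))    # exception class
    il=$(( (esr >> 25) & 0x1 ))     # instruction length bit
    wnr=$(( (esr >> 6) & 0x1 ))     # 0 = read, 1 = write
    dfsc=$(( esr & 0x3f ))          # data fault status code
    printf 'EC=0x%02x IL=%d WnR=%d DFSC=0x%02x\n' "$ec" "$il" "$wnr" "$dfsc"
}

# Return the low two bits of a hex address given WITHOUT the 0x prefix.
# Only the last hex digit is used, to avoid 64-bit overflow in sh.
low_two_bits() {
    addr=$1
    last=${addr#"${addr%?}"}
    echo $(( 0x$last & 3 ))
}

decode_esr 0x96000021          # prints: EC=0x25 IL=1 WnR=0 DFSC=0x21
low_two_bits ffff800011210002  # prints: 2
```

For ESR = 0x96000021 this gives DFSC = 0x21, which the Arm architecture defines as an alignment fault, and the faulting address ffff800011210002 sits 2 bytes past a 4-byte boundary. The faulting opcode b94002d7 appears to decode as a 32-bit load (ldr w23, [x22]), so one possible reading is that the panic is an unaligned register read inside log_fault while it was reporting the original mc-err, rather than the mc-err itself. This is an interpretation of the log, not a confirmed root cause.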

What use case are you running?
Does it involve encoding and file writing?

Hi Shane,

Yes, there is file writing, because the system log file has to be saved continuously.
There is also encoding in the GStreamer pipeline.

Hi Shane,

Part of the pipeline:
INPUT_NUMBER=0
gst-launch-1.0 --no-fault --gst-debug-level=2
nvv4l2camerasrc device=/dev/video${INPUT_NUMBER} do-timestamp=TRUE !
videorate ! video/x-raw(memory:NVMM),framerate=5/1 ! timemark !
nvvidconv ! video/x-raw(memory:NVMM), width=1920, height=1536 !
nvvidconv left=0 right=1919 top=228 bottom=1307 !
video/x-raw(memory:NVMM), width=1920, height=1080 ! nvvidconv !
video/x-raw(memory:NVMM), width=960, height=540, format=(string)I420 !
tee name=t_raw_camera_nv_${INPUT_NUMBER} ! nvvidconv !
video/x-raw(memory:NVMM), width=960, height=540, format=(string)BGRx !
tee name=t_raw_video_${INPUT_NUMBER}
t_raw_camera_nv_${INPUT_NUMBER}. ! queue ! nvv4l2h265enc iframeinterval=10 idrinterval=30 insert-sps-pps=true !
h265parse ! filesink location=/debug/file/camera.h265 \

Did you check the free storage space?

Hi Shane,

The system log files are saved on the eMMC.
When the problem occurred, there was 22 GB of free space.
Other data is stored on the SSD, and a service program cleans it up immediately.

If the problem recurs, I will focus on the SSD space usage.
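
One way to keep an eye on free space until the next occurrence is to sample it periodically into a file. The following is a minimal sketch, assuming standard `df` and `awk` are available; the function names, log path, and mount points are placeholders, not values from this system.

```shell
#!/bin/sh
# Print the available space, in KiB, of the filesystem holding the given path.
# df -P guarantees one line per filesystem; column 4 is "Available".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Append one "epoch-seconds path free-KiB" line per path to a log file.
# Usage: log_free_space LOGFILE PATH...
log_free_space() {
    log_file=$1
    shift
    for mount_point in "$@"; do
        printf '%s %s %s\n' "$(date +%s)" "$mount_point" \
            "$(free_kib "$mount_point")" >> "$log_file"
    done
}
```

Calling something like `log_free_space /path/to/free_space.log / /path/to/ssd` from a timer or a loop would leave a record that can be compared against the timestamp of the next panic.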

Do you need me to provide other log files?
Or are there other troubleshooting methods I should try?

Hi,
Please try with videotestsrc is-live=1 and run on a developer kit. See if you can replicate the issue. If yes, please share the command with us so that we can set it up and try.

And please upgrade to JetPack 5.1.3 or 6.0 GA if possible.

Hi DaneLLL,

Do you mean running the following pipeline?
"gst-launch-1.0 videotestsrc is-live=1 pattern=snow ! xvimagesink"

Hi,
Please replace nvv4l2camerasrc with videotestsrc is-live=1 in the original pipeline and give it a try.

Hi DaneLLL,

I would like to emphasize here that the GST pipeline of our equipment is very complex, and the pipeline I wrote above is only a part of it.

"replace nvv4l2camerasrc with videotestsrc is-live=1 in the original pipeline and give it a try."
With this change, the problem was not reproduced after 4 hours of testing.

However, our device reproduced the problem again. Please help analyze the following new log.
2024-07-14_01-07-55.txt (193.6 KB)

Thank you.

Hi,
It looks like an issue in the camera sensor driver. There may be an invalid memory access. Please run the v4l2-ctl command in a long-duration test and see whether the error appears.
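
A long-duration v4l2-ctl test could be scripted roughly as below. This is a sketch under assumptions: the device node and frame count are placeholders, the sensor may also need its format set with --set-fmt-video first, and the function names are made up for this example. It repeats a capture and stops as soon as the kernel log contains one of the fault signatures seen in this thread.

```shell
#!/bin/sh
# Count kernel log lines on stdin that match the fault signatures from this
# thread (mc-err reports and unhandled SMMU context faults).
count_faults() {
    grep -c -E 'mc-err:|arm-smmu .*Unhandled context fault' || true
}

# Capture repeatedly from the sensor with v4l2-ctl, bypassing GStreamer,
# until a fault signature shows up in dmesg.
# $1: video device (assumed default /dev/video0)
# $2: frames per iteration (assumed default 1000)
long_run_capture() {
    device=${1:-/dev/video0}
    frames=${2:-1000}
    iteration=0
    while :; do
        iteration=$((iteration + 1))
        v4l2-ctl -d "$device" --stream-mmap --stream-count="$frames" ||
            echo "iteration $iteration: v4l2-ctl exited with an error" >&2
        faults=$(dmesg | count_faults)
        if [ "$faults" -gt 0 ]; then
            echo "iteration $iteration: $faults fault line(s) in dmesg"
            return 1
        fi
    done
}
```

Run `long_run_capture /dev/video0 1000` after a fresh boot (reading dmesg may require root), so that older messages do not trigger the check. If the fault appears here too, the GStreamer pipeline is probably not the cause.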
