How to reset the display interface for kdump

I’m trying to capture kernel crashes on the Orin NX. I’ve followed the instructions here: Kernel Debugging Tools

When I test the configuration, the kexec kernel does not finish booting if the nvidia_drm module is loaded. I get SMMU and memory controller errors that prevent the initramfs boot from completing:

[   15.305391] arm_smmu_global_fault: 1002413 callbacks suppressed
[   15.305401] arm-smmu 10000000.iommu: Blocked unknown Stream ID 0x1; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
[   15.305404] arm-smmu 10000000.iommu:         GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000001, GFSYNR2 0x00000000
[   15.305410] tegra30_mc_handle_irq: 641856 callbacks suppressed
[   15.305412] tegra-mc 2c00000.memory-controller: nvdisplayr1: secure read @0x000000ffffffff00: EMEM address decode error (EMEM decode error)
[   15.305434] arm-smmu 10000000.iommu: Blocked unknown Stream ID 0xc01; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
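
To see which devices the SMMU is rejecting, the blocked Stream IDs can be pulled out of the log. This is only a minimal sketch that assumes the message format shown above; the function name blocked_stream_ids is made up for illustration.

```shell
# Print each distinct Stream ID that the SMMU reported as blocked.
# Reads kernel log text on stdin (for example from dmesg).
# Assumes the "Blocked unknown Stream ID 0x..." wording seen in this log.
blocked_stream_ids() {
    sed -n 's/.*Blocked unknown Stream ID \(0x[0-9a-fA-F]*\).*/\1/p' | LC_ALL=C sort -u
}
```

On a live system this could be run as `dmesg | blocked_stream_ids`. For the excerpt above it would list 0x1 and 0xc01.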

If I unload the nvidia_drm modules, the kexec kernel boots OK.

The problem starts with the iommu configuration earlier in the boot log:

[    0.295062] arm-smmu 10000000.iommu: SMMUv2 with:
[    0.295064] arm-smmu 10000000.iommu:         stage 1 translation
[    0.295065] arm-smmu 10000000.iommu:         stage 2 translation
[    0.295066] arm-smmu 10000000.iommu:         nested translation
[    0.295068] arm-smmu 10000000.iommu:         stream matching with 128 register groups
[    0.295070] arm-smmu 10000000.iommu:         128 context banks (0 stage-2 only)
[    0.295073] arm-smmu 10000000.iommu:         Supported page sizes: 0x61311000
[    0.295075] arm-smmu 10000000.iommu:         Stage-1: 48-bit VA -> 48-bit IPA
[    0.295076] arm-smmu 10000000.iommu:         Stage-2: 48-bit IPA -> 48-bit PA
[    0.295121] arm-smmu 10000000.iommu: Unexpected global fault, this could be serious
[    0.295123] arm-smmu 10000000.iommu:         GFSR 0x80000001, GFSYNR0 0x00000000, GFSYNR1 0x00000801, GFSYNR2 0x00000000
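
For reference, the fault registers in these lines can be decoded by hand. The sketch below is not authoritative: the bit positions (ICF in bit 0, USF in bit 1, MULTI in bit 31, Stream ID in the low 16 bits of GFSYNR1) are my reading of the SMMUv2 register layout, and the function names are hypothetical.

```shell
# Decode a few SMMUv2 global fault status (GFSR) bits.
# Assumed bit positions:
#   bit 0  = ICF   (invalid context fault)
#   bit 1  = USF   (unidentified stream fault)
#   bit 31 = MULTI (more than one fault was recorded)
decode_gfsr() {
    gfsr=$(( $1 ))
    flags=""
    if [ $(( gfsr & 0x1 )) -ne 0 ]; then flags="$flags ICF"; fi
    if [ $(( gfsr & 0x2 )) -ne 0 ]; then flags="$flags USF"; fi
    if [ $(( gfsr & 0x80000000 )) -ne 0 ]; then flags="$flags MULTI"; fi
    # Drop the leading space before printing.
    echo "${flags# }"
}

# Extract the Stream ID from GFSYNR1 (assumed to be the low 16 bits).
gfsynr1_stream_id() {
    printf '0x%x\n' $(( $1 & 0xffff ))
}
```

With the values from the early boot log, `decode_gfsr 0x80000001` gives "ICF MULTI" and `gfsynr1_stream_id 0x00000801` gives 0x801. The later fault with GFSR 0x80000002 decodes to "USF MULTI", which matches the "Blocked unknown Stream ID" messages.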

What do I need to do to solve this? Reconfigure the memory map for the crash kernel? Use a kernel parameter to reset the display hw? Something else?

Configuration is JetPack 6.1/L4T R36.4.3

*** Please note that this reply is generated by LLM automatically ***

To solve the kexec boot failure on the Orin NX that occurs when the nvidia_drm module is loaded, we need to analyze the problem step by step.

Firstly, the fact that the kexec kernel boots successfully when the nvidia_drm modules are unloaded suggests that the issue is related to the display hardware or the IOMMU configuration.

The IOMMU (Input-Output Memory Management Unit) is a hardware component that manages device memory access and keeps devices from reaching memory regions they should not. Correct IOMMU configuration is critical for the system to function properly.

In this case, the problem starts with the IOMMU configuration earlier in the boot log, which indicates that there might be an issue with the memory mapping or the IOMMU settings.

Reconfiguring the memory map for the crash kernel (option 1) might be a possible solution. This would involve updating the memory layout to ensure that the crash kernel has access to the necessary memory regions.

Using a kernel parameter to reset the display hardware (option 2) is another possible solution. This would involve passing a kernel parameter to reset the display hardware, which might resolve the issue with the nvidia_drm module.

However, without more information about the specific error messages or the boot log, it’s difficult to determine the exact cause of the problem.

Therefore, I would recommend trying both options and analyzing the results.

Option 1: Reconfigure the memory map for the crash kernel

  • Update the memory layout to ensure that the crash kernel has access to the necessary memory regions.
  • Verify that the memory map is correctly configured for the crash kernel.
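
As a starting point for the verification step, one could check whether a crashkernel= reservation was requested at all. This is only a sketch under assumptions: the helper name crashkernel_param is made up, and it only inspects the command line text, not the actual reservation.

```shell
# Print the value of the crashkernel= parameter from a kernel command line,
# or print nothing if no reservation was requested.
# $1: the full kernel command line as one string.
crashkernel_param() {
    # Intentionally unquoted so the command line splits into words.
    for word in $1; do
        case "$word" in
            crashkernel=*) echo "${word#crashkernel=}" ;;
        esac
    done
}
```

On the target this could be called as `crashkernel_param "$(cat /proc/cmdline)"`. The reserved region itself normally shows up as a "Crash kernel" entry in /proc/iomem.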

Option 2: Use a kernel parameter to reset the display hardware

  • Pass a kernel parameter to reset the display hardware.
  • Verify that the display hardware is correctly reset and that the nvidia_drm module is loaded successfully.

If neither of these options resolves the issue, further analysis of the boot log and error messages would be necessary to determine the root cause of the problem.

In summary, the solution involves trying both options and analyzing the results to determine the best course of action.

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

This answer simply reformats my original post. Not useful

Hi

Are you seeing this issue even without a display panel connected?

While we check it internally, can you unblock yourself by blacklisting the modules that are causing the kexec boot to fail? The display may not be needed in the kexec boot.

kexec -l /boot/vmlinuz- \
  --initrd=/boot/initramfs-<version>.img \
  --command-line="root=/dev/sda1 ro quiet modprobe.blacklist=module1,module2"
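
The command-line string for the blacklist workaround can be built with a small helper. This is a hedged sketch: the function name with_blacklist is hypothetical, nvidia_drm is taken from the report above, and any further module names would need to be confirmed on the target.

```shell
# Append modprobe.blacklist=<modules> to a kernel command line.
# $1: existing command line, $2: comma-separated module names.
with_blacklist() {
    printf '%s modprobe.blacklist=%s\n' "$1" "$2"
}
```

For example, `with_blacklist "root=/dev/sda1 ro quiet" nvidia_drm` yields a string that can be passed to --command-line. Note that blacklisting generally only stops automatic loading, so a module that an init script loads explicitly by name may still come up.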

There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks