We are currently considering moving to NVIDIA Jetson to replace our current ARM64 SoC. One problem we are facing with the current ARM64 SoC is that when trying to access memory on the PCIe bus that is no longer available, e.g. because the device was removed via hotplug / surprise down, the system reacts with this ugly kernel crash:
On x86 systems and other systems I have worked with before, accessing such PCIe addresses (or config space) results in reading 0xFFFF's.
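To illustrate that behavior: because reads from a surprise-removed device complete as all-ones on such systems, PCI drivers conventionally treat an all-ones value as "device may be gone". A minimal sketch of that check (the helper name here is hypothetical; newer Linux kernels provide a `PCI_POSSIBLE_ERROR()` macro for exactly this pattern):

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if a 32-bit MMIO/config read came back as all-ones,
 * which on x86-like platforms is how a read from a removed PCIe
 * device completes.  An all-ones value can also be a legitimate
 * register content, so this is a heuristic, not a proof of removal. */
bool pci_read_looks_removed(uint32_t val)
{
    return val == 0xFFFFFFFFu;
}
```

A driver would typically apply this to the first read after resuming or after a hot-plug event, e.g. on the vendor/device ID word, and bail out of further hardware access if it matches.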
Could you perhaps tell me how this is handled in the NVIDIA Jetson setup? Currently v5.10 LTS is used AFAIK. Does the kernel here also generate a crash on such accesses to unavailable PCIe memory?
Additionally, it would be very interesting to know if and how PCIe hot-plugging is supported in the NVIDIA downstream kernel for Jetson. I saw a patchset from your colleague Vidya Sagar introducing GPIO-based PCIe hotplug support on the linux-pci list a few months ago. Is this what is being used in the NVIDIA kernel?
Could you perhaps tell me how this is handled in the NVIDIA Jetson setup? Currently v5.10 LTS is used AFAIK. Does the kernel here also generate a crash on such accesses to unavailable PCIe memory?
The Tegra downstream driver removes the complete hierarchy during hot unplug, so there will be no memory for SW to access. So this error will not be seen with the JetPack 5.10 kernel.
Additionally, it would be very interesting to know if and how PCIe hot-plugging is supported in the NVIDIA downstream kernel for Jetson. I saw a patchset from your colleague Vidya Sagar introducing GPIO-based PCIe hotplug support on the linux-pci list a few months ago. Is this what is being used in the NVIDIA kernel?
No; in the downstream kernel we register the hot-plug interrupt with a GPIO. The upstream patches were not accepted, so we have to re-engineer them and send them again.
To enable hot plug, the platform should connect a GPIO to either the PRSNT1# or PRSNT2# pin at the PCIe slot.
In SW, set the pin as an input GPIO in the pinmux BCT and add nvidia,pex-prsnt-gpios in the kernel DTS.
Documentation: kernel/kernel-5.10/Documentation/devicetree/bindings/pci/nvidia,tegra194-pcie.txt
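As a sketch of the DTS side of the steps above: the property name comes from the binding document referenced here, but the node name, unit address, GPIO controller phandle, pin, and polarity below are all placeholders that would need to match your board design.

```dts
pcie@14160000 {
	/* ... existing Tegra194 root-port properties ... */

	/* Placeholder: input GPIO wired to PRSNT1#/PRSNT2# of the slot;
	 * controller phandle, pin number, and active level depend on
	 * the board routing and pinmux BCT configuration. */
	nvidia,pex-prsnt-gpios = <&gpio TEGRA194_MAIN_GPIO(A, 0) GPIO_ACTIVE_LOW>;
};
```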
Hello,
thanks for the quick reply. I do have an additional question regarding the problem of accessing unavailable PCIe devices after unplugging. If I understand this correctly, unplugging a PCIe device directly connected to a PCIe root port of the NVIDIA SoC will remove the complete hierarchy of that root port. But this is very likely not the case when the root port is connected to a PCIe switch with e.g. 4 downstream ports, and the hot unplugging happens on one of these switch downstream ports. I assume that in this case the memory mapping, as seen from the NVIDIA SoC, will still be valid. What happens when the CPU tries to access the memory of the now-unplugged PCIe device behind the PCIe switch?