We are currently considering moving to NVIDIA Jetson to replace our current ARM64 SoC. One problem we are facing with the current ARM64 SoC is that when trying to access memory on the PCIe bus that is no longer available, e.g. because the device was removed via hotplug / surprise down, the system reacts with this ugly kernel crash:
On x86 systems and other systems I have worked with before, accessing such PCIe addresses (or config space) results in reading 0xFFFF's.
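To illustrate that behavior: because reads from a surprise-removed device complete as all-ones on such systems, PCI drivers conventionally treat an all-ones value as "device may be gone". A minimal sketch of that check (the helper name here is hypothetical; newer Linux kernels provide a `PCI_POSSIBLE_ERROR()` macro for exactly this pattern):

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if a 32-bit MMIO/config read came back as all-ones,
 * which on x86-like platforms is how a read from a removed PCIe
 * device completes.  An all-ones value can also be a legitimate
 * register content, so this is a heuristic, not a proof of removal. */
bool pci_read_looks_removed(uint32_t val)
{
    return val == 0xFFFFFFFFu;
}
```

A driver would typically apply this to the first read after resuming or after a hot-plug event, e.g. on the vendor/device ID word, and bail out of further hardware access if it matches.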
Could you perhaps tell me how this is handled in the NVIDIA Jetson setup? Currently v5.10 LTS is used AFAIK. Does the kernel here also generate a crash on such accesses to unavailable PCIe memory?
Additionally, it would be very interesting to know if and how PCIe hot-plugging is supported in the NVIDIA downstream kernel for Jetson. I saw a patchset from your colleague Vidya Sagar introducing GPIO-based PCIe hotplug support on the linux-pci list a few months ago. Is this what is being used in the NVIDIA kernel?
Could you perhaps tell me how this is handled in the NVIDIA Jetson setup? Currently v5.10 LTS is used AFAIK. Does the kernel here also generate a crash on such accesses to unavailable PCIe memory?
The Tegra downstream driver removes the complete hierarchy during hot unplug, so there will be no memory for SW to access. So this error will not be seen with the JetPack 5.10 kernel.
Additionally, it would be very interesting to know if and how PCIe hot-plugging is supported in the NVIDIA downstream kernel for Jetson. I saw a patchset from your colleague Vidya Sagar introducing GPIO-based PCIe hotplug support on the linux-pci list a few months ago. Is this what is being used in the NVIDIA kernel?
No; in the downstream kernel we register the hot-plug interrupt with a GPIO. The upstream patches were not accepted, so we have to re-engineer them and send them again.
To enable hot plug, the platform should connect a GPIO to either the PRSNT1# or PRSNT2# pin at the PCIe slot.
In SW, set the pin as an input GPIO in the pinmux BCT and add nvidia,pex-prsnt-gpios in the kernel DTS.
Documentation: kernel/kernel-5.10/Documentation/devicetree/bindings/pci/nvidia,tegra194-pcie.txt
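As a sketch of the DTS side of the steps above: the property name comes from the binding document referenced here, but the node name, unit address, GPIO controller phandle, pin, and polarity below are all placeholders that would need to match your board design.

```dts
pcie@14160000 {
	/* ... existing Tegra194 root-port properties ... */

	/* Placeholder: input GPIO wired to PRSNT1#/PRSNT2# of the slot;
	 * controller phandle, pin number, and active level depend on
	 * the board routing and pinmux BCT configuration. */
	nvidia,pex-prsnt-gpios = <&gpio TEGRA194_MAIN_GPIO(A, 0) GPIO_ACTIVE_LOW>;
};
```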
Hello,
thanks for the quick reply. I do have an additional question regarding the problem of accessing unavailable PCIe devices after unplugging. If I understand this correctly, unplugging a PCIe device directly connected to a PCIe root port of the NVIDIA SoC will remove the complete hierarchy of that root port. But this is very likely not the case when the root port is connected to a PCIe switch with e.g. 4 downstream ports, and the hot unplugging happens on one of these switch downstream ports. I assume that in this case the memory mapping, as seen from the NVIDIA SoC, will still be valid. What happens when the CPU tries to access the memory of the now-unplugged PCIe device behind the PCIe switch?