In theory the problem is corrected, but it isn’t really possible to know based on what is presented.
For background, many PCIe devices support the optional “Advanced Error Reporting” (AER) capability, which allows not only detecting and reporting various error types, but often recovering from them (a “corrected” error is one the hardware has already recovered from). The cause depends on the specific error; although the problem is often one of signal quality, it is also not unusual for it to be a software issue, e.g., a mismatched driver or a bad argument passed to the driver.
Note that the particular PCIe device itself defines much of this. Yours is apparently at slot 0001:00:00.0. Normally one would use lspci to find out more information. Some information on this:
Plain lspci gives a brief view of all known PCIe devices. Jetsons don’t have a lot, but it might list PCIe bridges, for example, in addition to the device itself.
One has to use sudo to get the most verbose output from lspci.
To view only the slot your error message is about, and to simultaneously create a log file you can attach to the forum: sudo lspci -s '0001:00:00.0' -vvv 2>&1 | tee log_lspci.txt
With that you could see verbose information about the specific device, and then attach a copy to the forum. More information would probably be available then.
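As a rough aid for reading that verbose log, the sketch below pulls out just the link and error status lines. The labels it searches for (LnkCap, LnkSta, DevSta, UESta, CESta) are the ones lspci -vvv normally prints for a device with PCIe and AER capabilities; the function name summarize_lspci_log is made up for illustration, and the full log is still what should be attached to the forum.

```shell
#!/usr/bin/env bash
# Sketch only: summarize_lspci_log is a hypothetical helper name.
# Prints the link capability/status and AER status lines from a saved
# "lspci -vvv" log so the interesting fields are easier to spot.
summarize_lspci_log() {
    local log="${1:-log_lspci.txt}"
    grep -E '^[[:space:]]*(LnkCap|LnkSta|DevSta|UESta|CESta):' "$log"
}
```

After sourcing the file, summarize_lspci_log log_lspci.txt prints only those lines. A flag followed by "+" in UESta or CESta means that error bit is currently set.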
We would also need to know the exact model of Jetson. This includes whether there is a custom or third party carrier board involved, or if this is purely a developer’s kit. I suggest adding this information:
cat /etc/nv_boot_control.conf
head -n 1 /etc/nv_tegra_release
Have there been any device tree modifications, and if so what?
If this is a PCIe device you installed, add details of what the device is; if not, then specify that you don’t have any optional PCIe hardware (including in the M.2 slot).
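If it is easier, the two file-based items above can be gathered into one attachable log. This is a minimal sketch: the function name collect_jetson_info and the output file name log_jetson_info.txt are hypothetical, and the two paths default to the ones named above. The device tree and hardware questions still need to be answered by hand.

```shell
#!/usr/bin/env bash
# Sketch only: collect_jetson_info and log_jetson_info.txt are made-up names.
# Writes the boot control config and the first line of the L4T release file
# to one log, while also showing them on the terminal.
collect_jetson_info() {
    local out="${1:-log_jetson_info.txt}"
    local conf="${2:-/etc/nv_boot_control.conf}"
    local rel="${3:-/etc/nv_tegra_release}"
    {
        echo "== $conf =="
        cat "$conf" 2>&1
        echo "== first line of $rel =="
        head -n 1 "$rel" 2>&1
    } | tee "$out"
}
```

After sourcing the file, running collect_jetson_info with no arguments uses the default paths and leaves log_jetson_info.txt in the current directory.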
You’ve booted to an external device (nvme0n1p1) using a mainline kernel (L4T R36.x uses mainline).
The carrier board is flashed as a dev kit.
Can you verify that the hardware itself is in fact truly a developer’s kit (probably it is, but this needs to be asked)? Sometimes third party carrier boards end up flashed as an NVIDIA carrier board, which works, but during debugging it is important to know that the carrier board is in fact the one the software is designed to work with.
The device at slot ‘0001:00:00.0’ is one of NVIDIA’s own devices. It is a PCIe bridge. This means that the bridge and the device attached to it need to be considered together. For that it would be useful to have a tree view of lspci:
lspci -tv
With logging to a file you can attach to the forum: lspci -tv 2>&1 | tee log_pci_tree.txt
You can provide the tree view of lspci now, but we will need the verbose lspci on that specific slot after some errors have occurred. Assuming the dmesg logs show the same PCIe slot (the bridge) of ‘0001:00:00.0’, then whenever you find the next error, run: sudo lspci -s '0001:00:00.0' -vvv 2>&1 | tee log_pcie_error.txt
(then attach log_pcie_error.txt)
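To avoid missing the moment, something along these lines could check the kernel log for the slot and take the verbose dump in one step. It is only a sketch under assumptions: the function names are made up, the match is a plain string search for the slot address rather than a parse of any particular AER message format, and the output file gets a timestamp suffix so repeated captures don't overwrite each other.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical. SLOT is the bridge
# address from this thread.
SLOT="${SLOT:-0001:00:00.0}"

# Succeeds if the text on stdin mentions the given slot address.
log_mentions_slot() {
    grep -qF -- "$1"
}

# Save verbose lspci output for one slot to a timestamped log file.
capture_slot() {
    local slot="$1"
    local out
    out="log_pcie_error_$(date +%Y%m%d_%H%M%S).txt"
    sudo lspci -s "$slot" -vvv 2>&1 | tee "$out"
}

# Look at the current kernel log and capture only if the slot shows up.
check_and_capture() {
    if sudo dmesg | log_mentions_slot "$SLOT"; then
        capture_slot "$SLOT"
    else
        echo "No kernel log entries mention $SLOT yet."
    fi
}
```

After sourcing the file, run check_and_capture once you suspect an error has occurred. Since the kernel log keeps earlier messages, it will also match errors logged at any point since boot.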
If you post the tree view of lspci now, then we can figure out what slots the bridge might be serving. When the error occurs on the bridge it is possible that we might be interested in knowing what device that bridge serves and getting a verbose lspci on the device being served even if that device is not itself showing an error. PCIe devices do often have sub-devices though, and so the tree view slot naming might need an explanation when describing what the slot is that the bridge serves. We can get that knowledge out ahead of time and then see if there are downstream errors as well as bridge errors.
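As a cross-check on the tree view, the devices directly below a bridge can also be read from sysfs, where each child device appears as a subdirectory of the bridge's own directory. The sketch below assumes the usual /sys/bus/pci/devices layout; the function name list_downstream is made up, and the second argument exists only so a different root can be substituted.

```shell
#!/usr/bin/env bash
# Sketch only: list_downstream is a hypothetical helper name.
# Prints the PCI addresses (domain:bus:device.function) found as
# subdirectories of the given bridge's sysfs directory.
list_downstream() {
    local bridge="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local d
    for d in "$root/$bridge"/[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7]; do
        # An unmatched glob stays literal and is skipped by this check.
        [ -d "$d" ] && basename "$d"
    done
    return 0
}
```

After sourcing the file, list_downstream 0001:00:00.0 prints one address per line for whatever sits directly behind the bridge, and each address can then be passed to sudo lspci -s for a verbose view.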
If it turns out that the device being served by the bridge is the NVMe, then we might ask more questions about the NVMe, but don’t bother for now.