PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0010(Receiver ID)

Hello,

I’m getting PCIe error messages on my Jetson Nano running Ubuntu 18.04. It fully loads the desktop environment, but at that point I see many errors, all of the same kind, with two bad consequences:

  1. the network connection drops repeatedly, in an endless loop;
  2. Ubuntu restarts after a couple of minutes.

I’ve attached the log file:

https://pastebin.ubuntu.com/p/FCbT8xVbrM/

According to this post:

There are several ways to fix the bug. I chose to:

  1. append ‘pcie_aspm=off’ to the kernel command line (the APPEND line of the primary entry), like this:
TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
    MENU LABEL primary kernel
    LINUX /boot/Image
    INITRD /boot/initrd
    APPEND ${cbootargs} root=PARTUUID=5ac80d7c-40fb-4796-bd56-4110e389819b rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 pcie_aspm=off

LABEL backup
    MENU LABEL backup kernel
    LINUX /boot/Image
    INITRD /boot/initrd
    APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0

  2. run the command below once the system had been up for a couple of minutes, before it restarted as usual:

echo "performance" > /sys/module/pcie_aspm/parameters/policy

Unfortunately, neither of these fixed the error.
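A quick way to confirm whether the two changes above actually took effect is to look at /proc/cmdline and at the ASPM policy file, in which the kernel marks the active policy with square brackets. This is a minimal sketch, assuming Python 3 is available on the Jetson (the 3.6 shipped with Ubuntu 18.04 should be enough); the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

CMDLINE = Path("/proc/cmdline")
ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` (e.g. "pcie_aspm=off") is a whole token of the kernel command line."""
    return param in cmdline.split()


def active_aspm_policy(policy_text: str) -> Optional[str]:
    """Return the bracketed entry, e.g. "performance" from "default [performance] powersave"."""
    for token in policy_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    if CMDLINE.exists():
        print("pcie_aspm=off on cmdline:", has_kernel_param(CMDLINE.read_text(), "pcie_aspm=off"))
    if ASPM_POLICY.exists():
        print("active ASPM policy:", active_aspm_policy(ASPM_POLICY.read_text()))
    else:
        print("no pcie_aspm policy file found (ASPM support may not be built into this kernel)")
```

If I read the kernel source correctly, writes to that policy file are refused while ASPM is disabled with pcie_aspm=off, so the echo in step 2 may simply have been rejected once step 1 was in place.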

Not sure about your case; it may be a signal issue, so I’d first suggest reseating the NVMe SSD.
If that doesn’t help, you may also try the suggestions in NVME SSD drive visible in lspci, but not visible in fdisk - #3 by Honey_Patouceul.

Reset the NVMe SSD? I don’t use one. I boot the board via USB, using the kiwibird SD card to USB adapter.

Not reset; I meant power off, then unplug and replug the PCI device.

I don’t use any PCI devices. I only use USB devices: a mouse, a keyboard, and the SD card to USB adapter.

The SD card may be using PCIe, and so might the USB adapter; I’m not sure. But you do have PCIe errors. If reseating doesn’t help, you may try the other suggestions.

I recompiled the kernel, removing “CONFIG_PCIEASPM_POWERSAVE=y” and setting “CONFIG_PCIEASPM_PERFORMANCE=y”, then rebooted the board with the new kernel.

In the log I see the errors below, which repeat in an endless loop:

[  251.544740] nvhdcp: Error: te launch operation failed with error -22
[  251.551178] nvhdcp: Error: Error getting srm signature!
[  251.556538] nvhdcp: Error: get vprime params failed
[  251.561515] nvhdcp: Error: nvhdcp failure - renegotiating in 1 second
[  252.582669] te_open_trusted_session:ERROR(-19) in tipc_create_channel
[  252.783248] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  252.790717] tsec 54500000.tsec: tsec_execute_method: submit failed
[  252.790717] 
[  252.798470] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  252.805863] tsec 54500000.tsec: tsec_execute_method: submit failed
[  252.805863] 
[  252.813585] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  252.820958] tsec 54500000.tsec: tsec_execute_method: submit failed
[  252.820958] 
[  252.828651] nvhdcp: Error: te launch operation failed with error -22
[  252.835055] nvhdcp: Error: Error getting srm signature!
[  252.840370] nvhdcp: Error: get vprime params failed
[  252.845299] nvhdcp: Error: nvhdcp failure - renegotiating in 1 second
[  253.862570] te_open_trusted_session:ERROR(-19) in tipc_create_channel
[  254.055334] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  254.062672] tsec 54500000.tsec: tsec_execute_method: submit failed
[  254.062672] 
[  254.070430] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  254.077793] tsec 54500000.tsec: tsec_execute_method: submit failed
[  254.077793] 
[  254.085547] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  254.092898] tsec 54500000.tsec: tsec_execute_method: submit failed
[  254.092898] 
[  254.100617] nvhdcp: Error: te launch operation failed with error -22
[  254.107107] nvhdcp: Error: Error getting srm signature!
[  254.112457] nvhdcp: Error: get vprime params failed
[  254.117360] nvhdcp: Error: nvhdcp failure - renegotiating in 1 second
[  255.142852] te_open_trusted_session:ERROR(-19) in tipc_create_channel
[  255.351505] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  255.359552] tsec 54500000.tsec: tsec_execute_method: submit failed
[  255.359552] 
[  255.369825] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  255.378139] tsec 54500000.tsec: tsec_execute_method: submit failed
[  255.378139] 
[  255.387036] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  255.396217] tsec 54500000.tsec: tsec_execute_method: submit failed
[  255.396217] 
[  255.405388] nvhdcp: Error: te launch operation failed with error -22
[  255.412015] nvhdcp: Error: Error getting srm signature!
[  255.417829] nvhdcp: Error: get vprime params failed
[  255.423467] nvhdcp: Error: nvhdcp failure - renegotiating in 1 second
[  256.454556] te_open_trusted_session:ERROR(-19) in tipc_create_channel
[  256.644429] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  256.652003] tsec 54500000.tsec: tsec_execute_method: submit failed
[  256.652003] 
[  256.660992] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  256.668327] tsec 54500000.tsec: tsec_execute_method: submit failed
[  256.668327] 
[  256.676152] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  256.683474] tsec 54500000.tsec: tsec_execute_method: submit failed
[  256.683474] 
[  256.691255] nvhdcp: Error: te launch operation failed with error -22
[  256.697631] nvhdcp: Error: Error getting srm signature!
[  256.702923] nvhdcp: Error: get vprime params failed
[  256.707908] nvhdcp: Error: nvhdcp failure - renegotiating in 1 second
[  257.734618] te_open_trusted_session:ERROR(-19) in tipc_create_channel
[  257.923270] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  257.930606] tsec 54500000.tsec: tsec_execute_method: submit failed
[  257.930606] 
[  257.938413] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  257.945747] tsec 54500000.tsec: tsec_execute_method: submit failed
[  257.945747] 
[  257.953529] tsec 54500000.tsec: nvhost_module_busy: failed to power on, err -22
[  257.960935] tsec 54500000.tsec: tsec_execute_method: submit failed
[  257.960935] 
[  257.968630] nvhdcp: Error: te launch operation failed with error -22
[  257.975119] nvhdcp: Error: Error getting srm signature!
[  257.980438] nvhdcp: Error: get vprime params failed
[  257.985346] nvhdcp: Error: nvhdcp failure - renegotiating in 1 second

Anyway, it booted all the way to the desktop environment. What do you suggest I do?
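The log above is the same short cycle (three tsec power-on failures, then an HDCP renegotiation) repeating roughly every 1.3 seconds. When posting logs like this, a small helper that strips the dmesg timestamps and counts distinct messages can make such a loop easier to see. This is a hypothetical sketch, not an NVIDIA tool; it assumes `dmesg` is on the PATH and readable by the current user.

```python
import re
import shutil
import subprocess
from collections import Counter

# Matches the "[  251.544740] " timestamp prefix that dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def summarize(log_text: str) -> Counter:
    """Count each distinct message once its timestamp is removed; empty continuation lines are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        message = TIMESTAMP.sub("", line).strip()
        if message:
            counts[message] += 1
    return counts


if __name__ == "__main__":
    if shutil.which("dmesg"):
        # Reading the kernel log may require root on some systems.
        result = subprocess.run(["dmesg"], stdout=subprocess.PIPE, universal_newlines=True)
        for message, n in summarize(result.stdout).most_common(10):
            print(f"{n:6d}  {message}")
```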

Hmm, this looks like a different issue. I’m not sure that Jetson supports HDCP.
Someone else may provide better advice.

I also don’t know whether HDCP is supported. I do wonder about the PCIe error, though, since it is definitely there. You don’t need to have added devices of your own for this to occur, since there can be internal PCIe devices (e.g., a bridge, even if unused). You say you are not using any PCIe devices. Are any even plugged in?

Also, I noticed complaints about extended attributes:

[   17.034748] squashfs: SQUASHFS error: Xattrs in filesystem, these will be ignored
[   17.044275] squashfs: SQUASHFS error: unable to read xattr id index table

Was this system customized in any way? Is it purely the NVIDIA dev kit, with the SD card mounted on the module rather than on the carrier board? It would just seem strange to see this unless something about the install or kernel wasn’t “average”.

Also, you stated this kernel modification (and I suspect the kernel configuration might be an issue for at least some of this):

removing “CONFIG_PCIEASPM_POWERSAVE=y” and setting “CONFIG_PCIEASPM_PERFORMANCE=y”

Did you set “CONFIG_PCIEASPM_POWERSAVE=n”? That seems correct, but it’s different from just removing the config line. When you made those modifications, did you use a kernel configuration editor, e.g., “make menuconfig” (or similar)? If the file was edited directly in a text editor, the results won’t be good.

Finally, what do you see from “uname -r”?

—> Are any even plugged in?

I don’t think so. The devices attached to the Jetson Nano are a mouse, a keyboard, an HDMI cable, and the kiwibird SD card to USB adapter. I can’t use the SD card directly anymore: it boots only once in 10 tries. With the USB adapter it boots more often.

—> Was this system customized in any way?

I installed Ubuntu from the prebuilt image provided by NVIDIA. I enabled KVM by building a kernel from the source code here:

git clone https://github.com/OE4T/linux-tegra-4.9.git + oe4t-patches-l4t-r32.7.3

—> Did you set “CONFIG_PCIEASPM_POWERSAVE=n

I didn’t know that “=n” was supported. I don’t see any “=n” in the kernel config file.

—> When you made those modifications, did you use a kernel configuration editor, e.g., “make menuconfig” (or similar)?

Yes, make menuconfig. When I look in the kernel config file after editing it with make menuconfig, I don’t see any “=n” values.

# uname -r
4.9.299+
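On the “=n” question: as far as I know this is normal Kconfig behaviour. When menuconfig saves a .config, a disabled boolean option is written as the comment line “# CONFIG_FOO is not set” rather than “CONFIG_FOO=n”, and the default ASPM policy is a one-of-several choice, so selecting CONFIG_PCIEASPM_PERFORMANCE should leave CONFIG_PCIEASPM_POWERSAVE in that “not set” form. The sketch below lists the ASPM options with “not set” reported as n. It assumes the running kernel exposes its config at /proc/config.gz, which is only true if it was built with CONFIG_IKCONFIG_PROC; otherwise pass the text of the .config from the kernel build directory to `parse_kconfig`.

```python
import gzip
import re
from pathlib import Path
from typing import Dict

SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
NOT_SET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map each option to its value; a "# CONFIG_X is not set" comment is reported as "n"."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = NOT_SET_LINE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def aspm_options(options: Dict[str, str]) -> Dict[str, str]:
    """Keep only the CONFIG_PCIEASPM* entries, sorted by name."""
    return {name: value for name, value in sorted(options.items())
            if name.startswith("CONFIG_PCIEASPM")}


if __name__ == "__main__":
    config = Path("/proc/config.gz")  # present only with CONFIG_IKCONFIG_PROC
    if config.exists():
        with gzip.open(str(config), "rt") as f:
            for name, value in aspm_options(parse_kconfig(f.read())).items():
                print(f"{name}={value}")
```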

Hi,

It sounds like a hardware problem if PCIe AER errors happen when no PCIe device is connected.

BTW, the on-module Ethernet port is based on PCIe. That could be your only PCIe device there.

Please reflash your module with the default JetPack settings on the devkit. If the PCIe issue is still there, then this module needs an RMA.
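To answer “are any even plugged in?” from the software side, `lspci` lists every PCI function the kernel enumerated, bridges and root ports as well as endpoints such as the Ethernet controller mentioned above. The same information lives in sysfs; this minimal sketch reads /sys/bus/pci/devices directly (the standard Linux layout, where the vendor, device and class files hold hex IDs), so the addresses can be compared with the ones in the AER messages. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def read_attr(device_dir: Path, name: str) -> str:
    """Read one sysfs attribute (e.g. "vendor") or return "?" if it is missing."""
    path = device_dir / name
    return path.read_text().strip() if path.exists() else "?"


def list_pci_devices(root: str = "/sys/bus/pci/devices") -> List[Tuple[str, str, str, str]]:
    """Return (address, vendor, device, class) for each PCI function under `root`."""
    base = Path(root)
    if not base.is_dir():
        return []
    return [
        (entry.name, read_attr(entry, "vendor"), read_attr(entry, "device"), read_attr(entry, "class"))
        for entry in sorted(base.iterdir())
    ]


if __name__ == "__main__":
    for address, vendor, device, pci_class in list_pci_devices():
        print(f"{address}  vendor={vendor} device={device} class={pci_class}")
```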

Are you sure it’s a hardware problem?

Those errors don’t appear if I don’t enable CONFIG_PCIEASPM_PERFORMANCE=y.

Sorry, I didn’t actually follow every one of your comments.

If enabling ASPM causes issues, then just don’t enable it. This is common in the PCIe world.

Or could you give a brief summary of what you want to ask? I still don’t understand your question.
Why add a configuration that crashes your device and then come back to ask?

I changed that parameter to fix another problem: after some time my monitor shuts down and never comes back. Even if I switch it off and on again, it doesn’t come back until I reboot the Jetson Nano. I was experimenting with the power-saving parameters.

I suggest filing a new topic for the issue below and using the serial console log to check what gets printed. Tag me there if you need someone to analyze the log.

,that’s my monitor that after sometime it shuts down and never comes back

No need to create another unrelated PCIe topic.

PCI “AER” is “Advanced Error Reporting”. If it is enabled, some hardware has a chance to partially or fully recover, and it also helps with debugging. These errors won’t show up just because a feature was added (well, turning reporting off and then back on would change whether you see the messages), but they are rarely anything other than either (A) hardware issues (some temporary) or (B) incorrect drivers. The latter is unlikely, but not impossible.
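For reading messages like the one in this topic’s title: as I understand the kernel’s AER output, the id= field is the 16-bit PCI ID of the agent that logged the error (bus number in the high byte, then 5 bits of device and 3 bits of function), and the text in parentheses names the role that agent played. A minimal sketch that parses such a line and decodes the ID so it can be matched against lspci; the regular expression assumes the exact “severity=..., type=..., id=XXXX(...)” wording shown in the title.

```python
import re
from typing import Dict, Optional

AER_LINE = re.compile(
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+), "
    r"id=(?P<id>[0-9a-fA-F]{4})\((?P<agent>[^)]*)\)"
)


def decode_bdf(source_id: int) -> str:
    """Split a 16-bit PCI ID (bus << 8 | device << 3 | function) into bus:device.function."""
    bus = source_id >> 8
    device = (source_id >> 3) & 0x1F
    function = source_id & 0x7
    return f"{bus:02x}:{device:02x}.{function}"


def parse_aer(line: str) -> Optional[Dict[str, str]]:
    """Extract severity, type, raw id, agent and decoded address from an AER log line."""
    match = AER_LINE.search(line)
    if not match:
        return None
    info = match.groupdict()
    info["bdf"] = decode_bdf(int(info["id"], 16))
    return info


if __name__ == "__main__":
    title = "PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0010(Receiver ID)"
    print(parse_aer(title))
```

For the title message, id=0010 decodes to 00:02.0 (bus 0, device 2, function 0). Which port or device that is on a given board is best checked with lspci rather than assumed.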

Do understand that the image on the SD card is the Linux OS, but it is not the boot content. The boot content itself must be from a release matching the SD card (some releases match a range, but quite often, if you change an SD card or create one for the first time and it fails, the problem is in that boot-stage software, which is not on the SD card). Flashing with JetPack/SDK Manager flashes the module itself (the QSPI memory in it), and that is what updates the boot content. You’ve updated the SD card and it doesn’t work, so flashing the module itself is a very good first step toward a fix. You certainly can’t tell whether it is hardware or software until both the module and the SD card have matching releases.

Note that JetPack/SDKM is a front end, running on the host PC, to the flash software, whereas L4T (Ubuntu plus NVIDIA drivers) is what actually gets flashed. The two releases are tied together, so if you know one, the other is also determined. You can see the L4T release with “head -n 1 /etc/nv_tegra_release”; the same file can also be read from the SD card on a host PC. You can find instructions for a specific release by going to that specific L4T release. A list of releases is available for each here (Nano can use 4.x JetPack or 32.x series L4T):

Note: We can talk more about kernel configuration once we know about the module being flashed with correct release.
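The “head -n 1 /etc/nv_tegra_release” check mentioned above can also be scripted, for example to read the release from an SD card’s root filesystem mounted on the host PC. This sketch assumes the usual first-line format “# R32 (release), REVISION: 7.3, ...”, where R32 plus REVISION 7.3 reads as L4T 32.7.3; the script name and mount point in the usage comment are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Assumed first-line format: "# R32 (release), REVISION: 7.3, GCID: ..., BOARD: ..., ..."
RELEASE_LINE = re.compile(r"#\s*R(?P<major>\d+)\s*\(release\),\s*REVISION:\s*(?P<revision>[\d.]+)")


def l4t_version(first_line: str) -> Optional[str]:
    """Turn the release line into a dotted version such as "32.7.3", or None if it does not match."""
    match = RELEASE_LINE.search(first_line)
    if not match:
        return None
    return f"{match.group('major')}.{match.group('revision')}"


def read_l4t_version(root: str = "/") -> Optional[str]:
    """Read etc/nv_tegra_release under `root`: "/" on the Jetson, or a mounted SD card rootfs on a host."""
    path = Path(root) / "etc" / "nv_tegra_release"
    if not path.exists():
        return None
    with path.open() as f:
        return l4t_version(f.readline())


if __name__ == "__main__":
    # Hypothetical usage on a host PC: python3 l4t_release.py /media/$USER/<sd-rootfs>
    print(read_l4t_version(sys.argv[1] if len(sys.argv) > 1 else "/"))
```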
