[MGBE] 10G network: occasional TX CRC errors

Hello,

We have a problem with CRC errors when using the MGBE network interface on our custom carrier board.

We use a Marvell 88X3310 10G optical PHY, which we drive at up to around 7 Gbps. We run JetPack 5.1.4.

On the other (PC) side we have a Broadcom BCM57502 NIC and an Intel E10GSFPSRX SFP.

The problem is that we occasionally see a large number of packet drops on the host NIC due to CRC mismatches: the rx_fcs_err_frames counter (ethtool -S ethX) on the PC side increases rapidly and steadily. It happens at random, once in a while, and persists until a link-down/link-up cycle. Most of the workload is RTSP over UDP, and these errors give us tons of RTSP retransmits.
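To catch the moment the errors start, it may help to log the counter's growth rate on the PC side. Below is a minimal sketch, assuming `ethtool -S` prints one `name: value` pair per line and that the counter is named rx_fcs_err_frames as above. The function names (`parse_ethtool_stats`, `counter_rate`, `watch`) and the interface name are hypothetical and only for illustration.

```python
import subprocess
import time


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a dict of counter name -> int."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so names that contain colons still work.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def counter_rate(prev, curr, seconds, name="rx_fcs_err_frames"):
    """Return the increase of one counter per second between two samples."""
    return (curr.get(name, 0) - prev.get(name, 0)) / seconds


def read_stats(iface):
    """Run `ethtool -S <iface>` and return the parsed counters."""
    out = subprocess.run(
        ["ethtool", "-S", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ethtool_stats(out)


def watch(iface, interval=1.0, samples=10):
    """Print a timestamped FCS error rate every `interval` seconds."""
    prev = read_stats(iface)
    for _ in range(samples):
        time.sleep(interval)
        curr = read_stats(iface)
        rate = counter_rate(prev, curr, interval)
        print(f"{time.strftime('%H:%M:%S')} rx_fcs_err_frames/s: {rate:.1f}")
        prev = curr
```

Calling `watch("ethX")` with the real interface name prints timestamped rates, which can then be lined up against link events or PHY logs from the Jetson side.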

We have tried the solution from this thread and it doesn’t seem to help: Reduced bandwidth at 10Gbps on Orin and mgbe_payload_cs_err correlation - #5 by waterbear

We have also checked various metrics on the Jetson side (PHY and MAC counters) and tried disabling offloads on MGBE, but nothing helps much.

Have you guys ever seen a problem like this? Could you point us in the right direction?

Here is our device-tree configuration for the MGBE controller:

	/* 10G */
	ethernet@6810000 {
		status = "okay";
		phy-handle = <&phy0>;
		phy-mode = "10gbase-r";
		/* 0:XFI 10G, 1:XFI 5G, 2:USXGMII 10G, 3:USXGMII 5G */
		nvidia,phy-iface-mode = <0>;
		nvidia,phy-reset-gpio = <&tegra_main_gpio PHY0_RESET GPIO_ACTIVE_HIGH>;
		nvidia,skip_mac_reset = <1>;
		nvidia,uphy-gbe-mode = <1>;
		nvidia,macsec-enable = <0>;
		nvidia,mac-addr-idx = <1>;
		compatible = "nvidia,nvmgbe";
		dma-coherent;
		nvidia,dcs-enable = <0x01>;
		nvidia,rx_riwt = <0x200>;
		nvidia,rx_frames = <0x40>;
		nvidia,tx_usecs = <0x100>;
		nvidia,tx_frames = <0x10>;
		nvidia,promisc_mode = <0x01>;
		nvidia,max-platform-mtu = <0x3fff>;
		nvidia,pause_frames = <0x01>;
		nvidia,mdio_addr = <0>;
		mdio {
			compatible = "nvidia,eqos-mdio";
			#address-cells = <1>;
			#size-cells = <0>;

			phy0: ethernet_phy@0 {
				compatible = "ethernet-phy-ieee802.3-c45";
				reg = <0>;
				nvidia,phy-rst-pdelay-msec = <0x12c>;
				nvidia,phy-rst-duration-usec = <0x35f48>;
				interrupt-parent = <&tegra_main_gpio>;
				interrupts = <PHY0_INT IRQ_TYPE_LEVEL_LOW>;
			};
		};
	};
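Several of the values above are written in hex, which makes the timings hard to read at a glance. The sketch below only converts the values copied from the node to decimal. The units in the comments are inferred from the property names (`-msec`, `-usec`, `_usecs`) and are an assumption, not something confirmed by the driver documentation.

```python
# Hex values copied verbatim from the device-tree node above.
DT_VALUES = {
    "nvidia,rx_riwt": 0x200,
    "nvidia,rx_frames": 0x40,
    "nvidia,tx_usecs": 0x100,                 # microseconds, going by the name
    "nvidia,tx_frames": 0x10,
    "nvidia,max-platform-mtu": 0x3FFF,
    "nvidia,phy-rst-pdelay-msec": 0x12C,      # milliseconds, going by the name
    "nvidia,phy-rst-duration-usec": 0x35F48,  # microseconds, going by the name
}


def to_decimal(values):
    """Return the same properties with plain decimal integers."""
    return {name: int(value) for name, value in values.items()}


decoded = to_decimal(DT_VALUES)
for name, value in decoded.items():
    print(f"{name} = {value}")
```

Read this way, the PHY reset duration is 221000 (221 ms if the unit really is microseconds), the post-reset delay is 300, and the maximum platform MTU is 16383.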

I want to clarify what kind of tests you are able to run on your side.

For example, is it possible to disconnect from the Broadcom BCM57502 NIC, try another kind of connection, and see whether you get similar behavior?

Also, other test items, such as whether the NVIDIA devkit hits a similar issue with the same connection.
