nvethernet: Meaning of the nveth error code, and why "Failed to report error"?

Meaning of nveth Error Code

We encountered an issue while testing the nveth network card on OrinX:

Mar 04 05:56:22 orin kernel: nvethernet 6800000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1002, return: -19
Mar 04 05:56:28 orin kernel: nvethernet 6800000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1002, return: -19
Mar 04 05:57:27 orin kernel: nvethernet 6800000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1002, return: -19
Mar 04 05:57:30 orin kernel: nvethernet 6800000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1002, return: -19
Mar 04 05:58:19 orin kernel: nvethernet 6800000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1002, return: -19
Mar 04 05:58:56 orin kernel: nvethernet 6800000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1002, return: -19
Mar 04 05:59:26 orin kernel: nvethernet 6800000.ethernet: Failed to report error: reporter ID: 0x0, Error code: 0x1002, return: -19

We have identified that the error code corresponds to OSI_TX_FRAME_ERR. Could you clarify what specific issue this error indicates? Also, what does “Failed to report error” mean in this context?

Additionally, we found the definition of error code 0x1002 in the following file:

drivers/net/ethernet/nvidia/nvethernet/nvethernetrm/osi/osi_core.h

/**
 * @addtogroup HSI_SW_ERR_CODE
 *
 * @brief software defined error code
 * @{
 */
#define OSI_INBOUND_BUS_CRC_ERR     0x1001U
#define OSI_TX_FRAME_ERR            0x1002U
#define OSI_RECEIVE_CHECKSUM_ERR    0x1003U
#define OSI_PCS_AUTONEG_ERR         0x1004U
#define OSI_MACSEC_RX_CRC_ERR       0x1005U
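
As a reading aid for the log lines above, here is a minimal userspace sketch that turns the two numbers in each line into names. The helpers `hsi_err_name` and `describe_log_line` are hypothetical and are not part of the nvethernet driver; only the `#define` values are copied from the excerpt. The return value is decoded with the standard `strerror`, on the assumption that the driver logs a negated Linux errno (on Linux, errno 19 is `ENODEV`).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the osi_core.h excerpt above. */
#define OSI_INBOUND_BUS_CRC_ERR     0x1001U
#define OSI_TX_FRAME_ERR            0x1002U
#define OSI_RECEIVE_CHECKSUM_ERR    0x1003U
#define OSI_PCS_AUTONEG_ERR         0x1004U
#define OSI_MACSEC_RX_CRC_ERR       0x1005U

/* Hypothetical helper (not in the driver): map an HSI code to its name. */
static const char *hsi_err_name(unsigned int code)
{
    switch (code) {
    case OSI_INBOUND_BUS_CRC_ERR:  return "OSI_INBOUND_BUS_CRC_ERR";
    case OSI_TX_FRAME_ERR:         return "OSI_TX_FRAME_ERR";
    case OSI_RECEIVE_CHECKSUM_ERR: return "OSI_RECEIVE_CHECKSUM_ERR";
    case OSI_PCS_AUTONEG_ERR:      return "OSI_PCS_AUTONEG_ERR";
    case OSI_MACSEC_RX_CRC_ERR:    return "OSI_MACSEC_RX_CRC_ERR";
    default:                       return "unknown";
    }
}

/* Hypothetical helper: describe the "Error code" and "return" fields
 * of one log line. Assumes ret is a negated errno value. */
static void describe_log_line(unsigned int code, int ret)
{
    printf("error code 0x%x = %s, return %d = %s\n",
           code, hsi_err_name(code), ret, strerror(-ret));
}
```

Calling `describe_log_line(0x1002U, -19)` prints the symbolic name of the code together with the system's description of errno 19.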

1. What does OSI_TX_FRAME_ERR mean?
2. Why does the "Failed to report error" message appear?

Hi,

If you are designing a custom base board, some adaptation configuration is needed.
Otherwise, your board may not work properly.

For the Orin AGX series, you can refer to the document below.

(Please be aware that the above link is for rel-36.3/JetPack 6.0.)

This document covers the following configuration:

  1. pinmux change & GPIO configuration
  2. EEPROM change, as most custom boards do not have an EEPROM.
  3. Kernel porting
  4. PCIe configuration
  5. USB configuration
  6. MGBE configuration
  7. RGMII configuration

Thanks!

Thank you for your reply. Yes, we are indeed designing our own custom base board. The basic functionality and performance of this network card are normal, including its features and stability. However, this error occasionally appears in the logs. We would like to understand the specific cause of this issue.

Hi,

How you triggered this error is more important than interpreting its meaning. Please elaborate on that first.

Also, which JetPack version are you using?

Thank you for your reply. We are using Jetson Linux 36.3 (JetPack 6.0), and we discovered the issue during a bidirectional TCP bandwidth stability test against a 10G peer using iperf3. I have already restarted the system and will try again to find a reliably reproducible scenario. Additionally, I will trace where the err variable is assigned in nvethernet.

Hi,

Could you check if this patch would help here?

diff --git a/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c b/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c
index e072e34..ff70671 100644
--- a/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c
+++ b/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c
@@ -295,8 +295,9 @@
 		mutex_unlock(&pdata->hsi_lock);
 	}
 
-	if (osi_core->hsi.report_err == OSI_ENABLE ||
-	    osi_core->hsi.macsec_report_err == OSI_ENABLE)
+	if (osi_core->hsi.enabled == OSI_ENABLE &&
+	    (osi_core->hsi.report_err == OSI_ENABLE ||
+	     osi_core->hsi.macsec_report_err == OSI_ENABLE))
 		ether_common_isr_thread(0, (void *)pdata);
 
 	schedule_delayed_work(&pdata->ether_hsi_work,
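
The effect of the diff can be illustrated with a small userspace model of the condition before and after the change. This is only a sketch: `struct hsi_flags` and the two functions are hypothetical stand-ins, the value of `OSI_ENABLE` is assumed to be 1, and only the field names and the boolean logic are taken from the diff.

```c
#include <stdbool.h>

#define OSI_ENABLE 1U /* assumed value, for illustration only */

/* Hypothetical stand-in for the osi_core->hsi fields used in the diff. */
struct hsi_flags {
    unsigned int enabled;
    unsigned int report_err;
    unsigned int macsec_report_err;
};

/* Condition before the patch: any pending error triggers reporting. */
static bool should_report_before(const struct hsi_flags *h)
{
    return h->report_err == OSI_ENABLE ||
           h->macsec_report_err == OSI_ENABLE;
}

/* Condition after the patch: reporting also requires HSI to be enabled. */
static bool should_report_after(const struct hsi_flags *h)
{
    return h->enabled == OSI_ENABLE &&
           (h->report_err == OSI_ENABLE ||
            h->macsec_report_err == OSI_ENABLE);
}
```

With `enabled` cleared and `report_err` set, the first function returns true and the second returns false, which is the case the patch changes.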

Are you referring to the modification of this function?

static inline void ether_hsi_work_func(struct work_struct *work)  
{  
    struct delayed_work *dwork = to_delayed_work(work);  
    struct ether_priv_data *pdata = container_of(dwork,  
            struct ether_priv_data, ether_hsi_work);  
    struct osi_core_priv_data *osi_core = pdata->osi_core;  
    u64 rx_udp_err;  
    u64 rx_tcp_err;  
    u64 rx_ipv4_hderr;  
    u64 rx_ipv6_hderr;  
    u64 rx_crc_error;  
    u64 rx_checksum_error;  

    rx_crc_error =  
        osi_core->mmc.mmc_rx_crc_error /  
            osi_core->hsi.err_count_threshold;  
    if (osi_core->hsi.rx_crc_err_count < rx_crc_error) {  
        osi_core->hsi.rx_crc_err_count = rx_crc_error;  
        mutex_lock(&pdata->hsi_lock);  
        osi_core->hsi.err_code[RX_CRC_ERR_IDX] =  
            OSI_INBOUND_BUS_CRC_ERR;  
        osi_core->hsi.report_err = OSI_ENABLE;  
        mutex_unlock(&pdata->hsi_lock);  
    }  

    rx_udp_err = osi_core->mmc.mmc_rx_udp_err;  
    rx_tcp_err = osi_core->mmc.mmc_rx_tcp_err;  
    rx_ipv4_hderr = osi_core->mmc.mmc_rx_ipv4_hderr;  
    rx_ipv6_hderr = osi_core->mmc.mmc_rx_ipv6_hderr;  
    rx_checksum_error = (rx_udp_err + rx_tcp_err +  
        rx_ipv4_hderr + rx_ipv6_hderr) /  
            osi_core->hsi.err_count_threshold;  
    if (osi_core->hsi.rx_checksum_err_count < rx_checksum_error) {  
        osi_core->hsi.rx_checksum_err_count = rx_checksum_error;  
        mutex_lock(&pdata->hsi_lock);  
        osi_core->hsi.err_code[RX_CSUM_ERR_IDX] =  
                OSI_RECEIVE_CHECKSUM_ERR;  
        osi_core->hsi.report_err = OSI_ENABLE;  
        mutex_unlock(&pdata->hsi_lock);  
    }  

    if (osi_core->hsi.report_err == OSI_ENABLE ||  
        osi_core->hsi.macsec_report_err == OSI_ENABLE)  
        ether_common_isr_thread(0, (void *)pdata);  

    schedule_delayed_work(&pdata->ether_hsi_work,  
                          msecs_to_jiffies(osi_core->hsi.err_time_threshold));  
}  
#endif  
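
The threshold arithmetic in the function above can be modeled in isolation. Integer division turns a cumulative error counter into a count of completed threshold steps, and an error is flagged only when that count grows. The sketch below assumes a nonzero threshold; `struct err_tracker` and `threshold_crossed` are hypothetical names, and the comments say which driver fields they are meant to mirror.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one error class tracked by ether_hsi_work_func. */
struct err_tracker {
    uint64_t threshold;      /* mirrors hsi.err_count_threshold (must be nonzero) */
    uint64_t reported_steps; /* mirrors e.g. hsi.rx_crc_err_count */
};

/* Returns true when the cumulative counter has completed a new threshold
 * step since the last call, which is when the driver sets report_err. */
static bool threshold_crossed(struct err_tracker *t, uint64_t total_errors)
{
    uint64_t steps = total_errors / t->threshold;

    if (t->reported_steps < steps) {
        t->reported_steps = steps;
        return true;
    }
    return false;
}
```

For example, with a made-up threshold of 1000, cumulative counts of 999 and 1500 do not flag a new report, while the first observations at 1000 and 2000 do.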

I will conduct further testing. Thanks!

Correct. I noticed that this patch is missing on rel-36 but present on rel-35, and it is related to the error you reported.

Thank you very much for your suggestion! May I ask about the source of this patch and the related changelog? If we need to merge the patch into our own code, this information might be necessary.

You could check the same file on rel-35. If you download the source code with source_sync.sh, you get it with git history, so the git log for that file gives you the change log.

I sincerely apologize for the trouble. I tried downloading the code with Git from

https://nv-tegra.nvidia.com/r/kernel/nvethernetrm.git

and

https://nv-tegra.nvidia.com/r/linux-5.10,

but I couldn’t find any related nvethernet code in either repository.

I also downloaded the Driver Package (BSP) Sources from

Jetson Linux | NVIDIA Developer.

This package contains the kernel source, and I found the modified version of the drivers/net/ethernet/nvidia/nvethernet/ether_linux.c file. However, I couldn’t locate the source_sync.sh script to retrieve the related Git information.

I’m a bit confused about how to obtain the nvethernet driver code with Git. Could you provide some guidance?

source_sync.sh is in the BSP that you use to flash the board, not in the source code tarball.

I downloaded the JetPack BSP package for version 35.6 from the official website. After extracting kernel_src.tbz2 (which does not include a Git repository), the directory structure looks like this:

➜  kernel ls
kernel-5.10  nvethernetrm  nvgpu  nvidia

I need the network driver located in ./nvidia/drivers/net/ethernet/nvidia/nvethernet/ether_linux.c. Then I found the official Git repository reference here:

Working With Sources — NVIDIA Jetson Linux Developer Guide 1 documentation,

which states that the repository for Linux_for_Tegra/source/public/kernel_src.tbz2 is

https://nv-tegra.nvidia.com/r/linux-5.10.

However, after I cloned that repository, it only had the kernel-5.10 source with Git, and it did not include the nvidia folder that contains the nvethernet code.

Hi,

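I think it may be clearer to explain this in Chinese, because it seems you could not follow what I was trying to say earlier.

The "BSP" we mention is the package you use to flash the board. It is not the same thing as your source code.
You keep saying that you downloaded the source code, but the flashing package does not contain any source code at all. You are looking in the wrong place.

Also, linux-5.10 only contains the kernel-5.10 part. linux-nvidia is a separate repo.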

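Thank you very much!

My core request is this: since you mentioned the nveth patch above, I want to find the commit message for that patch in the git log. To do that, I need to find the code with its git history.

I tried several approaches but could not find a repo with git history that contains nvethernet. linux-5.10 does not include this driver, and the tarball that does include it has no git information, so I am stuck.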

Hi,

source_sync.sh itself already lists the git repo path for the Ethernet driver.

You do not need to pick one yourself.

Thanks! Maybe the problem is with my source_sync.sh script.
When I cat it, this is what I get:

You may first need to answer which JetPack version you actually want to use…

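Thanks, I have found the patch you mentioned.

We are using 36, and the screenshot I posted shows the source_sync.sh script I took from the 36 repository. Your screenshot, however, gave me the 35 repository address. I also found the linux-nvidia repository you mentioned at https://nv-tegra.nvidia.com/r/admin/repos and successfully located the patch fix you posted above.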