AGX Orin: CAN bus does not automatically recover from BUS_OFF state

lpz_thread · December 29, 2023, 3:30am

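We are testing mttcan on Linux R35.4.1. When we short CAN_H and CAN_L, the CAN state machine enters BUS-OFF and then returns to ACTIVE, but no data can be sent: writes fail with "write: No buffer space available", and the interface can only receive. After we added the line priv->ttcan->tx_object = 0 to mttcan_bus_off_restart, data could be sent again. Is this change correct, and does it have any side effects?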

KevinFFF · December 29, 2023, 6:10am

Are you using a devkit or a custom board?

Did you refer to the mttcan_close() fix from the other thread?

lpz_thread · December 29, 2023, 6:39am

Custom board. Yes, we referred to the mttcan_close fix. We also made one more change: we removed the netif_stop_queue call from mttcan_start_xmit.

static netdev_tx_t mttcan_start_xmit(struct sk_buff *skb,
				     struct net_device *dev)
{
	int msg_no = -1;
	struct mttcan_priv *priv = netdev_priv(dev);
	struct canfd_frame *frame = (struct canfd_frame *)skb->data;

	if (can_dropped_invalid_skb(dev, skb))
		return NETDEV_TX_OK;

	if (can_is_canfd_skb(skb))
		frame->flags |= CAN_FD_FLAG;

	spin_lock_bh(&priv->tx_lock);

	/* Write Tx message to controller */
	msg_no = ttcan_tx_msg_buffer_write(priv->ttcan,
			(struct ttcanfd_frame *)frame);
	if (msg_no < 0)
		msg_no = ttcan_tx_fifo_queue_msg(priv->ttcan,
				(struct ttcanfd_frame *)frame);
	if (msg_no < 0) {
		//netif_stop_queue(dev);
		spin_unlock_bh(&priv->tx_lock);
		return NETDEV_TX_BUSY;
	}
	/* ... */

KevinFFF · January 3, 2024, 8:03am

Where do you add this line? Could you share the change?

Could you send and receive the expected data at this moment w/o any error messages?
Please also share the full dmesg for further check.

lpz_thread · January 3, 2024, 8:13am

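Sorry. When we test shorting CAN_H and CAN_L, there is still a small chance that data cannot be sent, even with the code changes. By adding debug prints we traced this to ttcan_tx_fifo_queue_msg, which keeps taking the branch if (ttcan->tx_object & (1 << put_idx)). Even though ttcan->tx_object is cleared on recovery from bus-off, this condition is still hit occasionally, so no data can be sent.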
int ttcan_tx_fifo_queue_msg(struct ttcan_controller *ttcan,
			    struct ttcanfd_frame *ttcanfd)
{
	u32 txfqs_reg;
	u32 put_idx;

	txfqs_reg = ttcan_read32(ttcan, ADR_MTTCAN_TXFQS);

	/* Test for Tx FIFO/Queue full */
	if (txfqs_reg & MTT_TXFQS_TFQF_MASK)
		return -ENOMEM;

	/* Test if Tx index is previously reserved in SW */
	put_idx = (txfqs_reg & MTT_TXFQS_TFQPI_MASK) >> MTT_TXFQS_TFQPI_SHIFT;
	if (ttcan->tx_object & (1 << put_idx)) {
		//printk("%s %d\n",__func__,__LINE__);
		return -ENOMEM;
	}

	/* Write to CAN controller message RAM */
	ttcan_tx_ded_msg_write(ttcan, ttcanfd, put_idx);

	return put_idx;
}
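
To make the failure mode described above easier to reason about, here is a minimal user-space sketch of the reservation check. It is not driver code: the struct and every model_* name are hypothetical, and it assumes (the thread does not show this bookkeeping) that a bit in tx_object is set when a frame is queued and cleared when its transmission completes. Under that assumption, a frame that was queued just before bus-off and never completes leaves its bit set, and every later attempt on the same put index fails with -ENOMEM.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the controller state. Only the software
 * reservation bitmap and the hardware put index are modelled. */
struct model_ttcan {
	uint32_t tx_object; /* one bit per Tx buffer reserved in software */
	uint32_t put_idx;   /* put index the hardware would report in TXFQS */
};

/* Same test as in ttcan_tx_fifo_queue_msg(): refuse the slot while
 * software still believes it is in use. */
int model_queue_msg(const struct model_ttcan *t)
{
	if (t->tx_object & (1u << t->put_idx))
		return -ENOMEM;
	return (int)t->put_idx;
}

/* Assumed bookkeeping: reserve the slot once a frame has been queued. */
void model_reserve(struct model_ttcan *t, int msg_no)
{
	t->tx_object |= 1u << msg_no;
}

/* Assumed bookkeeping: release the slot on Tx completion. */
void model_tx_complete(struct model_ttcan *t, int msg_no)
{
	t->tx_object &= ~(1u << msg_no);
}

/* The workaround discussed in this thread: drop all reservations. */
void model_bus_off_reset(struct model_ttcan *t)
{
	t->tx_object = 0;
}
```

In this model, calling model_reserve() for slot 0 and then model_queue_msg() again without an intervening model_tx_complete() returns -ENOMEM until model_bus_off_reset() runs. If a frame can be reserved again after the reset but before the controller is fully restarted, the stale bit could reappear, which may be one explanation for the occasional failures.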

lpz_thread · January 3, 2024, 8:27am

Below is the dmesg output from bus-off through recovery to normal transmission:
[ 524.737699] mttcan c310000.mttcan can0: entered error passive state
[ 525.084584] mttcan c310000.mttcan can0: entered error passive state
[ 527.466724] mttcan c310000.mttcan can0: entered error warning state
[ 527.467986] mttcan c310000.mttcan can0: entered error passive state
[ 527.695419] mttcan c310000.mttcan can0: entered bus off state
[ 528.701229] Message RAM Configuration
| base addr |0x0c312000|
| sidfc_flssa |0x00000000|
| xidfc_flesa |0x00000040|
| rxf0c_f0sa |0x000000c0|
| rxf1c_f1sa |0x000009c0|
| rxbc_rbsa |0x000009c0|
| txefc_efsa |0x000009c0|
| txbc_tbsa |0x00000a40|
| tmc_tmsa |0x00000ec0|
| mram size |0x00001000|
[ 528.702612] Release 3.2.3 from 09.06.2018
[ 528.702624] mttcan_controller_config: ctrlmode 20
[ 528.702653] mttcan c310000.mttcan can0: Bitrate set
[ 528.702666] mttcan c310000.mttcan can0: wait for bus off seq
[ 528.714775] IPv6: ADDRCONF(NETDEV_CHANGE): can0: link becomes ready
[ 528.724354] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 0 is occupied!
[ 528.727103] mttcan c310000.mttcan can0: entered error warning state
[ 528.728314] mttcan c310000.mttcan can0: entered error passive state
[ 528.748892] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 1 is occupied!
[ 528.772795] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 2 is occupied!
[ 528.796848] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 3 is occupied!
[ 528.821801] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 4 is occupied!
[ 528.845501] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 5 is occupied!
[ 528.869049] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 6 is occupied!
[ 528.894328] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 7 is occupied!
[ 528.918615] mttcan c310000.mttcan can0: can_put_echo_skb: BUG! echo_skb 8 is occupied

KevinFFF · January 3, 2024, 8:50am

Yes, I’ve just verified your changes and I also get this error, which is why I asked whether you were seeing any errors.

Could you check if bringing CAN interface down/up helps for your case?

$ sudo ip link set can0 down
$ sudo ip link set can0 up type can bitrate 100000 berr-reporting on restart-ms 1000

lpz_thread · January 3, 2024, 9:12am

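That's right. Although dmesg reports "BUG! echo_skb 1 is occupied!", the interface still recovers on its own and can send data normally, and the application no longer reports "write: No buffer space available". Occasionally, however, it does not recover. In the case captured in my screenshot, transmission failed to recover on its own after 33 short-circuit tests. At that point, running sudo ip link set can0 down followed by sudo ip link set can0 up type can bitrate 100000 berr-reporting on restart-ms 1000 made it possible to send data again. Combined with the earlier tracing, this suggests that between bus-off and ttcan_tx_fifo_queue_msg, ttcan->tx_object still has a message-number bit set that was never released, so a valid msg_no can never be obtained.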

KevinFFF · January 11, 2024, 2:30am

Please just apply the following patch, it should help for your use case.

--- a/drivers/net/can/mttcan/native/m_ttcan_linux.c
+++ b/drivers/net/can/mttcan/native/m_ttcan_linux.c
@@ -491,6 +491,7 @@ static int mttcan_state_change(struct net_device *dev,
                priv->can.can_stats.bus_off++;
 
                netif_carrier_off(dev);
+               priv->ttcan->tx_object = 0;
 
                if (priv->can.restart_ms)
                        schedule_delayed_work(&priv->drv_restart_work,

lpz_thread · January 12, 2024, 3:02am

Changing only this is not enough. The netif_stop_queue call in mttcan_start_xmit also has to be removed.

KevinFFF · January 12, 2024, 6:36am

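What exactly fails?
In my verification, the behavior is as expected once this line is added: when CAN-H/CAN-L are reconnected normally, the interface recovers on its own and continues to send and receive packets, without the echo_skb error messages shown above.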

lpz_thread · January 12, 2024, 6:49am

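Our test method is as follows:
1. Run cansend can0 123#112233 in a loop every 10 ms.
2. Short CAN_H and CAN_L, then separate them and reconnect CAN_H and CAN_L properly.
If only tx_object in mttcan_state_change is changed, cansend keeps reporting "write: No buffer space available" after reconnecting. ip -details -statistics link show can0 shows that the CAN state has recovered from bus-off, but the interface can only receive, not send.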
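
For anyone who wants to reproduce step 1 without a shell loop, the following is a rough SocketCAN sketch that sends the same frame as cansend can0 123#112233 every 10 ms and counts how often write() fails with ENOBUFS, which is the errno behind "No buffer space available". The function names are made up for this example, and it assumes a Linux system with the standard SocketCAN headers installed.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/raw.h>

/* Build the classic CAN frame that `cansend can0 123#112233` sends. */
struct can_frame make_test_frame(void)
{
	struct can_frame f;

	memset(&f, 0, sizeof(f));
	f.can_id = 0x123;
	f.can_dlc = 3;
	f.data[0] = 0x11;
	f.data[1] = 0x22;
	f.data[2] = 0x33;
	return f;
}

/* Open a raw CAN socket bound to ifname. Returns the fd, or -1 on error. */
int open_can(const char *ifname)
{
	struct sockaddr_can addr;
	struct ifreq ifr;
	int fd = socket(PF_CAN, SOCK_RAW, CAN_RAW);

	if (fd < 0)
		return -1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0) {
		close(fd);
		return -1;
	}
	memset(&addr, 0, sizeof(addr));
	addr.can_family = AF_CAN;
	addr.can_ifindex = ifr.ifr_ifindex;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Send `count` frames 10 ms apart. Returns the number of writes that
 * failed with ENOBUFS, or -1 on any other write error. */
int send_loop(int fd, int count)
{
	struct can_frame f = make_test_frame();
	int enobufs = 0;

	for (int i = 0; i < count; i++) {
		if (write(fd, &f, sizeof(f)) < 0) {
			if (errno != ENOBUFS)
				return -1;
			enobufs++;
		}
		usleep(10 * 1000);
	}
	return enobufs;
}
```

A caller would do something like fd = open_can("can0") followed by send_loop(fd, 1000) while shorting and releasing the bus. A return value that keeps growing after the link reports ERROR-ACTIVE again would match the symptom described above.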

KevinFFF · January 15, 2024, 7:01am

How did you set up can0?

lpz_thread · January 15, 2024, 7:03am

sudo ip link set can0 up type can bitrate 500000 sample-point 0.8 dbitrate 5000000 dsample-point 0.8 fd on restart-ms 1000

KevinFFF · January 17, 2024, 9:09am

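It looks like the problem is the restart-ms setting. You have to restore the connection within the restart-ms period so that mttcan_bus_off_restart is triggered and clears tx_object. Please try setting restart-ms 5000, then short CAN-H/CAN-L and restore the connection within 5 s, and check whether CAN packets can be sent normally.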

lpz_thread · January 17, 2024, 9:23am

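That is not what happens. mttcan does not wait for CAN_H/CAN_L to be reconnected properly before recovering from bus-off. It returns from bus-off to the active state as soon as the short is removed. In our test we only touch CAN_H and CAN_L together briefly, so mttcan recovers from bus-off very quickly. The problem is that our application keeps sending data while mttcan is in bus-off. Once netif_stop_queue is called in mttcan_start_xmit, the net layer can no longer hand data to the mttcan driver, and nothing restarts the net queue afterwards.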
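
The hypothesis in this post can be illustrated with a small user-space model. It is only a sketch of the claim being made, not of the real driver: all names are hypothetical, the transmitter is reduced to a single slot, and whether mttcan actually lacks a matching wake on its restart path is not established in this thread.

```c
#include <stdbool.h>

/* Hypothetical model: a one-slot transmitter plus the net-layer queue flag. */
struct model_dev {
	bool queue_stopped; /* set by the netif_stop_queue() branch */
	bool slot_busy;     /* a frame is still waiting for Tx completion */
};

enum model_tx { MODEL_TX_OK, MODEL_TX_BUSY, MODEL_TX_NOT_CALLED };

/* The stack only calls into the driver while the queue is running. */
enum model_tx model_xmit(struct model_dev *d)
{
	if (d->queue_stopped)
		return MODEL_TX_NOT_CALLED;
	if (d->slot_busy) {
		d->queue_stopped = true; /* stands in for netif_stop_queue() */
		return MODEL_TX_BUSY;
	}
	d->slot_busy = true;
	return MODEL_TX_OK;
}

/* Normal path: Tx completion frees the slot and wakes the queue. */
void model_tx_done(struct model_dev *d)
{
	d->slot_busy = false;
	d->queue_stopped = false;
}

/* Recovery that only resets the slot bookkeeping (like tx_object = 0). */
void model_restart_clear_only(struct model_dev *d)
{
	d->slot_busy = false;
}

/* Recovery that also wakes the queue. */
void model_restart_and_wake(struct model_dev *d)
{
	d->slot_busy = false;
	d->queue_stopped = false;
}
```

In this model, a frame queued just before bus-off never reaches model_tx_done(), the next model_xmit() stops the queue, and model_restart_clear_only() frees the slot without letting any further frame reach the driver. That would be consistent with an interface that shows ERROR-ACTIVE and receives but cannot send, and with the observation that removing the netif_stop_queue call hides the symptom.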

KevinFFF · January 17, 2024, 9:30am

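What I mean is that after the short you need to restore the connection within restart-ms. It is not enough to un-short CAN-H/CAN-L; you also need to connect CAN-H back to CAN-H and CAN-L back to CAN-L.
Bus Off is caused by the CAN-H/CAN-L short.
The CAN driver triggers mttcan_bus_off_restart() restart-ms after Bus Off to clear tx_object.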

lpz_thread · January 17, 2024, 9:42am

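OK, I understand what you mean. You are saying that as long as the connection is restored within 5 s, netif_stop_queue will not be called, and once the connection is restored mttcan_start_xmit can send data normally without reaching the netif_stop_queue branch?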

lpz_thread · January 17, 2024, 10:04am

I tested with 5 s, 10 s, and 20 s, and data still cannot be sent.

kayccc · January 31, 2024, 4:37am

There is no update from you for a period, assuming this is not an issue any more.
Hence we are closing this topic. If need further support, please open a new one.
Thanks

Is this still an issue that needs support? Are there any results you can share?
