Orin: CAN bus not recovering from ERROR-PASSIVE

The solution presented in the previous discussion about this topic does not properly address the issue: the driver should recover automatically from a bus-off state.

That earlier patch only clears tx_object when the interface is brought down.

I have tested a patch that also clears tx_object, which holds the bitmap status of the messages in the TX mailboxes, when the driver restarts after bus-off:

diff --git a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
index 18132a7..d506f99 100644
--- a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
+++ b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
@@ -1096,6 +1096,7 @@ static void mttcan_bus_off_restart(struct work_struct *work)
 restart:
 	netdev_dbg(dev, "restarted\n");
 	priv->can.can_stats.restarts++;
+	priv->ttcan->tx_object = 0;
 
 	mttcan_start(dev);
 	netif_carrier_on(dev);

The problem with this patch is that any messages that were not yet transmitted are lost when the driver restarts.
I’m working on a fix that improves this, but it’s not fully fleshed out, as some messages are still lost:

diff --git a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
index 18132a7..43d8113 100644
--- a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
+++ b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
@@ -1079,6 +1079,8 @@ static void mttcan_bus_off_restart(struct work_struct *work)
 	struct net_device_stats *stats = &dev->stats;
 	struct sk_buff *skb;
 	struct can_frame *cf;
+	u32 msg_no;
+	u32 unsent_tx;
 
 	/* send restart message upstream */
 	skb = alloc_can_err_skb(dev, &cf);
@@ -1099,6 +1101,13 @@ restart:
 
 	mttcan_start(dev);
 	netif_carrier_on(dev);
+	// need to attempt to retransmit any messages stuck in tx_object
+	unsent_tx = priv->ttcan->tx_object;
+	while (unsent_tx) {
+		msg_no = ffs(unsent_tx) - 1;
+		ttcan_tx_trigger_msg_transmit(priv->ttcan, msg_no);
+		unsent_tx &= ~(1U << msg_no);
+	}
 }
 
 static void mttcan_start(struct net_device *dev)

I’m open to any suggestions on how to ensure that all messages that could not be sent during the bus-off condition are still sent afterwards. I suspect that they fail at the netif layer, but I’m not 100% sure.

This is what I was using to check that messages are sent and how many are lost:

counter=0; while true; do payload=$(printf "1%014x" $counter); cansend can0 1F334454##$payload; ((counter++)); sleep 0.01; done
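
On the receiving node, a counterpart to the loop above could count the gaps in the counter. The following is only a rough sketch, not something from the driver or from can-utils: it assumes the receiver logs with candump -L, that FD frames show up in that log as <id>##<flags nibble><data hex>, and that the counter only ever increases. count_lost is a name made up for this example.

```shell
# count_lost (hypothetical helper): read candump log-format lines on stdin
# and report how many counter values were skipped.
# Assumption: frames look like "(ts) can0 1F334454##<flag><hex counter>",
# matching the cansend loop above. Lines without "##" are ignored.
count_lost() (
    expected=-1
    received=0
    lost=0
    while read -r line; do
        # Only CAN FD frames carry the "##" marker in log format.
        case $line in
            *'##'?*) ;;
            *) continue ;;
        esac
        data=${line##*"##"}   # text after the last "##"
        data=${data#?}        # drop the single flags nibble
        # Skip anything that is not purely hexadecimal.
        case $data in
            ''|*[!0-9A-Fa-f]*) continue ;;
        esac
        value=$((0x$data))
        # A jump past the expected value means frames went missing.
        if [ "$expected" -ge 0 ] && [ "$value" -gt "$expected" ]; then
            lost=$((lost + value - expected))
        fi
        expected=$((value + 1))
        received=$((received + 1))
    done
    echo "received=$received lost=$lost"
)
```

One way to use it would be to record on the receiver with candump -L can0 > rx.log while the sender loop runs, stop the recording, and then run count_lost < rx.log. The sketch does not handle a restarted sender or reordered frames.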

And this is how I bring up the CAN interface:

sudo ip link set can0 type can bitrate 1000000 dbitrate 4000000 fd on sample-point .80 dsample-point .80 restart-ms 1 berr-reporting on