The solution proposed in the earlier discussion of this topic does not fully address the issue, which is getting the driver to recover automatically from a bus-off state.
That patch clears tx_object only when the interface is brought down.
I have tested a patch that also clears tx_object, the bitmap that tracks which TX mailboxes hold pending messages, in the bus-off restart path (mttcan_bus_off_restart):
diff --git a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
index 18132a7..d506f99 100644
--- a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
+++ b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
@@ -1096,6 +1096,7 @@ static void mttcan_bus_off_restart(struct work_struct *work)
restart:
netdev_dbg(dev, "restarted\n");
priv->can.can_stats.restarts++;
+ priv->ttcan->tx_object = 0;
mttcan_start(dev);
netif_carrier_on(dev);
The problem with this patch is that any messages not yet transmitted are lost when the driver restarts.
I'm working on a fix that improves on this by re-triggering transmission of the mailboxes still marked in tx_object after the restart, but it is not fully fleshed out yet: some messages are still lost.
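To make that trade-off concrete, here is a minimal user-space model of the mailbox bitmap. It assumes, as the first patch's effect suggests, that the driver treats a set bit in tx_object as a busy mailbox and clears it only when transmission completes. `struct tx_model`, `tx_claim()`, `tx_complete()`, `tx_reset()` and the 32-mailbox count are hypothetical names for illustration, not the mttcan API.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TX_MAILBOXES 32u /* assumption: real count depends on controller config */

/* Hypothetical model of tx_object: bit n set means mailbox n holds a
 * frame that has not yet completed transmission. */
struct tx_model {
    uint32_t tx_object;
};

/* Claim the lowest free mailbox; returns -1 when every mailbox is busy. */
static int tx_claim(struct tx_model *m)
{
    for (unsigned n = 0; n < NUM_TX_MAILBOXES; n++) {
        if (!(m->tx_object & (UINT32_C(1) << n))) {
            m->tx_object |= UINT32_C(1) << n;
            return (int)n;
        }
    }
    return -1;
}

/* Transmission of mailbox n completed: release it. */
static void tx_complete(struct tx_model *m, unsigned n)
{
    assert(n < NUM_TX_MAILBOXES);
    m->tx_object &= ~(UINT32_C(1) << n);
}

/* What the first patch does on bus-off restart: release every mailbox.
 * Returns the number of pending frames discarded as a result. */
static unsigned tx_reset(struct tx_model *m)
{
    unsigned dropped = 0;

    for (uint32_t b = m->tx_object; b; b &= b - 1) /* clear lowest set bit */
        dropped++;
    m->tx_object = 0;
    return dropped;
}
```

If bus-off stops completions from arriving, every claimed bit stays set and tx_claim() eventually returns -1, so transmission stalls; tx_reset() unblocks it, but its return value is exactly the number of frames silently dropped.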
diff --git a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
index 18132a7..43d8113 100644
--- a/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
+++ b/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c
@@ -1079,6 +1079,8 @@ static void mttcan_bus_off_restart(struct work_struct *work)
struct net_device_stats *stats = &dev->stats;
struct sk_buff *skb;
struct can_frame *cf;
+ u32 msg_no;
+ u32 unsent_tx;
/* send restart message upstream */
skb = alloc_can_err_skb(dev, &cf);
@@ -1099,6 +1101,13 @@ restart:
mttcan_start(dev);
netif_carrier_on(dev);
+ /* Attempt to retransmit any messages still pending in tx_object. */
+ unsent_tx = priv->ttcan->tx_object;
+ while (unsent_tx) {
+ msg_no = ffs(unsent_tx) - 1;
+ ttcan_tx_trigger_msg_transmit(priv->ttcan, msg_no);
+ unsent_tx &= ~(1U << msg_no);
+ }
}
static void mttcan_start(struct net_device *dev)
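The retransmit loop in the second patch walks tx_object lowest bit first. The sketch below reproduces that walk in user space so it can be checked in isolation; instead of calling ttcan_tx_trigger_msg_transmit() it just records each mailbox index. `lowest_set_bit()` and `pending_mailboxes()` are illustrative names, and the portable bit scan stands in for the kernel's ffs().

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit, i.e. ffs(x) - 1 as used in the patch. */
static unsigned lowest_set_bit(uint32_t x)
{
    assert(x != 0);
    unsigned n = 0;
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Same walk as the patch's retransmit loop, but each mailbox index is
 * stored in out[] (lowest first) instead of being re-triggered.
 * Returns the number of pending mailboxes found. */
static unsigned pending_mailboxes(uint32_t tx_object, unsigned out[32])
{
    uint32_t unsent = tx_object; /* snapshot, as the patch does */
    unsigned count = 0;

    while (unsent) {
        unsigned msg_no = lowest_set_bit(unsent);
        out[count++] = msg_no;
        unsent &= ~(UINT32_C(1) << msg_no); /* clear the bit just handled */
    }
    return count;
}
```

In-kernel, the for_each_set_bit() macro expresses the same walk over an unsigned long bitmap and could replace the hand-written loop.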
I'm open to suggestions on how to ensure that all messages not sent during the bus-off condition are eventually sent. I suspect the missing ones failed at the netif layer, but I'm not 100% sure.
This is what I used to check that messages are sent and to count how many are lost:
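One way to test the netif-layer theory, offered as a hedged sketch: when the CAN TX queue is stopped and the qdisc fills up, write() on a raw CAN socket typically fails with ENOBUFS, and cansend then reports the error and exits with a failure status. The test loop increments counter regardless of cansend's exit status, so such a frame would show up as lost even though it never reached the driver. A sender that retries on ENOBUFS separates the two cases. `classify_write()` and `write_frame_with_retry()` are hypothetical helpers, and the retry count and back-off are arbitrary choices.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep() and pipe() under strict C modes */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

enum send_result { SEND_DONE, SEND_RETRY, SEND_FAILED };

/* Decide what to do after one write(): n is its return value, err is
 * errno when n < 0. Raw CAN writes are all-or-nothing, so a short
 * write is treated as a failure. */
static enum send_result classify_write(ssize_t n, int err, size_t len)
{
    if (n >= 0)
        return (size_t)n == len ? SEND_DONE : SEND_FAILED;
    return err == ENOBUFS ? SEND_RETRY : SEND_FAILED;
}

/* Write one frame, backing off and retrying while the stack reports
 * ENOBUFS (TX queue full, e.g. while the controller is bus-off).
 * Returns true once the socket layer accepts the frame; that means
 * queued, not yet confirmed on the bus. */
static bool write_frame_with_retry(int fd, const void *frame, size_t len,
                                   unsigned max_tries, long backoff_ns)
{
    assert(backoff_ns >= 0 && backoff_ns < 1000000000L); /* nanosleep() range */
    struct timespec ts = { .tv_sec = 0, .tv_nsec = backoff_ns };

    for (unsigned i = 0; i < max_tries; i++) {
        ssize_t n = write(fd, frame, len);
        switch (classify_write(n, n < 0 ? errno : 0, len)) {
        case SEND_DONE:
            return true;
        case SEND_FAILED:
            return false;
        case SEND_RETRY:
            nanosleep(&ts, NULL); /* queue full: back off, then retry */
            break;
        }
    }
    return false; /* still ENOBUFS after max_tries attempts */
}
```

For the CAN FD frames used here, fd would be a raw CAN socket with CAN_RAW_FD_FRAMES enabled and len would be sizeof(struct canfd_frame).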
counter=0; while true; do payload=$(printf "1%014x" $counter); cansend can0 1F334454##$payload; ((counter++)); sleep 0.01; done
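On the receiving side, lost frames can be counted directly from the payload. In the loop above, the leading 1 of "1%014x" is cansend's CAN FD flags nibble (BRS), so each frame carries seven data bytes holding the counter big-endian. The tracker below is a minimal sketch under that assumption; `struct loss_tracker`, `decode_counter()`, `tracker_add()`, `tracker_lost()` and the MAX_COUNTERS cap are hypothetical. It also counts duplicates, which a retransmit approach could produce if a mailbox were re-triggered after its frame had in fact gone out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COUNTERS 100000u /* assumption: a test run stays below this */

struct loss_tracker {
    bool seen[MAX_COUNTERS]; /* seen[c] is set once counter c has arrived */
    uint64_t received;       /* distinct counters received */
    uint64_t duplicates;     /* repeats of an already-received counter */
    uint64_t highest;        /* largest counter received so far */
    bool any;                /* false until the first frame arrives */
};

/* Big-endian decode of the first len payload bytes (7 in the test loop). */
static uint64_t decode_counter(const uint8_t *data, size_t len)
{
    assert(len <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | data[i];
    return v;
}

static void tracker_add(struct loss_tracker *t, uint64_t counter)
{
    if (counter >= MAX_COUNTERS)
        return; /* outside this sketch's range: ignored */
    if (t->seen[counter]) {
        t->duplicates++;
        return;
    }
    t->seen[counter] = true;
    t->received++;
    if (!t->any || counter > t->highest)
        t->highest = counter;
    t->any = true;
}

/* Counters missing between 0 and the highest one received. */
static uint64_t tracker_lost(const struct loss_tracker *t)
{
    return t->any ? t->highest + 1 - t->received : 0;
}
```

Starting from a zero-initialised tracker (static or calloc(), since it is about 100 KB) and calling tracker_add(&t, decode_counter(frame.data, frame.len)) for each struct canfd_frame read from can0, tracker_lost() reports the gaps.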
And this is how I bring up the CAN interface:
sudo ip link set can0 type can bitrate 1000000 dbitrate 4000000 fd on sample-point .80 dsample-point .80 restart-ms 1 berr-reporting on