CAN bus not recovering from ERROR-PASSIVE

gshim · May 24, 2023, 5:49pm

I have an application where I need to change the CAN bus bitrate depending on the system’s mode of operation. I’m running into an issue when I change the bitrate of the mttcan interface and send a CAN message before the other node on the bus has switched to the new bitrate. The transmitted frames are not ACKed, the controller automatically retransmits them, and each failed attempt increases the TX error counter (berr-counter tx) until it reaches 128.

When this happens, the module enters ERROR-PASSIVE.
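To watch the controller state and error counters while reproducing this, ip -details link show can0 reports them: iproute2 prints a line of the form can state ERROR-PASSIVE (berr-counter tx 128 rx 0) restart-ms 0, and the berr-counter part only appears when the driver exposes its counters. Below is a minimal POSIX sh sketch that pulls out those three fields, assuming that output format. The function names are hypothetical helpers, not part of iproute2 or can-utils.

```shell
#!/bin/sh
# Hypothetical helpers: print "<state> <tx> <rx>" for a CAN interface.
# Assumes the iproute2 "ip -details link show" format, which contains e.g.
#   can state ERROR-PASSIVE (berr-counter tx 128 rx 0) restart-ms 0
# Lines without "(berr-counter ...)" (such as the "state UP" header line)
# are ignored.

# Parse "ip -details link show" text from stdin.
can_state_parse() {
    sed -n 's/.*state \([A-Z-]*\) (berr-counter tx \([0-9]*\) rx \([0-9]*\)).*/\1 \2 \3/p'
}

# Query a live interface, e.g.: can_state can0
can_state() {
    ip -details link show "$1" | can_state_parse
}
```

Running can_state can0 before and after the other node comes back shows directly whether the TX counter ever moves off 128.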

If I then send a series of dummy messages on the bus, for example cansend can0 123#1234, I get a write: No buffer space available error after qlen messages (10 in my case). I would expect the CAN controller to leave this state once the bitrates of all nodes on the network match.
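That error is ENOBUFS. Roughly, once the controller stops completing transmissions, frames accumulate in the interface’s transmit queue (the qlen shown by ip link, 10 here) and further writes are rejected; cansend reports the failed write with perror, hence write: No buffer space available, and exits non-zero. A small retry wrapper, sketched below as a hypothetical helper, can absorb a briefly full queue. It cannot help while no other node is ACKing, because the queue never drains.

```shell
#!/bin/sh
# Hypothetical helper: retry a command (e.g. cansend) while it keeps failing.
# cansend exits non-zero after "write: No buffer space available".
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        # Success: stop immediately.
        "$@" && return 0
        # Out of attempts: report failure to the caller.
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For example, retry 5 1 cansend can0 123#1234. Raising the queue length (ip link set can0 txqueuelen N) only postpones the error for the same reason.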

I can still receive messages, but transmitting doesn’t work. I expected that bringing the interface down with
sudo ip link set can0 down
and bringing it back up with
sudo ip link set can0 up type can bitrate 100000

would clear the state. Unfortunately, it doesn’t. The network buffers do appear to be cleared, though, because I can then run cansend can0 123#1234 (with no other traffic on the bus) 10 more times before seeing write: No buffer space available again.

Other than unloading the driver with modprobe -r mttcan and loading it again (which works), how can I recover from this?

KevinFFF · May 25, 2023, 2:38am

Hi gshim,

Are you using the devkit or a custom board for AGX Orin?
Which JetPack version are you using?

How are you running the test above? Is it a loopback test, or are you connecting a CAN device through a CAN transceiver?

gshim · May 25, 2023, 1:37pm

I’m using the devkit. The output of cat /etc/nv_tegra_release is: # R34 (release), REVISION: 0.1, GCID: 29955323, BOARD: t186ref, EABI: aarch64, DATE: Tue Mar 15 08:13:50 UTC 2022

The test above is done by connecting an external device through a CAN transceiver. I’m not sure how I would reproduce it in loopback mode, since I need to change the bitrate and test the case where one device is already on the new bitrate while the other is still on the original one, or vice versa. We need to be able to recover from that failure mode once all devices are on the same bitrate, and that is where the original issue comes in.

KevinFFF · May 26, 2023, 6:04am

That appears to be an old release for AGX Orin.

Could you verify with the latest release, R35.3.1?

An internal loopback test doesn’t need any external cable; just follow the instructions in the documentation and it should work.

gshim · May 26, 2023, 8:12pm

I’ve upgraded to R35.3.1 and the issue is still present.

Again, I’m not sure how a loopback test would catch this failure mode. This test uses a bus with two nodes, A and B, where node B is unresponsive and does not ACK node A’s messages, so node A’s transmit error counter goes up. Once node B is back on the bus, node A’s transmit error counter should start going down.
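Node A’s expectation matches the fault-confinement rules in ISO 11898-1: a transmit error normally adds 8 to the transmit error counter (TEC), each successful transmission subtracts 1, and a node is error passive once TEC or REC exceeds 127 and bus-off once TEC exceeds 255. One exception explains why the counter parks at exactly 128: an error-passive transmitter that sees only an ACK error (and no dominant bit during its passive error flag) leaves TEC unchanged. So the first ACKed frame should take TEC to 127 and end ERROR-PASSIVE. The sh functions below are a hypothetical model of those rules, assuming REC stays at 0; Linux drivers usually also report an intermediate ERROR-WARNING level (commonly at 96), which this ISO-only model leaves out.

```shell
#!/bin/sh
# Hypothetical model of ISO 11898-1 fault confinement (ISO states only).
# Usage: fault_state TEC REC
fault_state() {
    tec=$1
    rec=$2
    if [ "$tec" -gt 255 ]; then
        echo BUS-OFF
    elif [ "$tec" -gt 127 ] || [ "$rec" -gt 127 ]; then
        echo ERROR-PASSIVE
    else
        echo ERROR-ACTIVE
    fi
}

# TEC after N successful (ACKed) transmissions: -1 each, never below 0.
# Usage: tec_after_acked TEC N
tec_after_acked() {
    t=$(( $1 - $2 ))
    if [ "$t" -lt 0 ]; then
        t=0
    fi
    echo "$t"
}

# TEC after N ACK errors: +8 each while error active, unchanged once
# error passive (assumes no dominant bit during the passive error flag).
# Usage: tec_after_ack_errors TEC N
tec_after_ack_errors() {
    t=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        if [ "$t" -le 127 ]; then
            t=$((t + 8))
        fi
        n=$((n - 1))
    done
    echo "$t"
}
```

For example, fault_state "$(tec_after_ack_errors 0 20)" 0 prints ERROR-PASSIVE: sixteen unacknowledged frames take TEC to 128, and the rest leave it there instead of driving the node to BUS-OFF.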

gshim · May 30, 2023, 1:55pm

I can confirm that the same test works with different host hardware and the same target hardware. Using a Kvaser Leaf to send a message to an unresponsive node, the host node enters ERROR-PASSIVE but returns to ERROR-ACTIVE as soon as the target node becomes responsive. This does not happen on the AGX Orin.

KevinFFF · June 6, 2023, 8:29am

On AGX Orin, it seems the controller can only be recovered by reloading the kernel module:

rmmod mttcan
modprobe mttcan
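Wrapped up as a script, the workaround looks roughly like the sketch below. It assumes it runs as root, that the listed interfaces are driven by mttcan, and that they reappear under the same names after modprobe. Unloading the module removes every interface the driver owns, not only the stuck one, so any other CAN link on mttcan is interrupted too. The one-second wait and the DRY_RUN switch are hypothetical choices for illustration.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reload mttcan and reconfigure the given interfaces.
# Usage: mttcan_reload BITRATE IFNAME...
mttcan_reload() {
    bitrate=$1
    shift
    for ifname in "$@"; do
        run ip link set "$ifname" down
    done
    run rmmod mttcan
    run modprobe mttcan
    # Arbitrary pause so the interfaces exist again before configuring them.
    run sleep 1
    for ifname in "$@"; do
        run ip link set "$ifname" type can bitrate "$bitrate"
        run ip link set "$ifname" up
    done
}
```

For example, mttcan_reload 100000 can0 reloads the driver and brings can0 back up at 100 kbit/s.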

system · June 28, 2023, 2:46am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.