CAN bus problem on Xavier

Hi there!
We are building an autonomous vehicle based on the AGX Xavier.
We have two CAN interfaces connected: can0 to the diesel engine bus and can1 to the implements bus.
Everything works fine 95% of the time, but then we start to see errors like these:

Mar 11 07:14:18 ox kernel: [ 336.672366] mttcan c320000.mttcan can1: Format Error Detected
Mar 11 07:14:18 ox kernel: [ 336.672535] mttcan c320000.mttcan can1: IR 0x8400000 PSR 0x712

After some time the following error appears and it freezes the whole local network including wlan0:

Mar 11 07:18:18 ox kernel: [ 756.872585] mttcan c320000.mttcan can1: mttcan_poll_ir: some msgs lost on in Q0

It lasts up to 20 minutes, and finally the error messages and network problems disappear.
The problem occurs once or twice every day and, as I said, 95% of the time we do not have any issues.

Any suggestions or recommendations?
Thanks.
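
For readers triaging the same messages: the PSR value in the log above can be split into fields. The sketch below assumes the Protocol Status Register layout documented for the Bosch M_CAN core (LEC in bits 2:0, ACT in bits 4:3, then error-passive, error-warning and bus-off flags); that layout is an assumption here and should be checked against the Xavier TRM. The name `decode_psr` is made up for illustration.

```python
# Field layout assumed from the Bosch M_CAN Protocol Status Register (PSR).
# Verify against the Xavier TRM before relying on it.
LEC_CODES = {
    0: "no error",
    1: "stuff error",
    2: "form error",
    3: "ack error",
    4: "bit1 error",
    5: "bit0 error",
    6: "crc error",
    7: "no change",
}
ACTIVITY = {0: "synchronizing", 1: "idle", 2: "receiver", 3: "transmitter"}


def decode_psr(psr: int) -> dict:
    """Split a PSR value into the fields most useful for triage."""
    return {
        "lec": LEC_CODES[psr & 0x7],              # last error code, bits 2:0
        "activity": ACTIVITY[(psr >> 3) & 0x3],   # bits 4:3
        "error_passive": bool(psr & (1 << 5)),
        "error_warning": bool(psr & (1 << 6)),
        "bus_off": bool(psr & (1 << 7)),
    }


if __name__ == "__main__":
    print(decode_psr(0x712))
```

Under that assumed layout, 0x712 reads as a form error seen while the controller was acting as receiver, with none of the error-warning, error-passive or bus-off flags set, which is consistent with the "Format Error Detected" line.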

hello razvilgrad,

may I also know which JetPack release version you’re working with?
have you also tested with can-utils for verification?
thanks

Hello, thanks for the quick response.

R32 (release), REVISION: 4.4, GCID: 23942405, BOARD: t186ref, EABI: aarch64, DATE: Fri Oct 16 19:37:08 UTC 2020

What do you think I should check with can-utils?

hello razvilgrad,

are you able to reproduce the failure below by testing with CAN loopback?
thanks

[ 336.672366] mttcan c320000.mttcan can1: Format Error Detected
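
A loopback test of this kind can be sketched with Python's standard `socket` module alone. This is a minimal sketch, not an NVIDIA-provided procedure: it assumes the interface has already been put into controller loopback mode (for example with `ip link set can0 type can bitrate 500000 loopback on`, where the bitrate is only a placeholder) and brought up. The helper names `build_frame`, `parse_frame` and `loopback_check`, and the ID 0x123, are hypothetical.

```python
import socket
import struct

# Linux SocketCAN classic frame: 32-bit ID, 8-bit length, 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"
CAN_FRAME_SIZE = struct.calcsize(CAN_FRAME_FMT)  # 16 bytes


def build_frame(can_id: int, data: bytes) -> bytes:
    """Pack an ID and up to 8 payload bytes into a raw SocketCAN frame."""
    if len(data) > 8:
        raise ValueError("classic CAN payload is at most 8 bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def parse_frame(raw: bytes):
    """Unpack a raw SocketCAN frame into (can_id, payload)."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    return can_id, data[:length]


def loopback_check(ifname: str = "can0", count: int = 1000, timeout: float = 1.0) -> int:
    """Send `count` frames and return how many did not come back intact.

    Assumes `ifname` is up and in controller loopback mode.
    """
    lost = 0
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as s:
        s.bind((ifname,))
        s.settimeout(timeout)
        for i in range(count):
            payload = struct.pack(">I", i)  # sequence number as payload
            s.send(build_frame(0x123, payload))
            try:
                can_id, data = parse_frame(s.recv(CAN_FRAME_SIZE))
            except socket.timeout:
                lost += 1
                continue
            if (can_id & socket.CAN_EFF_MASK, data) != (0x123, payload):
                lost += 1
    return lost


if __name__ == "__main__":
    print("frames lost:", loopback_check("can0"))
```

If the count stays at zero in loopback while the errors still appear on the real bus, that would point away from the controller and driver and toward wiring, termination or bit timing, though this sketch alone cannot prove either.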

Hello Jerry,
We disconnected almost everything from the CAN buses; we just have the diesel engine connected on can0 and cansniffer connected on can1, and we publish just the engine RPM message.

Every 10-15 minutes we have this problem:

[ +1.815953] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.827127] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.386884] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +1.186774] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.042045] mttcan_poll_ir: 12 callbacks suppressed
[ +0.000122] mttcan c310000.mttcan can0: Tx event lost
[ +0.108646] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.309283] mttcan c310000.mttcan can0: Tx event lost
[ +0.285857] mttcan c310000.mttcan can0: Tx event lost
[ +0.163722] mttcan c310000.mttcan can0: Tx event lost
[ +0.025571] mttcan c310000.mttcan can0: Tx event lost
[ +0.453109] mttcan c310000.mttcan can0: Tx event lost
[ +0.025656] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.444670] mttcan c310000.mttcan can0: Tx event lost
[ +0.018302] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.040964] mttcan c310000.mttcan can0: Tx event lost
[ +0.017601] mttcan c310000.mttcan can0: Tx event lost
[ +0.050254] mttcan c310000.mttcan can0: Tx event lost
[ +0.349240] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.131447] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.466497] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +2.563466] mttcan_poll_ir: 42 callbacks suppressed
[ +0.000723] mttcan c310000.mttcan can0: Tx event lost
[ +0.005107] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0
[ +0.034383] mttcan c310000.mttcan can0: Tx event lost
[ +0.155232] mttcan c310000.mttcan can0: Tx event lost
[ +0.046186] mttcan c310000.mttcan can0: Tx event lost
[ +0.146536] mttcan c310000.mttcan can0: Tx event lost
[ +0.054019] mttcan c310000.mttcan can0: Tx event lost
[ +0.046847] mttcan c310000.mttcan can0: Tx event lost
[ +0.220302] mttcan c310000.mttcan can0: Tx event lost
[ +0.028960] mttcan c310000.mttcan can0: Tx event lost
[ +0.075673] mttcan c310000.mttcan can0: Tx event lost

It lasts for 1-3 minutes; during that time the engine loses RPM, and after that the problem disappears.
We also have an oscilloscope connected, which shows no errors, and neither does the CAN sniffer.
Any comments on that?
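
To compare bursts like the one above across runs, the dmesg lines can be tallied by type. The following is a rough sketch that only pattern-matches the message texts seen in this thread; `summarize` is a hypothetical helper, and the categories are chosen for illustration rather than taken from the driver.

```python
import re
from collections import Counter

# Matches both "[ +0.042045] ..." (dmesg deltas) and "[ 336.672366] ..." forms.
LINE_RE = re.compile(r"^\[\s*\+?\d+\.\d+\]\s+(.*)$")
SUPPRESSED_RE = re.compile(r"(\d+) callbacks suppressed")


def summarize(lines):
    """Return (per-type message counts, total rate-limited callbacks)."""
    counts = Counter()
    suppressed = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a kernel log line in the expected format
        msg = m.group(1)
        s = SUPPRESSED_RE.search(msg)
        if s:
            suppressed += int(s.group(1))
        elif "msgs lost" in msg:
            counts["rx_lost"] += 1
        elif "Tx event lost" in msg:
            counts["tx_event_lost"] += 1
        else:
            counts["other"] += 1
    return dict(counts), suppressed


if __name__ == "__main__":
    import sys
    print(summarize(sys.stdin))
```

It could be fed with something like `dmesg -d | python3 summarize.py` (the script name is a placeholder). For the excerpt above, the two "callbacks suppressed" lines add up to 54 hidden messages, so the visible lines understate how many events the driver reported.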

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi razvilgrad,
May I know which bitrate you have set for CAN0 and CAN1? Also, have you made any software changes to configure buffer/queue sizes?