MTTCAN on Orin NX issues

Custom board

latest, 5.1.1

VBOX;
When connecting a Microchip CAN analyzer I can see the messages. Moreover, the Pico analyzer shows the message (as in the original post), but marked “valid” because the ACK bit is set. I.e., the Microchip CAN analyzer succeeds in parsing the CAN messages and ACKs the VBOX, whereas the Orin NX is not able to ACK and claims the frames have a bad format.

I’ll try later. But mainly, the “Format Error” messages continue even when the cable is disconnected, so it is something in the MTTCAN driver/controller that stays in that state without restarting.

It is not able to recover. The controller seems to be disconnected and does not produce any transmission on the CAN bus.

My feeling is that it relates to our device tree configuration. Can you spot any errors there?

We would need the full serial console log or dmesg from when you reload the CAN driver.

Can this issue only be reproduced with an external device?
I’m not clear about how you hit the issue.
Could you provide detailed reproduction steps on the devkit so that we can verify locally?

  1. I’ll be able to provide the dmesg log/serial console log on Sunday.
  2. Yep. Our setup includes 2 Orin NX modules; they can communicate with each other, but as soon as an external/non-Jetson device joins the network, the communication breaks. The Orin NX complains about “Format Error” although, as can be seen on the scope, the messages are fine. This is what makes me think that the issue is with the internal clock definition of the MTTCAN.
  3. The Orin devkit does not have CAN transceivers, right? If it does, I can try to reproduce it there.

edit1 - clocks information :


/sys/kernel/debug/bpmp/debug/clk# cat pll_aon/rate
400000000

/sys/kernel/debug/bpmp/debug/clk# cat pll_aon/children
can2 can1
/sys/kernel/debug/bpmp/debug/clk# cat can1/rate
200000000
/sys/kernel/debug/bpmp/debug/clk# cat can1_core/rate
50000000
/sys/kernel/debug/bpmp/debug/clk# cat can1_host/rate
200000000
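
For reference, the clock rates above can be related to the bitrates used in this thread with some simple arithmetic. This is only a sketch, assuming the bit timing is derived from the can1_core rate (50 MHz); the helper name clocks_per_bit is made up for illustration. The clock the driver actually uses is printed as “clock” by ip -details link show can1.

```shell
# clocks_per_bit CLOCK_HZ BITRATE
# Prints how many controller clock cycles make up one CAN bit.
# Fails if the clock is not an integer multiple of the bitrate.
clocks_per_bit() {
    clock_hz=$1
    bitrate=$2
    if [ $((clock_hz % bitrate)) -ne 0 ]; then
        echo "no exact divider: $clock_hz Hz / $bitrate bit/s" >&2
        return 1
    fi
    echo $((clock_hz / bitrate))
}

# Example (assumed core clock of 50 MHz):
#   clocks_per_bit 50000000 250000
#   clocks_per_bit 50000000 500000
```

With these numbers, 250 kbps needs 200 core clocks per bit and 500 kbps needs 100, so both bitrates have an exact integer divider from 50 MHz.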

Yes, please provide the serial console log so we can check whether there are any errors.

How do you make another Jetson join the network? Are there any steps to do this so that we could try to reproduce it?

Yes, there’s no internal CAN transceiver in the devkit, you would need to get CAN transceivers to verify CAN transmission.

Regarding joining the network: we just plug the device’s CAN high/low lines into the network with matching CAN settings (bitrate, etc.). Tested at 250 kbps.

Could you provide the detailed steps for this?
Which I/O pins are you controlling for high/low?

High and low come from our CAN transceiver, as the SOM (Orin NX) does not have one.

Any other ideas?

After removing mttcan from the kernel and reloading it (note that we also have an MCP2515 on board)

Is this your current issue?
Could you help to clarify how to reproduce this?

The dmesg seems as expected when you re-load the mttcan.

Hey!

To sum up, we have three issues:

1st: After a short time (possibly related to the number of messages per second), the CAN reports “Format Error” although the scope shows that the message is valid. This happens when an external device communicates on the CAN network with the Orin NX via MTTCAN.
This happens to us even with an external device that is running “cangen” at high frequency, i.e.

cangen can1 -v -g 1

2nd:
After any Format Error message on the CAN, the MTTCAN controller is stuck in this state. It floods the syslog/dmesg with the same error. It stays that way even when the external device is disconnected from the network.

3rd:

The MTTCAN controller fails to reset its state, even when removing all CAN modules from the kernel and loading them again. Even a soft reboot does not bring the MTTCAN controller back. In this state the scope shows nothing on the output of the controller.

We tried to read the MTTCAN registers, but devmem failed; dmesg shows “unprivileged access”.
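
To put the “number of messages per second” from the first issue into perspective, here is a rough bus-load estimate. It is only a sketch under stated assumptions: classical CAN frames with 11-bit IDs, 47 bits of overhead per frame (including the 3-bit intermission), stuff bits ignored, and about 1000 frames per second as an upper bound for a 1 ms cangen gap. The function name bus_load_percent is made up for illustration.

```shell
# bus_load_percent FRAMES_PER_S BITRATE [DATA_BYTES]
# Rough bus load in percent (integer, rounded down) for classical CAN
# frames with an 11-bit ID. Stuff bits are ignored, so the real load
# is somewhat higher.
bus_load_percent() {
    frames_per_s=$1
    bitrate=$2
    data_bytes=${3:-8}
    # 47 overhead bits (SOF .. EOF plus intermission) + 8 bits per data byte
    bits_per_frame=$((47 + 8 * data_bytes))
    echo $((frames_per_s * bits_per_frame * 100 / bitrate))
}

# Example: ~1000 frames/s of 8-byte frames at 250 kbps
#   bus_load_percent 1000 250000
```

Under these assumptions a single sender at a 1 ms gap stays below half of the 250 kbps bus capacity, so bus saturation alone would not be expected from one such sender.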

I think we should start with the first issue.
Please share the full serial console log including the “Format error” you mentioned.

Could you describe in more detail how to reproduce this?
A block diagram of your connections and the exact commands you used would help me understand.

Can you share the full serial console log including the “Format error”?
Any update to move this issue forward, or have you fixed it?

Thanks

Hey
We have not solved it yet. We think the main issue is that the controller cannot recover; at the moment only disconnecting the power solves it.
Here are a lot of logs from different runs on the Orin.
Our hypothesis: the kernel does not restart the controller correctly.

MobaXterm terminal output ORINNX - can error.zip (578.6 KB)

EDIT 1

Try shorting CAN high and low together for a millisecond, or shorting high/low to ground. In our setup the controller is not able to recover.

EDIT 2

A trickier way is:

  1. Set device A to 250 Kbps and device B to 250 Kbps; test that everything is working.
  2. Change device B to 500 Kbps; you will get a lot of format errors.
  3. Change device B back to 250 Kbps; now device B can send messages,
  4. but device A cannot send, or device B cannot receive.

EDIT 3

(Suggestion: use tmux.)
On both devices:

  1. cangen can1 -g 2 -v
  2. watch -d -n 0.1 ip -d -s link show
  3. candump can1

On device B:

  1. sudo ip link set can1 down
  2. sudo ip link set can1 type can bitrate 500000 berr-reporting on restart-ms 100
  3. sudo ip link set can1 up
  4. this will break cangen and candump, so re-run (1,2,3) from the first list
  5. this will print format errors and bit0/bit1 errors
  6. reset the bitrate back to 250000:
  7. sudo ip link set can1 down
  8. sudo ip link set can1 type can bitrate 250000 berr-reporting on restart-ms 100
  9. sudo ip link set can1 up
  10. this will break cangen and candump, so re-run (1,2,3) from the first list

Now verify whether the two devices can send and receive.
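
The bitrate switch on device B can be wrapped in a small script so the sequence is repeated identically on every run. This is a sketch, not a tested reproducer: the helper names run, set_bitrate and repro_on_device_b are made up for illustration, and by default it only prints the commands (DRY_RUN=1). Set DRY_RUN=0 to actually execute them.

```shell
# Print the command in dry-run mode (default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_bitrate IFACE BITRATE
# Bring the interface down, change the bitrate, bring it up again.
set_bitrate() {
    iface=$1
    bitrate=$2
    run sudo ip link set "$iface" down &&
    run sudo ip link set "$iface" type can bitrate "$bitrate" berr-reporting on restart-ms 100 &&
    run sudo ip link set "$iface" up
}

# Device B: switch to the mismatching bitrate, then back again.
# Device A is assumed to stay at 250000 the whole time.
repro_on_device_b() {
    set_bitrate can1 500000
    set_bitrate can1 250000
}
```

cangen and candump still have to be restarted by hand after each switch, as described in the steps above.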

Do you mean the issue occurs only when you configure it to 500 Kbps?
Please use the same bitrate for both devices.

Are you testing with 2 Orin NX?
Could you share the block diagram of your connections?

Could you test it by reloading the module and share the logs?

$ sudo rmmod mttcan
$ sudo modprobe mttcan
$ sudo ip link set can1 down
$ sudo ip link set can1 type can bitrate 500000 berr-reporting on restart-ms 100
$ sudo ip link set can1 up

Do you mean the issue occurs only when you configure it to 500 Kbps?
Please use the same bitrate for both devices.

No, I meant that if the two devices are not set to the same bitrate, then the MTTCAN controller cannot recover after the same bitrate is restored on both devices.

Are you testing with 2 Orin NX?
Could you share the block diagram of your connections?
Yes.

Could you test it by reloading the module and share the logs?
We have two cases:

  1. If the error is a bitrate mismatch, then reloading the mttcan module restores the CAN functionality.
  2. If the error comes from the physical layer and results in Format Error, then even after completely disconnecting the entire network and restoring a clean version of it, reloading the module does not restore the CAN.
  3. The output in the logs is the same for the two cases.


We found the topic “Orin CAN_H CAN_L short-circuit bus off and cannot recover”, which seems very similar to our problem, but there is no solution there. Do you know of any solution for that one?

How do you know the error is from the physical layer? How do you trigger this kind of error?

Is your issue about entering the error passive state?

How do you know the error is from the physical layer? How do you trigger this kind of error?

There can be several causes: a noisy cable, a new device plugged into a working CAN network, or adding/removing termination while the bus is in use.

Is your issue about entering the error passive state?

Our issue is that the MTTCAN controller fails to recover from a failure. A CAN controller goes through a few states before it cuts itself off from the network, but we set it to restart itself. Moreover, when we hit the MTTCAN controller failure, the OS reports that the MTTCAN interface is up but nothing happens there. This can be seen even with a scope: the CAN of the device is dead.
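
To separate “the link is UP” from the controller’s own CAN error state, the state field can be pulled out of the ip -details output. This is a small sketch; the function name can_state is made up for illustration, and it assumes the usual iproute2 layout in which the CAN line reads “can [<flags>] state STATE (berr-counter tx N rx N) restart-ms N”.

```shell
# can_state
# Reads `ip -details link show <iface>` output on stdin and prints the
# CAN controller state (for example ERROR-ACTIVE, ERROR-PASSIVE or BUS-OFF).
# Only the indented line whose first word is "can" is inspected, so the
# "state UP" on the first line (the netdev state) is ignored.
can_state() {
    awk '$1 == "can" {
        for (i = 2; i < NF; i++) {
            if ($i == "state") { print $(i + 1); exit }
        }
    }'
}

# Usage on the target:
#   ip -details link show can1 | can_state
```

Polling this while the bus is being disturbed would show whether the controller is reported as ERROR-PASSIVE or BUS-OFF, or still as ERROR-ACTIVE, while nothing is visible on the scope.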

EDIT 1

We saw that JetPack 5.1.2 was released a few days ago. We tested it, and the problem also exists in 5.1.2.

EDIT 2

Even after modprobe -r mttcan / modprobe mttcan, sometimes only the receiving side works, and we need to reload it again to get the sender to work as well.

Could you verify whether the following patch for the CAN driver helps with your current issue?

index 2111ce245a..11fa5adffd 100644
--- a/drivers/net/can/mttcan/native/m_ttcan_linux.c
+++ b/drivers/net/can/mttcan/native/m_ttcan_linux.c
@@ -1333,7 +1333,14 @@ static int mttcan_close(struct net_device *dev)
 	napi_disable(&priv->napi);
 	mttcan_stop(priv);
 	free_irq(dev->irq, dev);
+
+	/* When we do power_down, it resets the mttcan HW by setting
+	 * INIT bit. This clears the internal state of mttcan HW.
+	 * We also then need to clear the internal states of driver.
+	 */
+	priv->ttcan->tx_object = 0;
 	priv->hwts_rx_en = false;
+
 	close_candev(dev);
 	mttcan_power_down(dev);
 	mttcan_pm_runtime_put_sync(priv);
