CAN bus errors after connecting Jetson

Hello,

We have an issue when we try to connect the Jetson TX2 to an existing CAN network, which already has dozens of working devices running at 250 kbps.

As soon as we bring the CAN bus online by typing:

sudo ip link set can0 type can bitrate 250000
sudo ip link set up can0

We start to see periodic errors on the CAN bus, which interfere with communication from some devices. The errors happen about once a second and appear to be related to the Jetson trying to send acknowledgements: occasionally it sends one that lasts too long and corrupts data on the bus.

We have tried two different CAN transceivers, a carrier board, and the dev board, so we think the issue is related to our initialization of the CAN controller in software.

Has anyone else experienced this issue? Any help would be appreciated.

Do you have a logic analyzer? Hook it up to the CTX and CRX signals on the Jetson side, and the CANH and CANL signals on the bus side. This should let you prove that it’s the Jetson causing the problems, and also see what kind of problem it’s causing.

If you don’t yet have a logic analyzer, I like the Saleae Logic ones for their great user interface and good price/performance, but there are plenty of other brands to choose from, both cheaper and more expensive.

Also, how do you terminate the bus with the Jetson added? Does it perhaps add too much load for some of the other devices?
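To put numbers on the termination question: a high-speed CAN bus is normally terminated with one 120 Ω resistor at each end, so with the bus powered off you should measure about 60 Ω between CANH and CANL. A quick sketch of the parallel-resistance arithmetic, assuming (hypothetically) that the Jetson side adds a third 120 Ω terminator:

```python
def parallel_resistance(*resistors_ohm: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    if not resistors_ohm:
        raise ValueError("need at least one resistor")
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


# Two 120-ohm terminators, one at each end of the bus: about 60 ohms.
print(round(parallel_resistance(120, 120), 1))       # 60.0
# Hypothetical third 120-ohm terminator on the Jetson side: about 40 ohms.
print(round(parallel_resistance(120, 120, 120), 1))  # 40.0
```

A reading well below 60 Ω suggests an extra terminator somewhere on the bus.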

Hi,

It also seems that not all of the required parameters are being passed before setting the CAN interface up.

Can you please try with the required parameters, as below:
sudo ip link set can0 type can bitrate 500000 dbitrate 2000000 berr-reporting on fd on
sudo ip link set up can0

Thanks & Regards,
Sandipan

Hello Sandipan,

We have tried adding the parameters you recommended, and we still have the same issue. But we have more information that may help resolve the problem.

We set up an oscilloscope on the CAN high and low lines and set it to trigger on each individual CAN frame. I will attach two images to this post: one of traffic when the Jetson is not connected (normal, no errors), and one when the Jetson is connected (periodic errors).

Without the Jetson connected, the CAN frames look fine, with one frame appearing per second containing the heartbeat message from our CANopen devices.

With the Jetson connected, some of the traffic looks normal. But every few seconds, we see this error frame in the picture: the message looks identical, but the final bit at the end of the frame lasts much longer. I believe this is the acknowledge bit, and while it looks like the Jetson is trying to send an ACK, it doesn’t “latch” properly.

Thus, having not received an acknowledgement, our CANopen devices repeat their message over and over again, completely tying up the bus and preventing other traffic from getting through.

We tried connecting the same CAN network to another Linux device (the Texas Instruments BeagleBone), and we do not see these error frames with the long final bit. So we know it is the Jetson that is causing them to appear.

Hopefully this gives some insight into the problem. We greatly appreciate your support.

Josh

EDIT - Also, we see the following log messages every time the CAN errors occur:

[ 6877.813760] mttcan c320000.mttcan can1: IR 0x8400000 PSR 0x4752
[ 6877.819741] mttcan c320000.mttcan can1: entered error passive state
[ 6877.826008] mttcan c320000.mttcan can1: Format Error Detected
[ 6877.831761] mttcan c320000.mttcan can1: IR 0x8c00000 PSR 0x476a
[ 6878.804553] mttcan c320000.mttcan can1: Format Error Detected
[ 6878.810316] mttcan c320000.mttcan can1: IR 0x8400000 PSR 0x4752
[ 6878.816296] mttcan c320000.mttcan can1: entered error passive state
[ 6878.822562] mttcan c320000.mttcan can1: Format Error Detected
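The PSR values in these logs can be decoded by hand. The sketch below assumes the mttcan controller uses the same Protocol Status Register layout as the Bosch M_CAN IP (LEC in bits 2:0, ACT in bits 4:3, then EP, EW and BO in bits 5 to 7); that is an assumption worth checking against the TX2 technical reference manual. Under that assumption, both values report LEC 2 (Form Error, which matches "Format Error Detected"), the first has only the error-warning bit set, and the second also has the error-passive bit set.

```python
# Last Error Code (LEC) values, assuming the Bosch M_CAN PSR layout.
LEC_NAMES = {
    0: "No Error",
    1: "Stuff Error",
    2: "Form Error",
    3: "Ack Error",
    4: "Bit1 Error",
    5: "Bit0 Error",
    6: "CRC Error",
    7: "No Change",
}

# Activity (ACT) field values.
ACT_NAMES = {0: "Synchronizing", 1: "Idle", 2: "Receiver", 3: "Transmitter"}


def decode_psr(psr: int) -> dict:
    """Decode the low byte of a PSR value (assumed M_CAN-compatible layout)."""
    return {
        "lec": LEC_NAMES[psr & 0x7],               # bits 2:0
        "activity": ACT_NAMES[(psr >> 3) & 0x3],   # bits 4:3
        "error_passive": bool(psr & (1 << 5)),     # EP, bit 5
        "error_warning": bool(psr & (1 << 6)),     # EW, bit 6
        "bus_off": bool(psr & (1 << 7)),           # BO, bit 7
    }


for value in (0x4752, 0x476A):
    print(hex(value), decode_psr(value))
```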

I’m just throwing something out there: I had a similar issue with a Raspberry Pi and CAN earlier in the week, and it turned out I didn’t have the clock speed of the CAN controller set correctly in the CAN driver configuration… What CAN transceiver are you using? Can you configure the CAN clock speed on the Jetson? (I know you can on the Raspberry Pi, but I don’t think I’ve ever had to do this on the Jetson…)
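For reference, a clock mismatch like this scales the bitrate directly: the driver picks its prescaler and segment lengths for the clock it has been told about, so the bus actually runs at requested × actual / configured. A minimal sketch with hypothetical numbers (the Jetson reports its own controller clock in the `ip -details` output, e.g. `clock 40000000`):

```python
def effective_bitrate(requested_bps: float, configured_clock_hz: float,
                      actual_clock_hz: float) -> float:
    """Bitrate actually produced when the driver's clock setting is wrong.

    The driver chooses timing so that configured_clock_hz / (prescaler * tq_per_bit)
    equals requested_bps; the real bit time follows the real clock, so the
    result scales by actual_clock_hz / configured_clock_hz.
    """
    return requested_bps * actual_clock_hz / configured_clock_hz


# Hypothetical case: the driver is told 16 MHz but the board has an 8 MHz crystal.
print(effective_bitrate(250_000, 16_000_000, 8_000_000))  # 125000.0
```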

Same problem here.

With the Jetson connected only to a PCAN adapter, there is no problem: we see only the frames that we asked it to send.

If we connect it to a network, then as soon as we bring the interface up, the whole network goes “crazy” and some of the more advanced devices (a Curtis controller) report a major error on the network.

When we bring CAN down again, poof, the network is OK.

EDIT - Never mind, we still see some errors at the higher baud rate, just not nearly as many. We still need to figure out what’s causing this.

After further investigation, we have discovered that we do not get CAN bus errors if we change all of the devices on our network to run at 500 kbps.

We are not sure if this has actually solved the problem, or simply made it less noticeable given the lower bus load. In any case, we would like to be able to run the CAN network at a lower baud rate. It is a pain to have to change all of our devices, and some of them cannot be changed without physical modification.
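On the bus-load point: the same traffic occupies half as much bus time at 500 kbps as at 250 kbps, which by itself could make errors less frequent. A rough sketch, using the usual worst-case length of a classic CAN data frame with an 11-bit identifier (including stuff bits and the 3-bit interframe space), so the result is an upper estimate; the traffic mix below is hypothetical:

```python
def max_frame_bits(data_bytes: int, extended_id: bool = False) -> int:
    """Worst-case length in bits of a classic CAN data frame,
    including stuff bits and the 3-bit interframe space."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("classic CAN carries 0..8 data bytes")
    # Bits subject to stuffing: SOF, ID, control bits, data, CRC.
    stuffable = (54 if extended_id else 34) + 8 * data_bytes
    fixed = 13  # CRC delimiter, ACK slot, ACK delimiter, 7-bit EOF, 3-bit IFS
    return stuffable + fixed + (stuffable - 1) // 4


def bus_load(messages, bitrate_bps: float) -> float:
    """Fraction of bus time used by (frames_per_second, data_bytes) pairs,
    assuming 11-bit identifiers (as CANopen uses)."""
    bits_per_second = sum(rate * max_frame_bits(n) for rate, n in messages)
    return bits_per_second / bitrate_bps


traffic = [(400, 8)]  # hypothetical: 400 eight-byte frames per second
for rate in (250_000, 500_000):
    print(rate, f"{bus_load(traffic, rate):.1%}")
# 250000 21.6%
# 500000 10.8%
```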

I hope this provides some insight into the cause of the problem.

We appear to have resolved the issue.

To anyone else experiencing problems: try adding these additional “sjw” and “dsjw” parameters to your CAN initialization command:

sudo ip link set can0 type can bitrate 500000 sjw 4 dbitrate 2000000 dsjw 4 berr-reporting on fd on

This increases the bit-timing tolerance for other devices on the network. We found that some of the devices on our CAN network had oscillator frequencies slightly off, meaning the effective bitrate would be, e.g., 501 kbps instead of 500 kbps.

Fortunately, the CAN bus standard is designed to accommodate this problem. The SJW parameter determines how much the device can correct for timing errors. Here’s a document that explains it well:

https://www.mikrocontroller.net/attachment/114193/BOSCH_The_config_of_CAN_Bit_Timing_L-1.pdf
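The two oscillator-tolerance conditions from that Bosch note can be evaluated directly. This sketch plugs in the 500 kbps timing that appears in the `ip -details` output later in this thread (prop-seg 34, phase-seg1 35, phase-seg2 10) and only varies SJW; it assumes classic CAN, so the data-phase conditions for CAN FD are not covered. Here df is the allowed deviation of each node from the nominal clock, so two nodes may differ by up to about 2·df.

```python
def oscillator_tolerance(prop_seg: int, phase_seg1: int, phase_seg2: int,
                         sjw: int) -> float:
    """Maximum relative oscillator deviation df per node (classic CAN)."""
    nbt = 1 + prop_seg + phase_seg1 + phase_seg2  # bit length in tq (1 = sync seg)
    # Condition 1: limit set by the phase buffer segments (see the Bosch note).
    cond1 = min(phase_seg1, phase_seg2) / (2 * (13 * nbt - phase_seg2))
    # Condition 2: phase error accumulated over 10 bits between
    # resynchronizations (2 * df * 10 * nbt) must not exceed SJW.
    cond2 = sjw / (20 * nbt)
    return min(cond1, cond2)


for sjw in (1, 4, 10):
    print(sjw, f"{oscillator_tolerance(34, 35, 10, sjw):.2%}")
# 1 0.06%
# 4 0.25%
# 10 0.49%
```

With these numbers, SJW 1 allows about 0.06% per node and SJW 4 about 0.25%, so a device running at 501 kbps (0.2% fast) falls outside the first limit but inside the second. Beyond roughly SJW 8 the phase-segment condition becomes the limit instead.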

You can see all of the timing parameters by running this command:

ip -details -statistics link show can0
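To compare settings across boards, the timing values in that output can be pulled out with a short parser. This is a sketch against the output format shown later in this thread; lines containing `..` are the driver's limit ranges and are skipped. The exact `ip` output format may differ between iproute2 versions, so treat the regex as an assumption.

```python
import re
import subprocess

# Keys printed by `ip -details link show` for a CAN interface (nominal and data phase).
TIMING_KEYS = (
    "bitrate", "sample-point", "tq", "prop-seg", "phase-seg1", "phase-seg2", "sjw",
    "dbitrate", "dsample-point", "dtq", "dprop-seg", "dphase-seg1", "dphase-seg2",
    "dsjw", "clock",
)
# The leading \b stops "sjw" matching inside "dsjw", "bitrate" inside "dbitrate", etc.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in TIMING_KEYS) + r")\s+(\d+(?:\.\d+)?)\b"
)


def parse_can_timing(text: str) -> dict:
    """Extract timing parameters from `ip -details link show <canX>` output."""
    params = {}
    for line in text.splitlines():
        if ".." in line:  # limit lines such as "sjw 1..127" are ranges, not settings
            continue
        for key, value in _PATTERN.findall(line):
            params[key] = float(value) if "." in value else int(value)
    return params


def read_link_details(ifname: str = "can0") -> str:
    """Run `ip -details link show <ifname>` and return its text output."""
    result = subprocess.run(
        ["ip", "-details", "link", "show", ifname],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a live system, `parse_can_timing(read_link_details("can0"))` returns a dict such as `{"bitrate": 500000, "sjw": 4, ...}`.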

There are other numbers that can be adjusted to compensate for various network problems. We will continue running tests and tweaking the numbers to get the error rate down as low as possible. For now, setting SJW to 4 has made a huge difference.

Hope this helps anyone experiencing the same issue.


Dear all,

I have the same problem. I used a TJA1040 as the CAN transceiver, and these are the CAN details:

nvidia@tegra-ubuntu:~$ ip -details -statistics link show can1
5: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 
    can <BERR-REPORTING,FD> state ERROR-PASSIVE (berr-counter tx 0 rx 127) restart-ms 0 
	  bitrate 500000 sample-point 0.875 
	  tq 25 prop-seg 34 phase-seg1 35 phase-seg2 10 sjw 4
	  mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
	  dbitrate 2000000 dsample-point 0.750 
	  dtq 25 dprop-seg 7 dphase-seg1 7 dphase-seg2 5 dsjw 4
	  mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          31366      0          1          1          0         
    RX: bytes  packets  errors  dropped overrun mcast   
    250944     31368    31366   0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0

Almost all of the CAN messages received were errors, even though the CAN signal looked good when I checked it on the oscilloscope.
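As a sanity check on the settings above, the bitrate and sample point follow from the time quantum and segment lengths: one bit is 1 (sync) + prop-seg + phase-seg1 + phase-seg2 time quanta, and the sample point sits after the first three. A small sketch using the numbers from that output (tq 25 ns, i.e. one period of the 40 MHz clock):

```python
def bit_timing(tq_ns: float, prop_seg: int, phase_seg1: int, phase_seg2: int):
    """Return (bitrate_bps, sample_point) for a time quantum and segment lengths."""
    tq_per_bit = 1 + prop_seg + phase_seg1 + phase_seg2  # 1 tq of sync segment
    bit_time_ns = tq_per_bit * tq_ns
    bitrate = 1e9 / bit_time_ns
    sample_point = (1 + prop_seg + phase_seg1) / tq_per_bit
    return bitrate, sample_point


print(bit_timing(25, 34, 35, 10))  # nominal phase: (500000.0, 0.875)
print(bit_timing(25, 7, 7, 5))     # data phase:    (2000000.0, 0.75)
```

Both results match the reported `bitrate`/`sample-point` and `dbitrate`/`dsample-point`, so the configured timing itself is self-consistent.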

Thanks and Best Regards,
Vu Nguyen

Vu, since you are getting 100% CAN errors it sounds like you may have your baud rate configured incorrectly. Can you post a screenshot of the CAN signal on the scope?

And you are using a 120 ohm terminator, right?

Dear joshshields0,

Thanks for your reply. Actually, I have only two devices on the CAN bus, and both are set to 500 kbps.
This is the signal probed on the CAN1_RX pin (the output from the TJA1040 transceiver to the TX2):

Solution accepted?
OK, the answer lets us work (I hope), but it just says: if there is a problem, don’t take it into account.
It doesn’t resolve it.

It clearly seems that the Jetson has a “special” CAN behaviour (because it is not “real time” like a PLC?). In this topic alone, I think there are a dozen different pieces of industrial equipment that work without problems in other setups.

Why does the Jetson have this problem?

milachon,

The solution I posted was to increase bit timing tolerance. This compensates for the error in bit timing of other devices on the network.

When we measured the Jetson on the scope, we found its timing to be within 0.02% of spec. It is very good.

The problem was that other devices on our network (not made by NVIDIA) were much worse with their timing (up to 1% off). We found that the Jetson was rejecting their CAN frames, as the Jetson was set to expect perfect timing from every device.
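To quantify a figure like "1% off" from a scope capture, measure the span of several consecutive bit periods and compare the average bit width with the nominal one. A minimal sketch; the 10-bit span of 20.2 µs is a made-up reading:

```python
def bitrate_error(span_ns: float, n_bits: int, nominal_bps: float) -> float:
    """Relative bitrate error from the measured duration of n_bits consecutive bits.

    Positive means the transmitter runs fast, negative means it runs slow.
    """
    measured_bit_ns = span_ns / n_bits
    nominal_bit_ns = 1e9 / nominal_bps
    return nominal_bit_ns / measured_bit_ns - 1.0


# Made-up reading: 10 bit periods of a 500 kbps frame measured as 20.2 us.
print(f"{bitrate_error(20_200, 10, 500_000):+.2%}")  # -0.99%
```

Compare the result against the tolerance your SJW and segment settings allow.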

If changing the SJW setting as I mentioned above makes your CAN errors disappear, then you have resolved the problem. This is not a “hack;” it is an actual feature of the CAN spec that is meant to be used to compensate for inaccurate timing across devices.

Vu,

Try running this command after sending some CAN traffic to find out more about the errors:

sudo dmesg | grep mttcan
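When that output gets long, a tally of error types can be easier to read than the raw log. A sketch that assumes the message format shown in this thread (`mttcan <device> canN: <Type> Error Detected`):

```python
import re
from collections import Counter

# Matches lines such as:
# [  134.583120] mttcan c320000.mttcan can1: Format Error Detected
_ERROR_RE = re.compile(r"mttcan \S+ (can\d+): (\w+) Error Detected")


def tally_mttcan_errors(log_text: str) -> Counter:
    """Count '<Type> Error Detected' messages per (interface, error type)."""
    return Counter(_ERROR_RE.findall(log_text))
```

Feed it the text from `sudo dmesg | grep mttcan`; calling `most_common()` on the result lists the dominant error type first. Other lines, such as state changes, are ignored.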

OK, thank you for this explanation.
So it is more that Linux out of the box puts zero tolerance on CAN timing. We usually use a USB2CAN adapter, which implements its own electronics.

Hi joshshields0,

If CAN1 only receives, this is the grep log:

[  134.583120] mttcan c320000.mttcan can1: Format Error Detected
[  134.597341] mttcan c320000.mttcan can1: Stuff Error Detected
[  134.603100] mttcan c320000.mttcan can1: Format Error Detected
[  134.617340] mttcan c320000.mttcan can1: Stuff Error Detected
[  134.623080] mttcan c320000.mttcan can1: Format Error Detected
[  134.637337] mttcan c320000.mttcan can1: Stuff Error Detected
[  134.643088] mttcan c320000.mttcan can1: Format Error Detected

If I try to send some messages from CAN1, the grep log is as below:

[  134.703087] mttcan c320000.mttcan can1: Format Error Detected
[  134.717336] mttcan c320000.mttcan can1: Stuff Error Detected
[  134.723069] mttcan c320000.mttcan can1: Format Error Detected
[  134.737382] mttcan c320000.mttcan can1: Format Error Detected
[  134.743305] mttcan c320000.mttcan can1: Format Error Detected
[  134.751805] mttcan c320000.mttcan can1: Bit0 Error Detected
[  134.757528] mttcan c320000.mttcan can1: entered bus off state
[  134.763374] mttcan c320000.mttcan can1: Bit0 Error Detected

Thanks

The only time I’ve ever seen that behavior is when the baud rate was set incorrectly. But from the picture of the scope you posted, it looks like your baud rate is accurate. So I’m not sure what is wrong.

Do you have a second CAN transceiver? You could connect CAN0 to CAN1 on the Jetson and see if you can receive your own traffic. That might help narrow down the problem.
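A Python sketch of that loopback check using Linux SocketCAN raw sockets, assuming can0 and can1 are wired together through two transceivers on a short, terminated bus and both interfaces are already up at the same bitrate. The frame layout `=IB3x8s` is the classic `struct can_frame`; the ID and payload are arbitrary. The `cansend` and `candump` tools from can-utils do the same job from the shell.

```python
import socket
import struct

# Classic SocketCAN frame: 32-bit CAN ID, 8-bit DLC, 3 padding bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"


def pack_frame(can_id: int, data: bytes) -> bytes:
    """Build a 16-byte classic CAN frame for a raw socket."""
    if len(data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def unpack_frame(frame: bytes):
    """Return (can_id, data) from a 16-byte classic CAN frame."""
    can_id, dlc, data = struct.unpack(CAN_FRAME_FMT, frame)
    return can_id, data[:dlc]


def loopback_test(tx_if="can0", rx_if="can1", can_id=0x123,
                  payload=b"\xde\xad\xbe\xef", timeout_s=1.0) -> bool:
    """Send one frame on tx_if and check that it arrives on rx_if (Linux only)."""
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as rx:
        with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as tx:
            rx.bind((rx_if,))
            tx.bind((tx_if,))
            rx.settimeout(timeout_s)
            tx.send(pack_frame(can_id, payload))
            try:
                received = unpack_frame(rx.recv(16))
            except socket.timeout:
                return False
            return received == (can_id, payload)
```

If `loopback_test()` returns False while `dmesg` shows the same errors, the fault is more likely in the Jetson-side wiring or configuration than in the other device.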

I’m sorry I don’t have any more ideas other than obvious things that I’m sure you have already tried. You might want to post a separate thread for your problem; since this one is marked resolved, I’m not sure NVIDIA will see it.

Thanks for your support. I will post a new thread for this problem.

I was having the same issue. The accepted answer solved it. I set the sjw and dsjw parameters to their maximum values.

# Turn CAN off
$ sudo ifconfig can0 down
# Re-configure it
$ sudo ip link set can0 up type can bitrate 500000 sjw 127 dbitrate 2000000 dsjw 15 berr-reporting on fd on

But before doing this, I made sure that all the nodes on the CAN bus were set to the same bitrate.
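If you script this, it can help to validate the SJW values against the limits the driver reports (`sjw 1..127` and `dsjw 1..15` in the mttcan output earlier in the thread) before calling `ip`. A hypothetical helper that builds the same command as above; note that, depending on the kernel, an SJW larger than the phase segments may be reduced or rejected, so check the result with `ip -details link show can0`.

```python
# Limits taken from the mttcan `ip -details` output in this thread;
# other CAN controllers report different ranges.
SJW_RANGE = (1, 127)
DSJW_RANGE = (1, 15)


def _check(name: str, value: int, lo: int, hi: int) -> None:
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} outside {lo}..{hi}")


def build_ip_link_args(ifname, bitrate, sjw, dbitrate=None, dsjw=None,
                       fd=False, berr_reporting=True):
    """Return an argv list for `ip link set <ifname> up type can ...` (hypothetical helper)."""
    _check("sjw", sjw, *SJW_RANGE)
    args = ["ip", "link", "set", ifname, "up", "type", "can",
            "bitrate", str(bitrate), "sjw", str(sjw)]
    if fd:
        if dbitrate is None or dsjw is None:
            raise ValueError("CAN FD needs dbitrate and dsjw")
        _check("dsjw", dsjw, *DSJW_RANGE)
        args += ["dbitrate", str(dbitrate), "dsjw", str(dsjw)]
    args += ["berr-reporting", "on" if berr_reporting else "off",
             "fd", "on" if fd else "off"]
    return args


print(" ".join(build_ip_link_args("can0", 500000, 127, 2000000, 15, fd=True)))
# ip link set can0 up type can bitrate 500000 sjw 127 dbitrate 2000000 dsjw 15 berr-reporting on fd on
```

To apply it, pass `["sudo", *args]` to `subprocess.run(..., check=True)` after taking the interface down.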