We have a 1 Mbps CAN bus that has been working fine until now. Several nodes on this bus send messages without problems. However, after connecting the Nvidia system to it, we see a continuous CAN error whenever one of the nodes sends a specific CAN frame.
Here are some images showing the problem:
The images above show the frame that the Nvidia system rejects:
• ID: 0x200.
• DLC: 8 bytes
• Data: 05 00 00 FD 7F FF 7F 01
If we change the ID to 0x199 there are no errors:
In addition, if we keep the ID but change the byte just before the error bit position, there are no errors either:
Notice that the data bytes sent are 05 00 00 FD 7F 55 7F 01 instead of 05 00 00 FD 7F FF 7F 01.
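To compare the failing and passing frames at the bit level, the following is a minimal sketch that builds the classical CAN bit sequence (SOF through CRC) for the three frames mentioned above and applies bit stuffing. It assumes a standard 11-bit data frame and the CAN 2.0 CRC-15 polynomial 0x4599; the function names are illustrative only. It shows where stuff bits land on the wire, not why the Nvidia controller flags an error.

```python
# Sketch: stuffed bit sequence of a classical CAN data frame (11-bit ID).
# Assumptions: standard frame format, CRC-15 polynomial 0x4599, stuffing
# applied from SOF through the CRC sequence. Helper names are hypothetical.

def crc15(bits):
    """CAN CRC-15 over a list of 0/1 bits (initial value 0)."""
    crc = 0
    for b in bits:
        nxt = b ^ ((crc >> 14) & 1)
        crc = (crc << 1) & 0x7FFF
        if nxt:
            crc ^= 0x4599
    return crc

def frame_bits(can_id, data):
    """Unstuffed bits from SOF up to and including the CRC sequence."""
    bits = [0]                                                 # SOF (dominant)
    bits += [(can_id >> i) & 1 for i in range(10, -1, -1)]     # 11-bit ID
    bits += [0, 0, 0]                                          # RTR, IDE, r0
    bits += [(len(data) >> i) & 1 for i in range(3, -1, -1)]   # DLC
    for byte in data:
        bits += [(byte >> i) & 1 for i in range(7, -1, -1)]    # data, MSB first
    crc = crc15(bits)
    bits += [(crc >> i) & 1 for i in range(14, -1, -1)]        # CRC sequence
    return bits

def stuff(bits):
    """Insert a complementary bit after every run of five equal bits."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuffed = b ^ 1
            out.append(stuffed)
            run_bit, run_len = stuffed, 1   # the stuff bit starts a new run
    return out

frames = [
    ("0x200, FF (errors)", 0x200, "05 00 00 FD 7F FF 7F 01"),
    ("0x199, FF (ok)",     0x199, "05 00 00 FD 7F FF 7F 01"),
    ("0x200, 55 (ok)",     0x200, "05 00 00 FD 7F 55 7F 01"),
]
for label, can_id, payload in frames:
    raw = frame_bits(can_id, bytes.fromhex(payload))
    wire = stuff(raw)
    print(label, "stuff bits:", len(wire) - len(raw))
    print("".join(map(str, wire)))
```

Lining up the three printed sequences against the scope or analyzer capture may help pin down which bit, relative to the preceding stuff bits and edges, is being flagged.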
Here are the details about our Nvidia AGX setup:
• The DB9 CAN connector 1&2 is used through pins 1 and 8.
• We are using the SocketCAN interface configured for a 1 Mbps bit rate.
• The Nvidia system does not send any CAN data messages.
Here are our observations after some tests:
• It does not matter which node sends the problematic CAN message; the result is the same.
• The error is still present if only the problematic CAN message is sent on the bus.
• If we disconnect the Nvidia from the CAN bus, the errors disappear.
• It does not matter which bus port the Nvidia is connected to.
• We have tried different CAN bit timing configurations on both the Nvidia and the node that sends the message, with no improvement.
Do you know why this is happening and how it can be solved?
Have you seen any suspicious messages on Xavier for the specific CAN frame?
Could you help simplify the reproduction steps? Is it possible to reproduce the issue by sending the specific CAN frame from the host system to the target system?
After noticing that the CAN error was caused by a specific CAN frame, we disconnected every node from the bus except the NVIDIA system and then connected a PC with a Vector CAN tool to send the problematic CAN message. So the only devices on the bus were the Vector CAN tool and the NVIDIA.
Sending the CAN message from the Vector CANalyzer software gave the same result. In fact, the images in the original post were taken during this test.
Regarding the last question, about sending the specific CAN frame from the host system to the target system: do you mean sending it from one NVIDIA CAN connector to another NVIDIA CAN connector? If so, we have not tried it with this specific message; we will do it tomorrow.
@VickNV, so far we have not used any command to look for suspicious messages on Xavier. Tomorrow we will check the kernel log using dmesg.
@rborad, we used the following command to change the CAN bit timing on Xavier:
ip link set canX type can tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
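As a cross-check of these parameters, here is a small sketch of the usual SocketCAN bit-timing arithmetic: one bit consists of a fixed 1-tq sync segment plus prop-seg, phase-seg1 and phase-seg2, and the sample point sits at the end of phase-seg1. The function name is hypothetical. Taken at face value, the values in the command above give 16 quanta of 125 ns each, which is a 2000 ns bit time (500 kbit/s) with an 87.5 % sample point, so it may be worth confirming what was actually applied with `ip -details link show canX`.

```python
# Sketch: derive bitrate and sample point from SocketCAN-style timing values.
# Assumption: bit time = (1 sync quantum + prop_seg + phase_seg1 + phase_seg2) * tq.

def bit_timing(tq_ns, prop_seg, phase_seg1, phase_seg2):
    quanta = 1 + prop_seg + phase_seg1 + phase_seg2       # 1 = sync segment
    bit_time_ns = quanta * tq_ns
    bitrate = 1e9 / bit_time_ns                            # bits per second
    sample_point = (1 + prop_seg + phase_seg1) / quanta    # fraction of bit time
    return quanta, bitrate, sample_point

quanta, bitrate, sample_point = bit_timing(125, 6, 7, 2)
print(f"{quanta} tq per bit, {bitrate:.0f} bit/s, sample point {sample_point:.1%}")
```

With the same segment counts, 1 Mbit/s would need a time quantum of 62.5 ns, so a 1 Mbit/s configuration would normally use different segment values or a different tq.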
The only caveat is that the bit timing parameters revert to their default values when the Nvidia is rebooted.
Here is what we did yesterday:
We tried the following setup to simplify reproduction of the error:
-Loopback connection between the CAN2 and CAN6 ports (can0 and can1 in Linux), with a short cable and termination resistors (we tried both one 120 ohm resistor and two 120 ohm resistors in parallel, as in the picture).
-We set both CAN buses to 1 Mbps: (sudo ip link set canX type can bitrate 1000000)
-candump can1 (display the messages received on can1)
-cangen can0 (send random messages on can0)
-After a short time sending random messages, can0 goes to BUS-OFF. The errors are not entirely random: once you find a specific frame that causes a failure (like the one in the first post), sending it fails almost every time, whereas continuously sending a frame that is OK “never” breaks.
-The same tests at a bus bitrate of 500000 work perfectly well.
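Instead of waiting for cangen to hit a bad frame, the failing frame from the first post could be replayed in a loop over the same loopback setup. The sketch below does this with Python's standard SocketCAN support (Linux only); it assumes can0 is already configured and up, and the helper names, frame count and pacing interval are illustrative choices, not part of the original test.

```python
# Sketch: replay one fixed CAN frame on a SocketCAN interface (Linux only).
# Assumptions: the interface (e.g. can0) is already up at the bitrate under
# test; helper names, count and interval are hypothetical.
import socket
import struct
import time

CAN_FRAME_FMT = "=IB3x8s"   # struct can_frame: can_id, can_dlc, padding, data[8]

def pack_frame(can_id, data):
    """Pack an 11-bit-ID data frame into the 16-byte struct can_frame layout."""
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))

def send_repeatedly(ifname, can_id, data, count, interval_s=0.001):
    """Send the same frame `count` times on `ifname`, pacing with a short sleep."""
    frame = pack_frame(can_id, data)
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((ifname,))
        for _ in range(count):
            sock.send(frame)
            time.sleep(interval_s)   # avoid overrunning the TX queue

# Example call (not executed here, since it needs a live CAN interface):
# send_repeatedly("can0", 0x200, bytes.fromhex("050000FD7FFF7F01"), 1000)
```

Running candump on can1 alongside this, and then repeating with the 0x199 ID or the 55 data byte, would give a side-by-side comparison of the failing and passing frames.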
Any idea what the problem could be and how to solve it?
Please help to simplify the reproduction steps by modifying How to Test CAN.
According to your last post, it looks like you only changed “bitrate” from 500000 to 1000000 and “cansend can0 220##150” to send a specific frame (as you mentioned in your first post), right? If you can list your steps in the same way as the documentation page, we can discuss them with the team more easily. Thanks.
So far we have the CAN bus working at 500 kbps without problems. To get there, we replaced the CAN transceiver of the node that was sending the initial problematic CAN frame (since it was causing the Nvidia system to send CAN errors). It seems the Nvidia is somehow not compatible with some older CAN transceivers, because all the nodes were working correctly before the Nvidia was added to the bus.
Maybe we will try to reproduce the issue using DRIVE OS 5.2.6 in the next few weeks. If so, we will let you know.