CAN Interfacing Timing or Signal Issue

Hardware Platform: DriveWorks AGX Xavier
Software Version: Drive Software 2.2
Host Machine Version: Ubuntu 18.04.4 LTS (Bionic Beaver)
SDK Manager Version: 1.1.0.6343

Hello,

My IMUs do not work with XavierA CAN0/1. In an earlier post, the problem initially looked like a termination or filtering issue, but addressing those did not solve it. Thank you for your patience and help so far. That said, I still need help getting the CAN interface to work on XavierA. I have some new information:

  • It looks like either a physical-layer issue or a timing issue.
  • The IMU using extended IDs almost always gets a frame error in the later part of the frame and retries the previous message, apparently because it did not receive an ACK.
  • The same IMU works with an 11-bit ID but not with a 29-bit ID. Filter settings are at their defaults, and according to Figure 7 of the Bosch MTTCAN User Manual, extended-ID messages are by default stored in the Rx buffer along with standard-ID messages.
  • I checked termination configurations ranging from no termination on the bus to termination at every node; the issue persisted.
  1. Based on what I've explained, do you also think it might be a timing issue?
  2. What is the CAN clock prescaler value, and what are the system clock (120 MHz?) and CAN clock (50 MHz?) frequencies?
  3. Would it help to configure the prop_seg, phase_seg1, and phase_seg2 parameters in SocketCAN?
  4. Section 10.1.2.1 of Xavier_TRM_DP09253002_v1.4p.pdf discusses setting BTP and nTQ. Is this related? If so, how should they be configured, and with what values?
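For reference on questions 2 and 3: in classical CAN the nominal bit time is the sync segment (always 1 tq) plus prop-seg, phase-seg1 and phase-seg2, and one time quantum (tq) is the prescaler (brp) divided by the CAN clock. The minimal sketch below checks that arithmetic; it assumes the 50 MHz clock and the segment values that `ip -details` reports later in this thread, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BitTiming:
    clock_hz: int    # CAN controller clock (assumed 50 MHz, per "clock 50000000")
    brp: int         # bit-rate prescaler
    prop_seg: int    # propagation segment, in tq
    phase_seg1: int  # phase buffer segment 1, in tq
    phase_seg2: int  # phase buffer segment 2, in tq

    @property
    def tq_ns(self) -> float:
        # One time quantum lasts brp clock periods.
        return self.brp * 1e9 / self.clock_hz

    @property
    def tq_per_bit(self) -> int:
        # The sync segment is always exactly 1 tq.
        return 1 + self.prop_seg + self.phase_seg1 + self.phase_seg2

    @property
    def bitrate(self) -> float:
        return self.clock_hz / (self.brp * self.tq_per_bit)

    @property
    def sample_point(self) -> float:
        # The bit is sampled at the end of phase_seg1.
        return (1 + self.prop_seg + self.phase_seg1) / self.tq_per_bit


t = BitTiming(clock_hz=50_000_000, brp=1, prop_seg=87, phase_seg1=87, phase_seg2=25)
print(f"tq={t.tq_ns:.0f} ns, {t.tq_per_bit} tq/bit, "
      f"bitrate={t.bitrate:.0f} bit/s, sample point={t.sample_point:.3f}")
```

With brp 1 the 200 tq of 20 ns each give 4 µs per bit, i.e. 250 kbit/s with the sample point at 87.5 %. The same bitrate could also be reached with a larger prescaler and fewer quanta per bit (for example brp 10 and 20 tq), which is one of the choices the segment parameters control.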

Please advise what else I can try, or how to narrow down the problem. Let me know if you need more information.

Thanks in advance. Looking forward to your response.

Regards,
Rishit

How do I read and update the DBTP register in the Bosch MTTCAN? Again, my problem most likely has to do with timing. I found several bit-timing configurations suggested here, but I am not sure what information I should use to pick one of them. Also, I am not able to change the prop-seg, phase-seg1, and phase-seg2 parameters through the SocketCAN parameters; they do not change.
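One note on the register: in the Bosch M_CAN register map, classical (non-FD) frames use the nominal bit timing register NBTP, while DBTP only configures the CAN FD data phase. The sketch below shows how the SocketCAN values would pack into NBTP, assuming MTTCAN uses the same field layout as the M_CAN user manual (NSJW at bits 31:25, NBRP at 24:16, NTSEG1 at 15:8, NTSEG2 at 6:0, each programmed as value minus one) and that tseg1 = prop-seg + phase-seg1, as in the mainline Linux m_can driver. The layout may differ between IP revisions, so verify it against the TRM section mentioned above; the function names are hypothetical and nothing here touches hardware.

```python
# Assumed NBTP field layout (Bosch M_CAN); verify against the Xavier TRM.
NSJW_SHIFT, NSJW_MASK = 25, 0x7F
NBRP_SHIFT, NBRP_MASK = 16, 0x1FF
NTSEG1_SHIFT, NTSEG1_MASK = 8, 0xFF
NTSEG2_SHIFT, NTSEG2_MASK = 0, 0x7F


def encode_nbtp(brp: int, sjw: int, prop_seg: int, phase_seg1: int, phase_seg2: int) -> int:
    """Pack SocketCAN-style timing into an NBTP value (fields hold value - 1)."""
    tseg1 = prop_seg + phase_seg1  # the controller has no separate prop-seg field
    fields = (
        (sjw - 1, NSJW_MASK, NSJW_SHIFT),
        (brp - 1, NBRP_MASK, NBRP_SHIFT),
        (tseg1 - 1, NTSEG1_MASK, NTSEG1_SHIFT),
        (phase_seg2 - 1, NTSEG2_MASK, NTSEG2_SHIFT),
    )
    reg = 0
    for value, mask, shift in fields:
        if not 0 <= value <= mask:
            raise ValueError(f"field value {value + 1} does not fit in mask {mask:#x}")
        reg |= value << shift
    return reg


def decode_nbtp(reg: int) -> dict:
    """Unpack an NBTP value back into actual (not minus-one) quantities."""
    return {
        "sjw": ((reg >> NSJW_SHIFT) & NSJW_MASK) + 1,
        "brp": ((reg >> NBRP_SHIFT) & NBRP_MASK) + 1,
        "tseg1": ((reg >> NTSEG1_SHIFT) & NTSEG1_MASK) + 1,
        "tseg2": ((reg >> NTSEG2_SHIFT) & NTSEG2_MASK) + 1,
    }


reg = encode_nbtp(brp=1, sjw=1, prop_seg=87, phase_seg1=87, phase_seg2=25)
print(hex(reg), decode_nbtp(reg))
```

Because prop-seg and phase-seg1 collapse into a single tseg1 field, a driver can report them back split differently from how they were entered, so comparing tseg1 and tseg2 is more meaningful than comparing the individual segments.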

Thanks in advance.

Hi @rborad,

Have you seen any error messages from it? Does “Test FD CAN” in CAN Driver documentation page work well on your side?

The non-FD test works well if my IMU is not connected: I can communicate with XavierA over the CAN1 socket when the CAN4&6 connector is connected to my CANUSB adapter running on the Linux host PC.

Also, as I learned more, I came to think that the IMU is resending packets not because it is missing an ACK but because it detects a stuff error on the bus and retries the packet. I think this because VectorCAN detects a stuff error around bit 80 to bit 90. On the oscilloscope, the IMU sends a packet about 330 µs long and then retries, whereas in the ideal case it sends a 500 µs packet. The 330 µs duration corresponds to the position of the bit error within a packet of approximately 128 bits.
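At 250 kbit/s (the bitrate reported for can1 later in this thread), one bit lasts 4 µs, so the scope durations convert directly into bit counts. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def bits_in(duration_us: float, bitrate: int = 250_000) -> float:
    """Number of bit times that fit into a duration measured on the scope."""
    bit_time_us = 1e6 / bitrate  # 4 us per bit at 250 kbit/s
    return duration_us / bit_time_us


for label, us in [("aborted frame", 330), ("aborted frame", 315), ("complete frame", 503)]:
    print(f"{label}: {us} us is about {bits_in(us):.1f} bit times")
```

330 µs is 82.5 bit times, which is in line with a stuff error reported somewhere around bit 80 to 90, while 503 µs is about 126 bit times. The measured width also includes the error flag sent after the error is detected, so the bit index derived this way is only approximate.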

Thanks for your reply.

Regards,
Rishit

Is the issue only in CAN FD? Are there any error messages on Xavier A side? How about “Test FD CAN” without the IMU?

@VickNV,

I don't know whether the IMUs support CAN FD; if I understand correctly, they most likely don't. I think I tried that test sequence once before but got an error. I will try it again today and keep you posted. My problem is on the non-FD side, though.

Thanks
Rishit

I asked about “Test FD CAN” only because you said your issue is with 29-bit IDs.
You may not really need to test it if your issue isn’t with CAN FD.

Could you elaborate your steps? Have you candump’ed and received the packets?
If you can share any error messages from the Xavier A side, it will be easier for us to check whether we have seen a similar issue.

Hello @VickNV,

Happy New year.

Thanks for replying. I previously shared the error message log here; the messages are no different in this case.

I have a hypothesis about what is happening, as follows:

  1. The IMU sends an extended-ID message on the CAN bus.
  2. XavierA detects a stuff error, so it sends an error frame on the bus.
  3. All devices on the bus see the error frame (from XavierA) and discard the message received from the IMU.
  4. The IMU receives this error frame and tries to resend the message.

From step 4, it goes back to step 1.
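If this loop repeats without any successful frame, the CAN fault-confinement rules predict how the IMU's error state escalates. In ISO 11898-1, a transmitter's transmit error counter (TEC) increases by 8 when it sends an error flag (with a few exceptions) and decreases by 1 after each successful transmission; the node becomes error-passive once TEC reaches 128 and bus-off when TEC exceeds 255. The sketch below models only the +8 rule for back-to-back failed retries; it is a simplified illustration, not a model of this particular IMU.

```python
def tec_after_failed_retries(max_attempts: int):
    """Yield (attempt, tec, state) for consecutive failed transmissions.

    Simplified: every attempt fails and adds 8; the TEC exceptions and the
    -1 decrement for successful frames are ignored.
    """
    tec = 0
    for attempt in range(1, max_attempts + 1):
        tec += 8  # transmitter signals an error flag for this attempt
        if tec > 255:
            state = "bus-off"
        elif tec >= 128:
            state = "error-passive"
        else:
            state = "error-active"
        yield attempt, tec, state
        if state == "bus-off":
            return  # a bus-off node stops transmitting until it recovers


history = list(tec_after_failed_retries(40))
first_passive = next(a for a, _, s in history if s == "error-passive")
print("first error-passive attempt:", first_passive)
print("last entry:", history[-1])
```

Under this model a transmitter that never succeeds turns error-passive after 16 consecutive failures and bus-off after 32. The error-warn, error-pass and bus-off counters in `ip -details -statistics` track the same states for the Xavier's own controller, which can help show which side of the link is accumulating errors.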

Evidence:
I attached an oscilloscope to the bus and found the following.

  • A successful transmission takes approximately 503 µs for each message from the IMU (Pictures 1 and 2).
  • The length of a message varies depending on when the DRIVE AGX finds the stuff error. When the stuff error is found at bit 85, the message is ~330 µs wide; when it is found at bit 80, the message is ~315 µs wide, and so on.
    CAN messages are around 128 bits long (varying with bit stuffing). In the case below, the stuff error is at bit 120, so the message length is very close to that of a successful transmission (Pictures 3 and 4).
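The ~128-bit figure can be checked field by field. The sketch below counts the bits of a classical CAN data frame from SOF through EOF (excluding the 3-bit intermission), plus the worst-case number of stuff bits, which can only be inserted between SOF and the end of the CRC sequence. The field sizes come from the classical CAN frame format; the function name is hypothetical.

```python
def can_frame_bits(data_bytes: int, extended: bool) -> tuple:
    """Return (unstuffed_bits, max_stuff_bits) for a classical CAN data frame."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("classical CAN carries 0-8 data bytes")
    if extended:
        # SOF + 11-bit base ID + SRR + IDE + 18-bit ID extension + RTR + r1 + r0 + DLC
        header = 1 + 11 + 1 + 1 + 18 + 1 + 1 + 1 + 4
    else:
        # SOF + 11-bit ID + RTR + IDE + r0 + DLC
        header = 1 + 11 + 1 + 1 + 1 + 4
    stuffed_region = header + 8 * data_bytes + 15  # ... through the 15-bit CRC
    trailer = 1 + 2 + 7  # CRC delimiter + ACK slot/delimiter + EOF (never stuffed)
    # Worst case: first stuff bit after 5 bits, then one every 4 bits.
    max_stuff = (stuffed_region - 1) // 4
    return stuffed_region + trailer, max_stuff


for ext in (False, True):
    bits, stuff = can_frame_bits(8, ext)
    print(f"{'extended' if ext else 'standard'} 8-byte frame: {bits} bits + up to {stuff} stuff bits")
```

An 8-byte extended frame is 128 bits before stuffing (512 µs at 250 kbit/s), with up to 29 stuff bits on top. On a scope the recessive ACK delimiter and EOF bits at the end look the same as an idle bus, so the visible waveform is somewhat shorter than the full frame.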

The general mechanism is explained better here; check out the first answer in this post.

Probable Cause:

The IMU works fine when XavierA is not on the bus.
We do not have any complaints about CAN communication from our customers who use these IMUs.
That said, I think XavierA is introducing stuff errors because its configured bit timing is not right.
I think fixing the bit timing on XavierA would solve this issue, and I need your help to find out how to set the bit timing on XavierA.

Let me know if you have questions. As I said, this is a hypothesis, so let me know if you think otherwise. Also, let me know if you are still interested in the steps I am following; I will write them up as soon as I find some time.

Regards,
Rishit

Happy New Year! Have you seen the error message below on the Xavier side?

~/nvidia/nvidia_sdk/DRIVE_Software_10.0_Linux_OS_DDPX/DRIVEOS/drive-oss-src/nvidia/drivers/net/can/mttcan/native/m_ttcan_linux.c:547:           netdev_err(dev, "Stuff Error Detected\n");

@VickNV,

I think so. Until you get back to me with more information, I will take a look at this file (m_ttcan_linux.c) and see if I find something.

Thanks
Rishit

Please provide any suspicious messages on Xavier side. It will be helpful to check if any similar issues have been reported. Thanks!

I will not have access to XavierA until next Monday, but the log looks like this:

[1718433.055336] mttcan c320000.mttcan can1: Format Error Detected
[1718433.057517] mttcan c320000.mttcan can1: Format Error Detected
[1718433.059065] mttcan c320000.mttcan can1: Format Error Detected
[1718433.060638] mttcan c320000.mttcan can1: Format Error Detected
[1718433.062203] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.063772] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.065318] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.066872] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.068426] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.074106] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.075687] mttcan c320000.mttcan can1: Format Error Detected
[1718433.077274] mttcan c320000.mttcan can1: Format Error Detected
[1718433.078838] mttcan c320000.mttcan can1: Format Error Detected
[1718433.080407] mttcan c320000.mttcan can1: Format Error Detected
[1718433.081967] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.083524] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.085082] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.086674] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.088218] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.214826] mttcan c320000.mttcan can1: Stuff Error Detected
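With longer logs, a quick tally of the mttcan error types can make the pattern easier to see. A minimal sketch that counts "... Error Detected" lines per interface in dmesg-style text; the regular expression assumes exactly the message format shown above, and the helper name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "mttcan c320000.mttcan can1: Stuff Error Detected"
LINE_RE = re.compile(r"mttcan \S+ (?P<iface>can\d+): (?P<kind>\w+) Error Detected")


def count_errors(lines) -> Counter:
    """Count (interface, error kind) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("iface"), m.group("kind"))] += 1
    return counts


sample = """[1718433.055336] mttcan c320000.mttcan can1: Format Error Detected
[1718433.062203] mttcan c320000.mttcan can1: Stuff Error Detected
[1718433.063772] mttcan c320000.mttcan can1: Stuff Error Detected""".splitlines()
print(count_errors(sample))
```

For a saved log the same function can be called on an open file object, since iterating a file yields its lines.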

Not sure if this is what you asked for; this is all I get on XavierA. In case it helps, I am also sharing the output of ip -details -statistics below:

8: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
      bitrate 250000 sample-point 0.875
      tq 20 prop-seg 87 phase-seg1 87 phase-seg2 25 sjw 1
      mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
      mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
      clock 50000000
      re-started bus-errors arbit-lost error-warn error-pass bus-off
      0          10         0          1          1          1         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    10588      1325     10      0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    4          2        0       1       0       0
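One detail in this output that may be worth checking is sjw 1: with 200 tq per bit, the controller can shift its sampling point by only one 20 ns quantum per resynchronization edge. Bosch's application note on configuring CAN bit timing gives two conditions for the oscillator tolerance df a bit timing can absorb, with NBT the number of tq per bit: df ≤ min(phase-seg1, phase-seg2) / (2 × (13 × NBT − phase-seg2)) and df ≤ SJW / (20 × NBT). The sketch below evaluates them for the reported settings, assuming those classical-CAN conditions apply here; it is a back-of-the-envelope check, not evidence that SJW caused the errors.

```python
def oscillator_tolerance(prop_seg: int, phase_seg1: int, phase_seg2: int, sjw: int):
    """Return (df_max, cond1, cond2) as fractions, per the two classical CAN conditions."""
    nbt = 1 + prop_seg + phase_seg1 + phase_seg2  # tq per bit, incl. sync segment
    # Condition 1: worst-case 13 bits between resynchronizations (error case).
    cond1 = min(phase_seg1, phase_seg2) / (2 * (13 * nbt - phase_seg2))
    # Condition 2: SJW must absorb the drift accumulated over 10 bits, on both nodes.
    cond2 = sjw / (20 * nbt)
    return min(cond1, cond2), cond1, cond2


for sjw in (1, 25):  # 1 as reported; 25 is a hypothetical comparison value
    df, c1, c2 = oscillator_tolerance(87, 87, 25, sjw)
    print(f"sjw={sjw}: df <= {df:.4%} (cond1 {c1:.4%}, cond2 {c2:.4%})")
```

With sjw 1, the second condition limits the tolerance to about 0.025 % (250 ppm). Raising SJW (normally kept no larger than phase-seg2) relaxes that condition so that the first one, about 0.49 %, becomes the limit. Whether this matters depends on the IMU's clock accuracy, which is not known from this thread.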

These are old snapshots; I will update them with fresh ones on Monday.

Thanks
Rishit

Wasn’t this addressed in CAN Interface not behaving right on AGX - #22 by rborad with correct CAN-High (CAN+)/CAN-Low (CAN-) pins connected?

Is there any way to reproduce the issue with Extended Message ID without the specific IMU?

@VickNV,
Thanks for quick responses.

I copied this log from there. I can reproduce this with all the IMUs I am using, whether they use extended IDs or not.

FYI, I didn't have a good understanding of the problem back when the thread you are pointing to was active.

In all the cases where I said the unit was working, whether with a standard or an extended ID, the unit was sending unchanging data packets. In other words, if you take any unit and program it to send the same data packet over and over, it works, which is really strange.

However, IMU data is always changing; no two consecutive packets are the same. In that case, the behavior described in my hypothesis above is observed.

Hope this makes sense.

Regards,
Rishit

Didn’t you say standard 11 bit message ID works in the first post of this topic?
Please try to reproduce the issue by using cansend to send different data packets.
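If cansend is awkward for streaming, the same kind of test (extended-ID frames at about 100 Hz with a payload that changes every frame) can be driven from Python's standard socket module on any Linux SocketCAN node other than the Xavier, so that XavierA only receives. This is a sketch: the struct layout follows Linux's struct can_frame (16 bytes, matching the mtu 16 that ip reports for can1), and the CAN ID, interface name and helper names are hypothetical placeholders.

```python
import random
import socket
import struct
import time

CAN_EFF_FLAG = 0x80000000  # marks a 29-bit extended ID (linux/can.h)
CAN_EFF_MASK = 0x1FFFFFFF
CAN_SFF_MASK = 0x000007FF
CAN_FRAME_FMT = "=IB3x8s"  # can_id, can_dlc, 3 pad/reserved bytes, data[8]


def build_frame(can_id: int, data: bytes, extended: bool = True) -> bytes:
    """Pack a classical CAN frame the way the CAN_RAW socket expects it."""
    if len(data) > 8:
        raise ValueError("classical CAN frames carry at most 8 bytes")
    limit = CAN_EFF_MASK if extended else CAN_SFF_MASK
    if not 0 <= can_id <= limit:
        raise ValueError(f"CAN ID {can_id:#x} out of range")
    if extended:
        can_id |= CAN_EFF_FLAG
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data)


def changing_payloads(seed: int = 0):
    """Endless stream of pseudo-random 8-byte payloads, reproducible via seed."""
    rng = random.Random(seed)
    while True:
        yield bytes(rng.getrandbits(8) for _ in range(8))


def send_changing_frames(ifname: str, can_id: int, count: int, period_s: float = 0.01) -> None:
    """Send `count` extended frames with changing payloads at about 1/period_s Hz."""
    payloads = changing_payloads()
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((ifname,))
        for _ in range(count):
            sock.send(build_frame(can_id, next(payloads)))
            time.sleep(period_s)
```

For example, `send_changing_frames("can0", 0x18FF0010, count=1000)` on the sending node while running `candump can1` on XavierA; switching to a fixed payload (or `changing_payloads` with a constant generator) gives the constant-data comparison case.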

Sorry for the delayed response; I didn't have access to the DRIVE AGX until now, and I wanted to give you the latest log.

I did, but in my previous response I explained why it was working and how I can reproduce the issue. Let me quote it again here.

cansend works fine. I connected the IMU, a sniffer, and XavierA (CAN1) on the bus, and sent a CAN command from XavierA to the IMU using cansend. The command asks the IMU to send out data packets at 100 Hz. The IMU recognized the command and started sending data packets on the bus, but XavierA found the error (terminal log attached) after receiving the first data packet: log_010221.txt (76.5 KB). Right after this, I sent another command from XavierA asking the IMU to stop sending data packets; the IMU received it and stopped.

XavierA can send out as many cansend packets as I like and the sniffers receive all of them, and vice versa. Trouble begins when the IMU sends packets and XavierA is receiving. As mentioned before, I think this happens because the XavierA CAN bit timing is off, so it sees a stuff error when it shouldn't.

Conclusion,

  • IMU receives messages sent from XavierA (cansend)
  • XavierA receives only limited(1) messages from the IMU and then sends an error frame after detecting a stuff error. On receiving that error frame, the IMU retries sending the same message over and over.
  • XavierA receives all the messages sent from the third-party sniffer.

(1) Limited meaning that XavierA only receives messages with constant data.

Let me know if this helps or you need more information.

I’m trying to clarify further. If unchanging/constant data works, why did XavierA find “Stuff Error Detected” after receiving the “first” data packet? What’s the difference between the first data packet and the previous unchanging/constant data (which tested okay)? Could you double-check that the CAN bus pin connections between Xavier and the IMU are correct? Thanks!

Thanks for the reply @VickNV .

In the log I shared, the IMU is sending changing (non-constant) data; that's why there is a stuff error. More information below.

Just to recap: when the data packets are non-static, XavierA receives a few packets before it finds a stuff error. In this case, XavierA received one packet and found a stuff error in the second. In another case, XavierA received 7 non-static packets and found the stuff error in the 8th. The number of packets XavierA receives before a stuff error is found varies (from 0 to 12). Once a stuff error is detected, I have to reset the SocketCAN interface.
To answer your question: there is no difference except that the message payload is not the same.

I did. The bus and all the nodes on it work fine when XavierA is not connected. I've tried as many as 2 IMUs and 2 different sniffers on the bus at the same time (4 nodes in total), and everything works great. I've also ruled out termination issues by adding termination at the nodes incrementally (normally I have two 120 Ohm resistors on the bus), but with no luck.
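A quick cross-check when adding terminators incrementally: with the bus unpowered, the resistance measured between CAN_H and CAN_L is the parallel combination of all terminators, so it should read about 60 Ω with the usual two 120 Ω resistors and drop as more are added. A minimal sketch of that arithmetic; it ignores the transceivers' differential input resistance, which is typically much larger than these values.

```python
def parallel(*resistances_ohm: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances_ohm)


for n in range(1, 5):
    print(f"{n} x 120 ohm -> {parallel(*[120.0] * n):.1f} ohm between CAN_H and CAN_L")
```

A reading near 60 Ω therefore confirms the normal two-terminator setup, while 40 Ω or 30 Ω indicates that three or four terminators are present.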

Let me know if you need more information. Would it be possible to set up a call to lay out the basics and the behavior, so that you or someone else can help solve this issue? I have been trying to get an answer here for a while now.

Regards,
Rishit

Usually “Stuff Error Detected” happens as a result of poor or improper CAN termination.

What’s the simplest connection (only the IMU and the DRIVE system) that reproduces the issue with non-static frames (while static frames work)? Could you share its diagram so I can check with the internal team? Thanks.

Hello @VickNV

Please find the diagram attached. Also, my question is: is poor CAN termination the only cause of a “stuff error”?

Also attached is a picture of the back of the connector where the termination is connected.

Improper CAN termination is the bigger culprit here, but there is another one: “bit timing”. What if the CAN controller is not finding the stuffed bit because its sampling time is off and it samples before the stuffed bit appears on the bus? Is that a possibility? You haven't answered anything about bit timing yet.
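The mechanism in this question can be illustrated with a bit-level model of CAN bit stuffing: after five consecutive bits of the same level the transmitter inserts one complementary bit, and a receiver that sees a sixth equal bit in a row flags a stuff error. The sketch below is a simplified model (plain bit lists, no frame structure, CRC or timing); it shows that if a receiver reads the wrong level for a single stuff bit, for example because its sample point has drifted, it reports a stuff error even though the transmitter was correct. Because stuffing depends on the data, frames with changing payloads also place stuff bits in different positions, whereas a constant payload repeats the same pattern every time.

```python
class StuffError(Exception):
    """Raised when a sixth consecutive equal bit is seen in the stuffed region."""


def stuff(bits):
    """Insert a complementary bit after every run of five equal bits."""
    out, run_bit, run_len = [], None, 0
    for b in bits:
        out.append(b)
        run_len = run_len + 1 if b == run_bit else 1
        run_bit = b
        if run_len == 5:
            stuff_bit = 1 - b
            out.append(stuff_bit)
            run_bit, run_len = stuff_bit, 1  # the stuff bit starts a new run
    return out


def destuff(bits):
    """Remove stuff bits; raise StuffError if a stuff bit has the wrong level."""
    out, run_bit, run_len = [], None, 0
    expect_stuff = False
    for i, b in enumerate(bits):
        if expect_stuff:
            if b == run_bit:
                raise StuffError(f"stuff error at bit {i}")
            run_bit, run_len, expect_stuff = b, 1, False  # drop the stuff bit
            continue
        out.append(b)
        run_len = run_len + 1 if b == run_bit else 1
        run_bit = b
        if run_len == 5:
            expect_stuff = True
    return out


data = [0, 0, 0, 0, 0, 0, 1, 1]
wire = stuff(data)  # transmitter inserts a 1 after the first five 0s
print(wire, destuff(wire) == data)

missampled = list(wire)
missampled[5] = 0  # receiver samples the stuff bit at the wrong level
try:
    destuff(missampled)
except StuffError as err:
    print(err)
```

In this model the correctly received stream decodes back to the original data, while the stream with one mis-sampled stuff bit raises a stuff error at that bit.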

Based on the above information, could you answer the following two questions? How do I change the bit timing on Xavier A? Does it need changing at all?

I know of one way to change the bit timing: when configuring SocketCAN. But that is not working with XavierA. You can find details here in section 6.5.2.
Another way is the kernel route, but I couldn't find where the bit timing is set in the CAN-related .c files.
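For the SocketCAN route, one possible reason explicit segment values appear not to take effect is that the Linux CAN netlink code refuses bit-timing changes while the interface is up, so the link has to be brought down first. The sketch below only builds the iproute2 command lines; `tq`, `prop-seg`, `phase-seg1`, `phase-seg2`, `sjw` and `restart-ms` are standard `ip link set ... type can` options, and the range checks mirror the mttcan limits printed by `ip -details` in this thread, assuming the NVIDIA mttcan driver goes through the standard CAN netlink interface as that output suggests. The ip output above also shows restart-ms 0, which disables automatic bus-off recovery, so after a bus-off the interface stays down until it is restarted manually (for example with `ip link set can1 type can restart`). The class, function names and chosen values are hypothetical.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Ranges printed for mttcan by `ip -details` earlier in this thread.
TSEG1_MIN, TSEG1_MAX = 2, 255
TSEG2_MIN, TSEG2_MAX = 0, 127
SJW_MAX = 127
BRP_MIN, BRP_MAX = 1, 511


@dataclass
class CanTiming:
    clock_hz: int  # CAN controller clock ("clock" in ip -details)
    tq_ns: int     # time quantum in nanoseconds
    prop_seg: int
    phase_seg1: int
    phase_seg2: int
    sjw: int

    def brp(self) -> int:
        # Reject tq values the kernel would otherwise have to adjust.
        brp, rem = divmod(self.tq_ns * self.clock_hz, 1_000_000_000)
        if rem:
            raise ValueError(f"tq={self.tq_ns} ns is not a whole number of clock periods")
        return brp

    def validate(self) -> None:
        brp = self.brp()
        tseg1 = self.prop_seg + self.phase_seg1
        if not BRP_MIN <= brp <= BRP_MAX:
            raise ValueError(f"brp {brp} out of range")
        if not TSEG1_MIN <= tseg1 <= TSEG1_MAX:
            raise ValueError(f"prop-seg + phase-seg1 = {tseg1} out of range")
        if not TSEG2_MIN <= self.phase_seg2 <= TSEG2_MAX:
            raise ValueError(f"phase-seg2 {self.phase_seg2} out of range")
        if not 1 <= self.sjw <= min(SJW_MAX, self.phase_seg2):
            raise ValueError(f"sjw {self.sjw} must be 1..min({SJW_MAX}, phase-seg2)")


def ip_link_commands(ifname: str, t: CanTiming, restart_ms: int = 100) -> List[List[str]]:
    """Build (not run) the iproute2 commands that apply explicit bit timing."""
    t.validate()
    return [
        ["ip", "link", "set", ifname, "down"],  # bit timing cannot change while up
        ["ip", "link", "set", ifname, "type", "can",
         "tq", str(t.tq_ns), "prop-seg", str(t.prop_seg),
         "phase-seg1", str(t.phase_seg1), "phase-seg2", str(t.phase_seg2),
         "sjw", str(t.sjw), "restart-ms", str(restart_ms)],
        ["ip", "link", "set", ifname, "up"],
    ]


def apply(commands: List[List[str]]) -> None:
    """Run the commands on the target; requires root."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical example values, not a recommendation: same 250 kbit/s and
# 0.875 sample point as reported, but with sjw raised from 1 to 25.
timing = CanTiming(clock_hz=50_000_000, tq_ns=20, prop_seg=87,
                   phase_seg1=87, phase_seg2=25, sjw=25)
for cmd in ip_link_commands("can1", timing):
    print(" ".join(cmd))
```

After applying, `ip -details link show can1` should report the new tq, segment and sjw values; if they still revert, that points at the driver rather than the command sequence.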

Sorry, I really need to figure this out soon. I will try to reply as soon as I can.

Thanks for help so far.
Rishit