Random CAN bus initialization error on the Jetson TX2 4GB module

Hello,

I have been using the Jetson TX2 4GB for 2 years. So far I have used about 50 modules, and recently I have had problems with CAN communication on new modules.

My setup:
Jetson TX2 modules on a Connect Tech Quasar motherboard. The CAN transceiver is on the Quasar board, and we added a 120 ohm termination resistor.
L4T release : # R32 (release), REVISION: 2.1, GCID: 16294929, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug 13 04:45:36 UTC 2019
CAN ECU with another 120 ohm termination resistor.

CAN configuration:
The CAN interface is configured by an rc.local script when the JTX2 module boots. The script:

modprobe can
modprobe can_raw
modprobe mttcan
ip link set can0 type can bitrate 250000 sjw 4 loopback off restart-ms 1000
sleep 1
ifconfig can0 up

Until now, I had never had a problem with CAN communication; everything worked perfectly. In March, I received 26 new JTX2 modules, and on some of them the CAN bus configuration fails at startup. The module has to be rebooted, sometimes more than once, to get a working configuration. On the other hand, when the communication initializes correctly, it works perfectly and is very stable. The problem seems to be completely random.

Here is the kernel log (from dmesg) when the initialization fails:

[   10.744138] can: controller area network core (rev 20120528 abi 9)
[   10.744850] NET: Registered protocol family 29
[   10.761540] can: raw protocol (rev 20120528)
[   10.779520] CAN device driver interface
[   10.801803] 	 Message RAM Configuration
               	| base addr   |0x0c312000|
               	| sidfc_flssa |0x00000000|
               	| xidfc_flesa |0x00000040|
               	| rxf0c_f0sa  |0x000000c0|
               	| rxf1c_f1sa  |0x000009c0|
               	| rxbc_rbsa   |0x000009c0|
               	| txefc_efsa  |0x000009c0|
               	| txbc_tbsa   |0x00000a40|
               	| tmc_tmsa    |0x00000ec0|
[   10.801982] Release 3.2.0 from 19.12.2014
[   10.802341] net can0: mttcan device registered (regs=ffffff8008024000, irq=387)
[   10.806685] 	 Message RAM Configuration
               	| base addr   |0x0c322000|
               	| sidfc_flssa |0x00000000|
               	| xidfc_flesa |0x00000040|
               	| rxf0c_f0sa  |0x000000c0|
               	| rxf1c_f1sa  |0x000009c0|
               	| rxbc_rbsa   |0x000009c0|
               	| txefc_efsa  |0x000009c0|
               	| txbc_tbsa   |0x00000a40|
               	| tmc_tmsa    |0x00000ec0|
[   10.806864] Release 3.2.0 from 19.12.2014
[   10.807311] net can1: mttcan device registered (regs=ffffff800a4e1000, irq=388)
[   10.814313] mttcan c310000.mttcan can0: Bitrate set
[   11.832780] mttcan_controller_config: ctrlmode 0
[   11.832812] mttcan c310000.mttcan can0: Bitrate set
[   12.275320] mttcan c310000.mttcan can0: entered error warning state
[   12.281774] mttcan c310000.mttcan can0: entered error passive state

and the CAN link details and statistics:

$ ip -details -statistics link show can0
7: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 
    can state ERROR-PASSIVE (berr-counter tx 0 rx 127) restart-ms 1000 
	  bitrate 250000 sample-point 0.875 
	  tq 25 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 4
	  mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
	  mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          1          1          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped overrun mcast   
    16         2        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    0          0        0       0       0       0    

For the same module, when the initialization works :

[    6.729004] can: controller area network core (rev 20120528 abi 9)
[    6.730891] NET: Registered protocol family 29
[    6.737004] can: raw protocol (rev 20120528)
[    6.750243] CAN device driver interface
[    6.757261] Adding 327256k swap on /dev/zram0.  Priority:5 extents:1 across:327256k SS
[    6.759900] zram1: detected capacity change from 0 to 335114240
[    6.781547] 	 Message RAM Configuration
               	| base addr   |0x0c312000|
               	| sidfc_flssa |0x00000000|
               	| xidfc_flesa |0x00000040|
               	| rxf0c_f0sa  |0x000000c0|
               	| rxf1c_f1sa  |0x000009c0|
               	| rxbc_rbsa   |0x000009c0|
               	| txefc_efsa  |0x000009c0|
               	| txbc_tbsa   |0x00000a40|
               	| tmc_tmsa    |0x00000ec0|
[    6.781719] Release 3.2.0 from 19.12.2014
[    6.782088] net can0: mttcan device registered (regs=ffffff8008026000, irq=387)
[    6.790250] 	 Message RAM Configuration
               	| base addr   |0x0c322000|
               	| sidfc_flssa |0x00000000|
               	| xidfc_flesa |0x00000040|
               	| rxf0c_f0sa  |0x000000c0|
               	| rxf1c_f1sa  |0x000009c0|
               	| rxbc_rbsa   |0x000009c0|
               	| txefc_efsa  |0x000009c0|
               	| txbc_tbsa   |0x00000a40|
               	| tmc_tmsa    |0x00000ec0|
[    6.790480] Release 3.2.0 from 19.12.2014
[    6.790960] net can1: mttcan device registered (regs=ffffff800a1bd000, irq=388)
[    6.797075] mttcan c310000.mttcan can0: Bitrate set
[    6.816003] Adding 327256k swap on /dev/zram1.  Priority:5 extents:1 across:327256k SS
[    6.838691] zram2: detected capacity change from 0 to 335114240
[    6.862387] Adding 327256k swap on /dev/zram2.  Priority:5 extents:1 across:327256k SS
[    6.873756] zram3: detected capacity change from 0 to 335114240
[    6.908520] Adding 327256k swap on /dev/zram3.  Priority:5 extents:1 across:327256k SS
[    6.910977] zram4: detected capacity change from 0 to 335114240
[    6.923284] Adding 327256k swap on /dev/zram4.  Priority:5 extents:1 across:327256k SS
[    6.931756] zram5: detected capacity change from 0 to 335114240
[    6.960417] Adding 327256k swap on /dev/zram5.  Priority:5 extents:1 across:327256k SS
[    7.817117] mttcan_controller_config: ctrlmode 0
[    7.817150] mttcan c310000.mttcan can0: Bitrate set

and the CAN link details and statistics:

$ ip -details -statistics link show can0
7: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000 
	  bitrate 250000 sample-point 0.875 
	  tq 25 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 4
	  mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
	  mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped overrun mcast   
    45304      5663     0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    9792       1224     0       0       0       0 
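
Note that the bit timing reported by `ip` is identical in the failing and the working case, so the requested configuration itself is applied the same way both times. As a quick sanity check, the reported values can be recomputed by hand: one bit is the sync segment (1 tq) plus prop-seg, phase-seg1 and phase-seg2, and the sample point sits at the end of phase-seg1. The sketch below does this arithmetic; `can_bit_timing` is a hypothetical helper name, not part of iproute2.

```shell
#!/bin/sh
# Sketch only: recompute bitrate and sample point from the values ip(8) reports.
# can_bit_timing is a hypothetical helper; arguments are tq in ns, then
# prop-seg, phase-seg1 and phase-seg2 in time quanta.
can_bit_timing() {
    tq_ns=$1; prop=$2; seg1=$3; seg2=$4
    total=$((1 + prop + seg1 + seg2))               # +1 for the sync segment
    bitrate=$((1000000000 / (total * tq_ns)))       # bits per second
    sample=$(( (1 + prop + seg1) * 1000 / total ))  # sample point in permille
    echo "bitrate=$bitrate sample_point_permille=$sample"
}

# Values from the output above: tq 25, prop-seg 69, phase-seg1 70, phase-seg2 20
can_bit_timing 25 69 70 20
# prints: bitrate=250000 sample_point_permille=875
```

With a 40 MHz clock, a 25 ns time quantum is exactly one clock period, and 160 quanta per bit give the requested 250000 bit/s with the 0.875 sample point shown by `ip`.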

I ran many different tests before concluding that the Jetson TX2 module was the problem. I tested my system with:

  • 3 different ECUs
  • 3 different cables
  • 2 different motherboards
  • 3 different power supplies (in case of interference)

I also tried re-flashing the JTX2 4GB module multiple times and cloning the system from a working module to a non-working module.

In addition, I tried reconfiguring the CAN interface after a failure, but that does not work. The only thing we can do is reboot the module.
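
The post does not show how the reconfiguration was attempted. For readers who want to repeat that test, here is a minimal sketch of one way to check the controller state after bring-up and re-apply the same settings a few times. The names `CAN_IF`, `MAX_TRIES`, `SETTLE_SECS` and the three functions are assumptions made for illustration; only the `ip link` settings come from the rc.local script above.

```shell
#!/bin/sh
# Sketch only: check can0 after bring-up and retry the configuration.
# CAN_IF, MAX_TRIES, SETTLE_SECS and the function names are hypothetical.
CAN_IF="${CAN_IF:-can0}"
MAX_TRIES="${MAX_TRIES:-3}"

# Print the controller state (e.g. ERROR-ACTIVE) as reported by iproute2.
can_state() {
    ip -details link show "$1" | sed -n 's/.*can state \([A-Z-]*\).*/\1/p'
}

# Take the interface down and re-apply the settings from the rc.local script.
can_configure() {
    ip link set "$1" down
    ip link set "$1" type can bitrate 250000 sjw 4 loopback off restart-ms 1000
    ip link set "$1" up
}

# Return 0 once the interface reports ERROR-ACTIVE, 1 if every try fails.
can_bringup_with_retry() {
    try=1
    while [ "$try" -le "$MAX_TRIES" ]; do
        can_configure "$1"
        # In the failing log the error states appear about 0.5 s after "up".
        sleep "${SETTLE_SECS:-2}"
        [ "$(can_state "$1")" = "ERROR-ACTIVE" ] && return 0
        try=$((try + 1))
    done
    return 1
}
```

Calling `can_bringup_with_retry "$CAN_IF"` from rc.local would return non-zero on an affected boot. Since reconfiguration is reported not to help on these modules, the return code is mainly useful for logging the failure or triggering a reboot.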

Of the 26 new JTX2 4GB modules I received in March, I tested 8. Of those 8, I found 3 with the problem, but there are probably more among the 26. The serial numbers of the 3 modules begin with “142082205xxxx”.

After all these tests, some modules initialize correctly every time and others have initialization problems (the problem occurs between 30% and 50% of the time). So I’m pretty sure that the problem is hardware and comes from the Jetson TX2 4GB modules.
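
A per-module failure rate like the one quoted above can be measured by recording the CAN state once per boot and summarising the log afterwards. The sketch below assumes a log with one state per line (for example appended from rc.local a few seconds after the interface is brought up); the log format and the helper name `can_failure_rate` are assumptions, not something taken from the original setup.

```shell
#!/bin/sh
# Sketch only: summarise a per-boot log of can0 states.
# Assumed log format: one state per line, e.g. ERROR-ACTIVE or ERROR-PASSIVE.

# Reads the log on stdin and prints "<failed> <total> <percent>".
# Any state other than ERROR-ACTIVE counts as a failed boot.
can_failure_rate() {
    awk '
        NF                         { total++ }
        NF && $1 != "ERROR-ACTIVE" { failed++ }
        END {
            if (total == 0) { print "0 0 0"; exit }
            printf "%d %d %d\n", failed, total, (failed * 100) / total
        }'
}
```

For example, `can_failure_rate < /var/log/can0-boot-states.log` (a hypothetical path) prints the number of failed boots, the total number of boots, and the failure percentage rounded down.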

Maybe I can run some other tests, if someone has an idea?
Have you had similar feedback from other users?
Have you had production problems on some series?

Regards,
Frédéric

Thanks for reporting this issue.
So from your experiments, this is not related to the JetPack/L4T SW version, right? Which JetPack/L4T SW version did you use?

Hi, have you done cross tests, such as a failing module on a known-good carrier board? If some of the modules reproduce the issue consistently, it is probably a module issue, and you can send them in for RMA for failure analysis (FA).

Thanks for the answers.
I use JetPack version 4.2.3.
Yes, I have done several hardware cross tests and the error always occurs with the same Jetson TX2 modules. I’m pretty sure it’s a module issue. I will contact the RMA team.
Thanks
