CANBUS not working

Hey guys

I’m trying to get 1 Mbps transfer speeds working on my Xavier AGX devkit running Jetson Linux 35.4.1 (JetPack 5.1.2). Here is what I have done:

Per the Controller Area Network (CAN) page of the Jetson Linux Developer Guide, I connected the WaveShare SN65HVD230 CAN board (purchased through the Amazon link recommended there). I ran loopback tests and verified that it works.

I followed the directions there and entered the following commands:

sudo busybox devmem 0x0c303000 32 0x0000C400
sudo busybox devmem 0x0c303008 32 0x0000C458
sudo busybox devmem 0x0c303010 32 0x0000C400
sudo busybox devmem 0x0c303018 32 0x0000C458
sudo modprobe can
sudo modprobe can_raw
sudo modprobe mttcan
sudo ip link set can0 type can bitrate 1000000
sudo ip link set can1 type can bitrate 1000000
sudo ip link set up can0
sudo ip link set up can1

When I execute
ip -s -d link show can0

I get:

10: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can <BERR-REPORTING,FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
bitrate 1010526 sample-point 0.736
tq 52 prop-seg 6 phase-seg1 7 phase-seg2 5 sjw 1
mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
dbitrate 2021052 dsample-point 0.736
dtq 26 dprop-seg 6 dphase-seg1 7 dphase-seg2 5 dsjw 1
mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
clock 38400000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 0 0 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
0 0 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0

The bitrate is off by about 1%. That is expected, because I had not yet followed the instructions on the Clocks page of the Jetson Linux Developer Guide, so I did that next. I have successfully sent some data over CAN to another MCU, but the bitrate mismatch causes too many errors.
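For readers following along, the 1% offset can be reproduced from the numbers in the output above. This is a minimal sketch; the helper name `can_bitrate` is made up for illustration, and the prescaler value of 2 is inferred from `tq 52` (ns) at a 38.4 MHz clock rather than read from the driver.

```shell
#!/bin/sh
# can_bitrate CLOCK_HZ BRP PROP_SEG PHASE_SEG1 PHASE_SEG2
# One bit is 1 sync quantum + prop-seg + phase-seg1 + phase-seg2 time quanta,
# and one time quantum lasts BRP clock cycles. Integer division only.
can_bitrate() {
    echo $(( $1 / ( $2 * (1 + $3 + $4 + $5) ) ))
}

# Values from the output above. BRP = 2 is an inferred value (52 ns * 38.4 MHz ~= 2).
can_bitrate 38400000 2 6 7 5    # prints 1010526
```

With a 38.4 MHz clock, no integer prescaler and segment count divide evenly into 1 Mbps within these settings, which is why a different parent clock is needed.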

I use dtc to create and edit
~/nvidia/nvidia_sdk/JetPack_5.1.2_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/kernel/dtb/tegra194-p2888-0001-p2822-0000.dts
and
~/nvidia/nvidia_sdk/JetPack_5.1.2_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/bootloader/t186ref/tegra194-a02-bpmp-p2888-a04.dts
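The dtc round trip mentioned here can be sketched as below. The function names and the `DRY_RUN` switch are hypothetical conveniences; the `dtc` options `-I`, `-O`, and `-o` are the standard input-format, output-format, and output-file flags. Note that a decompiled tree carries numeric phandles (such as 0x04) instead of the labels used in the original source.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# dtb_to_dts IN.dtb OUT.dts : decompile a binary device tree to source.
dtb_to_dts() { run dtc -I dtb -O dts -o "$2" "$1"; }

# dts_to_dtb IN.dts OUT.dtb : compile the edited source back to a binary.
dts_to_dtb() { run dtc -I dts -O dtb -o "$2" "$1"; }
```

Setting `DRY_RUN=1` before calling either function prints the dtc command line without touching any files, which is a cheap way to double-check paths before overwriting a DTB.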

For tegra194-p2888-0001-p2822-0000.dts I edit the clocks-init section to look like this:

clocks-init {
                compatible = "nvidia,clocks-config";
                status = "okay";

                disable {
                        clocks = <0x04 0x09 0x04 0x0b>;
                };
        };

and make the edits to pll_source, clocks, and clock-names in mttcan@c310000 and c320000 as shown below:

mttcan@c310000 {
                compatible = "nvidia,tegra194-mttcan";
                reg = <0x00 0xc310000 0x00 0x144 0x00 0xc311000 0x00 0x32 0x00 0xc312000 0x00 0x1000>;
                reg-names = "can-regs\0glue-regs\0msg-ram";
                interrupts = <0x00 0x28 0x04>;
                pll_source = "pllaon";
                clocks = <0x04 0x11c 0x04 0x0a 0x04 0x09 0x04 0x5b 0x04 0x5e>;
                clock-names = "can_core\0can_host\0can\0pllaon";

mttcan@c320000 {
                compatible = "nvidia,tegra194-mttcan";
                reg = <0x00 0xc320000 0x00 0x144 0x00 0xc321000 0x00 0x32 0x00 0xc322000 0x00 0x1000>;
                reg-names = "can-regs\0glue-regs\0msg-ram";
                interrupts = <0x00 0x2a 0x04>;
                pll_source = "pllaon";
                clocks = <0x04 0x11d 0x04 0x0c 0x04 0x0b 0x04 0x5b 0x04 0x5e>;
                clock-names = "can_core\0can_host\0can\0pllaon";

I’m not clear if I entered the variables correctly; the instructions and forum posts I found seemed to either not explain outright what to enter or apply to older Jetpack versions.

For the bpmp file, I edited the two clock@can* sections to be as follows. Again, I’m not certain I edited them correctly:

clock@can1 {
                        allow_fractional_divider = <0x01>;
                        allowed-parents = <0x121 0x5b 0x13a 0x5e>;
                        clk-id = <0x09>;

clock@can2 {
                        allow_fractional_divider = <0x01>;
                        allowed-parents = <0x121 0x5b 0x13a 0x5e>;
                        clk-id = <0x0b>;
                };

After using dtc to convert both dts files to their corresponding dtb files and replacing the old dtb files, I checked my /boot/extlinux/extlinux.conf and commented out the FDT entry. It looks like this (I did not copy the commented sections):

TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
#      FDT /boot/dtb/kernel_tegra194-p2888-0001-p2822-0000.dtb
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 mminit_loglevel=4 console=ttyTCU0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 rootfstype=ext4 video=efifb:off nospectre_bhb nv-auto-config

Then I use the flash.sh script as follows:

sudo ./flash.sh -r -k bpmp-fw-dtb jetson-agx-xavier-devkit mmcblk0p1 
sudo ./flash.sh -r -k kernel-dtb jetson-agx-xavier-devkit mmcblk0p1

I do not want to do a clean install / flash because I have a fair amount of other configurations in my development environment and would rather not spend the hours reinstalling them.

After flashing bpmp it reboots, so I put it back into recovery mode and flash the kernel-dtb.
At this point, nothing has changed. The clock is the same, and cat /sys/kernel/debug/bpmp/debug/clk/can1/parent still gives me osc.

Configuring TDCR for higher bitrates per the ControllerAreaNetworkGuide doesn’t work either: echo 0x600 > /sys/devices/c320000.mttcan/net/can1/tdcr fails because /sys/devices doesn’t have any c3*0000 devices. For what it’s worth, my attempt to check the clock rate with sudo cat /sys/kernel/debug/bpmp/debug/clk/can0/pto_counter also fails, because that file doesn’t exist.
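One way to check whether the mttcan device directories simply live elsewhere in sysfs is to search for them instead of assuming a path. This is only a sketch: `find_mttcan_dirs` is a made-up helper, the search depth of 3 is an arbitrary assumption, and it says nothing about whether a tdcr attribute exists on this release.

```shell
#!/bin/sh
# find_mttcan_dirs [ROOT]
# List directories named "*.mttcan" up to three levels below ROOT
# (default /sys/devices). Prints nothing if none are found.
find_mttcan_dirs() {
    find "${1:-/sys/devices}" -maxdepth 3 -type d -name '*.mttcan' 2>/dev/null | sort
}
```

Running `find_mttcan_dirs` with no argument on the target would show the actual parent path, if any, to substitute into the tdcr command.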

Some sources I have read:
https://docs.nvidia.com/jetson/archives/r35.2.1/DeveloperGuide/text/HR/ControllerAreaNetworkCan.html#managing-the-network
https://docs.nvidia.com/jetson/archives/r35.4.1/DeveloperGuide/text/SD/Clocks.html#sd-clocks-configuringclocks

https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-3275/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/flashing.html#wwpID0E0TQ0HA

I’m not sure what to do from here. Any help would be appreciated.

Hi dbrownrxvs0,

Have you verified the CAN loopback test as described in Jetson AGX Xavier Developer Kit CAN commmunication error - #8 by KevinFFF before connecting CAN to your MCU? Or did you only use loopback on for an internal loopback test?

Could you try using echo to configure the parent clock at runtime as a test?

  1. Loopback on. I have used that MCU and the same SN65HVD230 successfully with other MCUs on a 1 Mbps bus
  2. Which command are you wanting me to echo?

loopback on is used for an internal loopback test without any external wiring.

Please try running the following commands:

# cat /sys/kernel/debug/bpmp/debug/clk/can1/possible_parents
# echo pllp_out0 > /sys/kernel/debug/bpmp/debug/clk/can1/parent
# cat /sys/kernel/debug/bpmp/debug/clk/can1/rate

cat /sys/kernel/debug/bpmp/debug/clk/can1/possible_parents

clk_32k osc pll_c pll_aon

echo pllp_out0 > /sys/kernel/debug/bpmp/debug/clk/can1/parent
This command when executed with or without sudo results in:

-bash: /sys/kernel/debug/bpmp/debug/clk/can1/parent: Permission denied

The redirect itself isn’t being executed as sudo, I believe, just the echo. So I executed:

echo pllp_out0 | sudo tee /sys/kernel/debug/bpmp/debug/clk/can1/parent

I then ran

cat /sys/kernel/debug/bpmp/debug/clk/can1/parent

and that still returns

osc

I’m assuming that’s because pllp_out0 isn’t one of the possible parents.
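Since the write of pllp_out0 produced no error yet had no effect, it may help to check a candidate against possible_parents first and read the value back afterwards. A minimal sketch, assuming the debugfs layout shown in this thread; the function names and the `CLK_ROOT` variable are made up for illustration.

```shell
#!/bin/sh
# is_possible_parent NAME "LIST"
# Succeed if NAME appears as a whole word in the space-separated LIST.
is_possible_parent() {
    for p in $2; do
        [ "$p" = "$1" ] && return 0
    done
    return 1
}

# Path taken from the commands in this thread; override CLK_ROOT if it differs.
CLK_ROOT=${CLK_ROOT:-/sys/kernel/debug/bpmp/debug/clk}

# set_clk_parent CLK NAME
# Refuse names that are not listed, write through "sudo tee" so the write
# itself runs with root privileges, then confirm the value actually changed.
set_clk_parent() {
    dir="$CLK_ROOT/$1"
    parents=$(sudo cat "$dir/possible_parents") || return 1
    is_possible_parent "$2" "$parents" || {
        echo "$2 is not in: $parents" >&2
        return 1
    }
    echo "$2" | sudo tee "$dir/parent" >/dev/null || return 1
    [ "$(sudo cat "$dir/parent")" = "$2" ]
}
```

With the list reported above (clk_32k osc pll_c pll_aon), `set_clk_parent can1 pllp_out0` would stop at the membership check rather than silently leaving osc selected.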

cat /sys/kernel/debug/bpmp/debug/clk/can1/rate

38400000

Also, my purpose in explaining that I’ve used the SN65HVD230 and MCUs elsewhere is to show that they are known-good hardware.

Now, I can echo pll_aon | sudo tee /sys/kernel/debug/bpmp/debug/clk/can1/parent and when I run sudo cat /sys/kernel/debug/bpmp/debug/clk/can1/parent I get

pll_aon

And sudo cat /sys/kernel/debug/bpmp/debug/clk/can1/rate gets me

200000000

I executed rmmod mttcan and modprobe mttcan, then reset can0 (sudo ip link set down can0, sudo ip link set can0 type can bitrate 1000000, sudo ip link set up can0) and now it looks like I’m good?
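The module reload and interface reset described here can be collected into one helper so the order stays the same on every attempt. This is a sketch only: `reinit_can` and the `DRY_RUN` switch are hypothetical, and the parent clock is assumed to have been switched to pll_aon beforehand.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# reinit_can IFACE BITRATE
# Reload the mttcan driver so it picks up the new clock, then reconfigure
# and bring up one interface. Stops at the first command that fails.
reinit_can() {
    run sudo rmmod mttcan &&
    run sudo modprobe mttcan &&
    run sudo ip link set down "$1" &&
    run sudo ip link set "$1" type can bitrate "$2" &&
    run sudo ip link set up "$1"
}
```

Setting `DRY_RUN=1` and calling `reinit_can can0 1000000` prints the five commands without running them.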

ip -s -d link show can0 shows (clipped):

10: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.740
tq 20 prop-seg 18 phase-seg1 18 phase-seg2 13 sjw 1
mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
clock 50000000

The bitrate looks good. The clock, however, shows a different number than the rate I read back from debugfs (50 MHz vs 200 MHz). I read that 50 MHz is the default? Is that right?

Also, any thoughts on why pll_aon isn’t being selected automatically at boot?

But… that’s not the end. Now I try to do some tests. I have an MCU connected to can0 sending some packets on a loop. can1 has nothing connected to it, so I start with testing it first. I execute:

cangen can1 -I 555 -D 1122334455667788 -g 10
ip -s -d link show can1

and I get:

11: can1: <NO-CARRIER,NOARP,UP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can state BUS-OFF (berr-counter tx 248 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.740
tq 20 prop-seg 18 phase-seg1 18 phase-seg2 13 sjw 1
mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
clock 50000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 1 1 1 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
24 3 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0
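When repeating this test, the controller state and error counters can be pulled out of the `ip` output with a small filter instead of reading the full dump each time. A minimal sketch; `can_state` is a made-up name, and the pattern is written against the output format shown in this thread.

```shell
#!/bin/sh
# can_state
# Read "ip -d link show <iface>" text on stdin and print
# "STATE TX_ERRORS RX_ERRORS" from the "can ... state ..." line.
can_state() {
    sed -n 's/.*state \([A-Z-]*\) (berr-counter tx \([0-9]*\) rx \([0-9]*\)).*/\1 \2 \3/p'
}
```

Usage would be `ip -d link show can1 | can_state`, which for the output above yields `BUS-OFF 248 0`.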

I took can1 down, brought it back up, re-ran the two commands again and I get:

write: No buffer space available

For can0, on the Jetson I executed a program I’ve written that generates CAN messages. It generates the frames fine but hangs at the socket transmit. Concurrently it is receiving messages from the MCU I mentioned above. When I execute cangen can0 -I 555 -D 1122334455667788 -g 10 I also get:

write: No buffer space available

For can0, typing ip -s -d link show can0 gets:

10: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can state ERROR-PASSIVE (berr-counter tx 0 rx 124) restart-ms 0
bitrate 1000000 sample-point 0.740
tq 20 prop-seg 18 phase-seg1 18 phase-seg2 13 sjw 1
mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
clock 50000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 1 1 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
16 2 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 16 0 0
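The output above shows `restart-ms 0` (no automatic bus-off recovery) and `qlen 10`, and "No buffer space available" is what a sender typically sees once that short transmit queue fills with frames that never leave. The sketch below only softens those two symptoms and does not address why frames fail on the wire. The helper name, the `DRY_RUN` switch, and the values 100 ms and 1000 are illustrative assumptions; `restart-ms` and `txqueuelen` are standard `ip link` options.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# soften_can_symptoms IFACE [RESTART_MS] [QLEN]
# Take the interface down (restart-ms cannot be changed while it is up),
# enable automatic bus-off restart, enlarge the transmit queue, bring it up.
soften_can_symptoms() {
    run sudo ip link set down "$1" &&
    run sudo ip link set "$1" type can restart-ms "${2:-100}" &&
    run sudo ip link set "$1" txqueuelen "${3:-1000}" &&
    run sudo ip link set up "$1"
}
```

If the error counters keep climbing after this, the queue will still fill eventually, so it is a diagnostic aid rather than a fix.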

I’m at a loss right now - any suggestions?

echo and cat only change the setting at runtime, for testing.
You have to modify the BPMP DTB so that the change is applied during boot.

Please check whether the following patch helps in your case.
AGX Orin: CAN bus does not automatic recover from BUS_OFF state - #26 by KevinFFF

I modified the BPMP-DTB and flashed. The problems persisted. I haven’t had time to tinker with that today.

As for AGX Orin: CAN bus does not automatic recover from BUS_OFF state - #26 by KevinFFF , I rebuilt the kernel object for mttcan per Kernel Customization — Jetson Linux Developer Guide documentation and replaced it. This did not solve the problem.

Have you confirmed that you modified the correct DTB and you have re-flashed the board to apply the change?

I edited these two files:

~/nvidia/nvidia_sdk/JetPack_5.1.2_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/kernel/dtb/tegra194-p2888-0001-p2822-0000.dts

and

~/nvidia/nvidia_sdk/JetPack_5.1.2_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/bootloader/t186ref/tegra194-a02-bpmp-p2888-a04.dts

Today I reformatted a separate AGX Xavier development environment and put a fresh install of 35.5.0 on it. As expected, the 5.1.2 bug where the controller never leaves BUS-OFF is gone. However, these issues reproduce on both systems:

  1. the BPMP change is not being recognized. I still have to manually load pll_aon.
  2. Also, on both of them, if I execute

cangen can0 -I 555 -D 1122334455667788 -g 10

I see three packets and 24 bytes on RX, then it goes to BUS-OFF. On JetPack 5.1.3 I can take the interface down and up and do it again. On 5.1.2 I have to use modprobe to restart it. This problem exists on can0 and can1. can0 is properly terminated. I’ve even done this test with both of them connected to each other with transceivers and twisted-pair wire. An example of what I see:

10: can0: <NO-CARRIER,NOARP,UP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can state BUS-OFF (berr-counter tx 248 rx 0) restart-ms 0
bitrate 1000000 sample-point 0.740
tq 20 prop-seg 18 phase-seg1 18 phase-seg2 13 sjw 1
mttcan: tseg1 2..255 tseg2 0..127 sjw 1..127 brp 1..511 brp-inc 1
mttcan: dtseg1 1..31 dtseg2 0..15 dsjw 1..15 dbrp 1..15 dbrp-inc 1
clock 50000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 1 1 1 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
RX: bytes packets errors dropped overrun mcast
24 3 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0

One question that may matter: the Clocks page of the NVIDIA Jetson Linux Developer Guide says to

Edit the kernel DTS file to make CAN use PLLAON as parent and remove PLLAON (entry 0x4 0x5e) from the list of clocks to be disabled:

However, the actual Kernel DTS entries are as follows:

clocks-init {
        compatible = "nvidia,clocks-config";
        status = "okay";

        disable {
                clocks = <0x2ee 0x5e 0x04 0x09 0x04 0x0b>;
        };
};

Am I supposed to delete 0x2ee 0x5e? I’m confused on this one.
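One way to reason about this question: the documentation's `0x4 0x5e` notation suggests the list is made of <phandle clock-id> pairs, so the entry to remove would be whichever pair ends in 0x5e, regardless of the phandle value in front of it. That reading is an assumption, not something confirmed in this thread, and `drop_clock_pair` is a made-up helper that only illustrates the pairwise edit.

```shell
#!/bin/sh
# drop_clock_pair "CELLS" CLOCK_ID
# Treat CELLS as <phandle clock-id> pairs (an assumption about the layout)
# and print the list without any pair whose clock-id equals CLOCK_ID.
drop_clock_pair() {
    cells=$1
    target=$2
    out=
    set -- $cells
    while [ $# -ge 2 ]; do
        [ "$2" = "$target" ] || out="$out $1 $2"
        shift 2
    done
    echo "${out# }"
}

drop_clock_pair "0x2ee 0x5e 0x04 0x09 0x04 0x0b" 0x5e    # prints 0x04 0x09 0x04 0x0b
```

Under that assumption the remaining list matches the `clocks = <0x04 0x09 0x04 0x0b>` value used in the 35.4.1 edit earlier in the thread.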

Please advise on all issues.

I would suggest updating the device tree from the source rather than editing one decompiled from the dtb.

Could you share the full flash log for further check?