MCP2515 - Interrupts Disabled

Hello, I am trying to use the MCP2515 SPI-CAN module with an Orin NX on a custom carrier board from Connect Tech.

The JetPack version is 5.1.2.

I have successfully integrated the module and can exchange data in both directions.

First, even when nothing is connected to the CAN connector of the MCP2515, the interrupt counter keeps increasing. Here is the output of the
cat /proc/interrupts | grep spi
command:

 61:    3899759          0          0          0     GICv3  68 Level     3210000.spi
255:    3899753          0          0          0      gpio  25 Edge      spi0.0

after 1 second:

 61:    3905721          0          0          0     GICv3  68 Level     3210000.spi
255:    3905715          0          0          0      gpio  25 Edge      spi0.0
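
As a rough check, the two samples can be turned into an interrupt rate. The sketch below is only illustrative: the helper name total_count is made up for this example, and the 1-second interval is taken from the description above rather than measured precisely.

```python
# Two samples of the spi0.0 line from /proc/interrupts, taken about
# 1 second apart (values copied from the output above).
before = "255:    3899753          0          0          0      gpio  25 Edge      spi0.0"
after = "255:    3905715          0          0          0      gpio  25 Edge      spi0.0"


def total_count(line: str) -> int:
    """Sum the per-CPU counters on one /proc/interrupts line."""
    fields = line.split()
    total = 0
    # fields[0] is the IRQ number ("255:"); the per-CPU counters are the
    # all-digit fields that follow, up to the first non-numeric field.
    for field in fields[1:]:
        if not field.isdigit():
            break
        total += int(field)
    return total


interval_s = 1.0  # assumed sampling interval
rate = (total_count(after) - total_count(before)) / interval_s
print(f"{rate:.0f} interrupts/s")
```

With these numbers the counter advances by 5962 in one second, even though the bus is idle.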

I also think these interrupts cause excessive CPU usage, with a CPU load between 5% and 20%. Here is the output of the top command:

   PID                                            CPU
   2256 root     -51   0       0      0      0 D  26.3   0.0   0:58.71 irq/255-spi0.0                          
   368  root     -51   0       0      0      0 S   5.3   0.0   0:04.33 irq/61-3210000.                         
   371  root      20   0       0      0      0 D   5.3   0.0   1:32.43 spi0

Output of lsmod | grep can:

can_raw                28672  1
can                    28672  1 can_raw
can_dev                36864  1 mcp251x

When I send data to the CAN bus, I can receive it on the Orin NX side. But after a few minutes, the interrupt is disabled and I get this error message from the kernel:
kernel:[ 1434.534770] Disabling IRQ #255

And this is the dmesg output when the error occurs:

[  116.351714] irq 255: nobody cared (try booting with the "irqpoll" option)
[  116.358725] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.120-tegra #1
[  116.358729] Hardware name: Unknown CTI Hadron + Orin NX/CTI Hadron + Orin NX, BIOS 202210.3-52cefd4-dirty 10/10/2023
[  116.358733] Call trace:
[  116.358748]  dump_backtrace+0x0/0x1d0
[  116.358755]  show_stack+0x2c/0x40
[  116.358766]  dump_stack+0xd8/0x138
[  116.358771]  __report_bad_irq+0x54/0xe0
[  116.358778]  note_interrupt+0x2d4/0x3a0
[  116.358785]  handle_irq_event_percpu+0x88/0x90
[  116.358790]  handle_irq_event+0x4c/0xf0
[  116.358794]  handle_edge_irq+0xb4/0x1c0
[  116.358799]  generic_handle_irq+0x3c/0x60
[  116.358808]  tegra186_gpio_irq+0x11c/0x1e0
[  116.358813]  generic_handle_irq+0x3c/0x60
[  116.358818]  __handle_domain_irq+0x6c/0xc0
[  116.358821]  gic_handle_irq+0x64/0x130
[  116.358825]  el1_irq+0xd0/0x180
[  116.358833]  cpuidle_enter_state+0xb4/0x400
[  116.358837]  cpuidle_enter+0x3c/0x50
[  116.358842]  call_cpuidle+0x40/0x70
[  116.358845]  do_idle+0x1fc/0x260
[  116.358849]  cpu_startup_entry+0x28/0x70
[  116.358854]  rest_init+0xd8/0xe4
[  116.358862]  arch_call_rest_init+0x14/0x1c
[  116.358867]  start_kernel+0x4f8/0x52c
[  116.358869] handlers:
[  116.361207] [<0000000042791afb>] irq_default_primary_handler threaded [<00000000c25d1824>] mcp251x_can_ist [mcp251x]
[  116.372061] Disabling IRQ #255

Here is my device tree fragment that configures the MCP2515 on the SPI bus:

can_clock: can_clock {
        compatible = "fixed-clock";
        #clock-cells = <0>;
        clock-frequency = <20000000>;
        clock-accuracy = <100>;
    };

    spi@3210000{
        status="okay";

        can0: spi@0 { /* chip select 0 */
            compatible = "microchip,mcp2515";
            reg = <0x0>;
            spi-max-frequency = <10000000>;
            clocks = <&can_clock>;
            nvidia,rx-clk-tap-delay = <0x7>;
            interrupt-parent = <&tegra_aon_gpio>;
            nvidia,enable-hw-based-cs;
            interrupts = <TEGRA234_AON_GPIO(EE, 2) IRQ_TYPE_EDGE_FALLING>;
            controller-data {
                nvidia,rx-clk-tap-delay = <0x10>;
                nvidia,tx-clk-tap-delay = <0x0>;
            };
        };
    };

According to my research, the kernel disables an interrupt if there are too many unhandled interrupts. Does this mean that the bus speed is too high and the MCP2515 cannot handle all the messages? I am using a 1 Mbit/s bitrate.
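
For context on the "nobody cared" message, here is a simplified model of the check that, as far as I understand it, the kernel performs in note_interrupt (kernel/irq/spurious.c). The constants reflect my reading of the source and should be verified against the 5.10 tree in use. The function names are hypothetical, and the model leaves out both the time-based reset of the unhandled counter and the extra bookkeeping for threaded handlers such as mcp251x_can_ist.

```python
# Simplified, assumption-laden model of the spurious-IRQ check.
WINDOW = 100_000          # interrupts counted per evaluation window
UNHANDLED_LIMIT = 99_900  # more unhandled than this in a window -> IRQ disabled


def trips_spurious_check(unhandled_in_window: int) -> bool:
    """Return True if a full window with this many unhandled interrupts
    would lead to the 'nobody cared' report and the IRQ being disabled."""
    return unhandled_in_window > UNHANDLED_LIMIT


def seconds_per_window(irq_rate_hz: float) -> float:
    """Time needed to accumulate one full window at a given interrupt rate."""
    return WINDOW / irq_rate_hz


print(trips_spurious_check(100_000))  # every interrupt unhandled
print(trips_spurious_check(50_000))   # half of them handled
print(f"{seconds_per_window(5962):.1f} s per window at 5962 interrupts/s")
```

If this model is right, the message is about interrupts that no handler claimed rather than about the CAN bitrate as such. At the roughly 5962 interrupts per second seen on the idle bus, one window would fill in about 17 seconds.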

By the way, if I select IRQ_TYPE_LEVEL_LOW instead of IRQ_TYPE_EDGE_FALLING, I cannot bring up the can0 interface with the ip link set up command. The system freezes after this command. That's why I switched to the IRQ_TYPE_EDGE_FALLING option.

Edit 1:
With a low CAN bus load, there is no error after a 20-minute test. Here is a picture from the CAN analyzer tool with low bus load:

With a higher bus load, I get the interrupt-disabled error. Here is a picture with the higher bus load:

Edit 2:
If I decrease the spi-max-frequency parameter in the device tree, the error happens earlier. For example, with a value of 1000000 the error happens after 10 seconds. With 20000000 the error happens after 10 minutes. I tried to increase this frequency to 50000000, but with this value the MCP2515 is not recognized. dmesg reports that it cannot enter conf mode after reset.

I think it is related to the SPI frequency and the bus load, but I cannot find any correlation or a way to solve it.
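
To get a feeling for how the SPI clock and the bus load might interact, here is a back-of-the-envelope comparison of the time one CAN frame occupies on the bus with the time needed to read it out over SPI. It is only a sketch under stated assumptions: a standard 11-bit ID frame with 8 data bytes and no stuff bits, and a single READ RX BUFFER transfer of 1 instruction byte plus 13 buffer bytes per frame. The function names are made up for this example, and the real driver performs additional transfers (status reads, flag clearing) that are ignored here.

```python
CAN_BITRATE = 1_000_000  # bit/s, the bitrate used in this setup

# Standard data frame, 8 data bytes, stuff bits ignored:
# SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4 + data 64 + CRC 15
# + CRC delimiter 1 + ACK 1 + ACK delimiter 1 + EOF 7 + interframe space 3
FRAME_BITS = 1 + 11 + 1 + 1 + 1 + 4 + 64 + 15 + 1 + 1 + 1 + 7 + 3

# Assumption: 1 instruction byte + 13 buffer bytes
# (SIDH, SIDL, EID8, EID0, DLC, D0..D7) per received frame.
SPI_BYTES_PER_FRAME = 1 + 13


def frame_time_us(bitrate: int = CAN_BITRATE) -> float:
    """Bus time of one frame in microseconds."""
    return FRAME_BITS / bitrate * 1e6


def spi_read_time_us(spi_hz: int) -> float:
    """Pure clocking time to read one frame over SPI, in microseconds."""
    return SPI_BYTES_PER_FRAME * 8 / spi_hz * 1e6


for spi_hz in (1_000_000, 10_000_000, 20_000_000):
    print(f"SPI {spi_hz / 1e6:.0f} MHz: read {spi_read_time_us(spi_hz):.1f} us, "
          f"frame {frame_time_us():.1f} us")
```

Under these assumptions, a 1 MHz SPI clock needs about 112 us just to clock out one frame, which is slightly longer than the roughly 111 us the frame occupies on a 1 Mbit/s bus. That would be consistent with the error appearing sooner at lower SPI clocks, although it does not explain the interrupts seen on an idle bus.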

Thanks

Hi gok2hw,

Could you verify with the latest JP5.1.4 or JP6.1?

Could you measure the interrupt signal when the issue occurs?

Please share the full dmesg when you hit the issue.
I don’t see this behavior on the devkit.
Could you also get a devkit to compare the difference?

How did you configure PEE.02 in pinmux?

In my verification of the MCP2515 module, I used IRQ_TYPE_LEVEL_LOW rather than IRQ_TYPE_EDGE_FALLING.

Hello Kevin,

With the current MCP module, I cannot use the IRQ_TYPE_LEVEL_LOW option. With this option, the ip link set can0 up command freezes the system and I have to power-cycle the board.

How did you configure PEE.02 in pinmux?

I did not configure PEE.02 explicitly.

Could you verify with the latest JP5.1.4 or JP6.1?

I have some CSI cameras connected to the system, and their drivers are for 5.1.2. That's why I am using 5.1.2. Switching to a new JetPack version is not easy for me right now.

I will examine the INT pin with an oscilloscope and share the result with you.

By the way, I have tried another MCP module. With this module, the interrupt counter does not increase when there is no data. I ran the same tests and there is no error: the interrupt is not disabled. With this module I can also use the IRQ_TYPE_LEVEL_LOW option in the device tree. So switching to a devkit may be unnecessary right now, since I did all of this on the carrier board.

The connections are the same as with the other, problematic MCP module. What could be the root cause of this?

Thanks

Do you mean the issue is specific to the problematic MCP2515 module?
Does everything work as expected after you replace it with another module?
If so, I would suggest also asking your vendor for help, since the module is not developed by us.