Hello, I am trying to use the MCP2515 SPI-CAN module with an Orin NX on a custom carrier board from Connect Tech.
The JetPack version is 5.1.2.
I have successfully integrated the module and can exchange data in both directions.
First, even when nothing is connected to the CAN connector of the MCP2515, the interrupt counters keep increasing. Here is the output of this command:
cat /proc/interrupts | grep spi
61: 3899759 0 0 0 GICv3 68 Level 3210000.spi
255: 3899753 0 0 0 gpio 25 Edge spi0.0
The same command after 1 second:
61: 3905721 0 0 0 GICv3 68 Level 3210000.spi
255: 3905715 0 0 0 gpio 25 Edge spi0.0
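As a rough check of how fast the counters climb, the two samples above can be subtracted. This is only a minimal sketch: the function name irq_rate is hypothetical, and it assumes the samples were taken exactly 1 second apart.

```python
def irq_rate(count_before: int, count_after: int, interval_s: float) -> float:
    """Return interrupts per second between two /proc/interrupts samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (count_after - count_before) / interval_s


# CPU0 counts copied from the two outputs above.
# IRQ 255 is the MCP2515 interrupt line (gpio 25, spi0.0).
gpio_rate = irq_rate(3899753, 3905715, 1.0)
# IRQ 61 is the SPI controller itself (3210000.spi).
spi_rate = irq_rate(3899759, 3905721, 1.0)

print(f"IRQ 255: {gpio_rate:.0f}/s, IRQ 61: {spi_rate:.0f}/s")
```

Both lines advance by 5962 counts over the interval, so the two counters move in lockstep even with nothing attached to the bus.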
I also think this causes excessive CPU usage: the load is between 5% and 20%. Here is the output of the top command:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2256 root -51 0 0 0 0 D 26.3 0.0 0:58.71 irq/255-spi0.0
368 root -51 0 0 0 0 S 5.3 0.0 0:04.33 irq/61-3210000.
371 root 20 0 0 0 0 D 5.3 0.0 1:32.43 spi0
Output of lsmod | grep can:
can_raw 28672 1
can 28672 1 can_raw
can_dev 36864 1 mcp251x
When I send data on the CAN bus, I can receive it on the Orin NX side. But after a few minutes, the interrupt is disabled and I get this error message from the kernel:
kernel:[ 1434.534770] Disabling IRQ #255
This is the dmesg output when the error occurs:
[ 116.351714] irq 255: nobody cared (try booting with the "irqpoll" option)
[ 116.358725] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.120-tegra #1
[ 116.358729] Hardware name: Unknown CTI Hadron + Orin NX/CTI Hadron + Orin NX, BIOS 202210.3-52cefd4-dirty 10/10/2023
[ 116.358733] Call trace:
[ 116.358748] dump_backtrace+0x0/0x1d0
[ 116.358755] show_stack+0x2c/0x40
[ 116.358766] dump_stack+0xd8/0x138
[ 116.358771] __report_bad_irq+0x54/0xe0
[ 116.358778] note_interrupt+0x2d4/0x3a0
[ 116.358785] handle_irq_event_percpu+0x88/0x90
[ 116.358790] handle_irq_event+0x4c/0xf0
[ 116.358794] handle_edge_irq+0xb4/0x1c0
[ 116.358799] generic_handle_irq+0x3c/0x60
[ 116.358808] tegra186_gpio_irq+0x11c/0x1e0
[ 116.358813] generic_handle_irq+0x3c/0x60
[ 116.358818] __handle_domain_irq+0x6c/0xc0
[ 116.358821] gic_handle_irq+0x64/0x130
[ 116.358825] el1_irq+0xd0/0x180
[ 116.358833] cpuidle_enter_state+0xb4/0x400
[ 116.358837] cpuidle_enter+0x3c/0x50
[ 116.358842] call_cpuidle+0x40/0x70
[ 116.358845] do_idle+0x1fc/0x260
[ 116.358849] cpu_startup_entry+0x28/0x70
[ 116.358854] rest_init+0xd8/0xe4
[ 116.358862] arch_call_rest_init+0x14/0x1c
[ 116.358867] start_kernel+0x4f8/0x52c
[ 116.358869] handlers:
[ 116.361207] [<0000000042791afb>] irq_default_primary_handler threaded [<00000000c25d1824>] mcp251x_can_ist [mcp251x]
[ 116.372061] Disabling IRQ #255
Here is my device tree fragment that configures the MCP2515 on the SPI bus:
can_clock: can_clock {
    compatible = "fixed-clock";
    #clock-cells = <0>;
    clock-frequency = <20000000>;
    clock-accuracy = <100>;
};
spi@3210000 {
    status = "okay";
    can0: spi@0 { /* chip select 0 */
        compatible = "microchip,mcp2515";
        reg = <0x0>;
        spi-max-frequency = <10000000>;
        clocks = <&can_clock>;
        nvidia,rx-clk-tap-delay = <0x7>;
        interrupt-parent = <&tegra_aon_gpio>;
        nvidia,enable-hw-based-cs;
        interrupts = <TEGRA234_AON_GPIO(EE, 2) IRQ_TYPE_EDGE_FALLING>;
        controller-data {
            nvidia,rx-clk-tap-delay = <0x10>;
            nvidia,tx-clk-tap-delay = <0x0>;
        };
    };
};
According to my research, the kernel disables an interrupt line when too many of its interrupts go unhandled. Does this mean that the bus speed is too high and the MCP2515 cannot handle all the messages? I am using a bitrate of 1 Mbit/s.
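For reference, my understanding of the check behind the "nobody cared" message (kernel/irq/spurious.c in 5.10) is that the kernel evaluates each window of 100,000 interrupts on a line and disables the line if more than 99,900 of them were reported as unhandled. The sketch below is a simplified model of that rule only. It leaves out the special handling for threaded handlers and the time-based reset of the unhandled counter, and the class and function names are hypothetical.

```python
WINDOW = 100_000     # interrupts per evaluation window
THRESHOLD = 99_900   # unhandled count above which the line is disabled


class SpuriousIrqModel:
    """Simplified model of the per-line unhandled-interrupt accounting."""

    def __init__(self) -> None:
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        if self.disabled:
            return
        if not handled:
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < WINDOW:
            return
        # End of a window: decide, then start counting again.
        self.irq_count = 0
        if self.irqs_unhandled > THRESHOLD:
            self.disabled = True  # corresponds to "Disabling IRQ #n"
        self.irqs_unhandled = 0


def run(handled_every: int, total: int) -> bool:
    """Feed `total` interrupts, handling one in every `handled_every`."""
    model = SpuriousIrqModel()
    for i in range(total):
        model.note_interrupt(handled=(i % handled_every == 0))
    return model.disabled


print(run(handled_every=2000, total=100_000))  # 99,950 unhandled in the window
print(run(handled_every=500, total=100_000))   # 99,800 unhandled in the window
```

In this model the first run ends with the line disabled and the second does not, so what matters is the fraction of interrupts that go unhandled rather than the raw interrupt rate.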
By the way, if I select IRQ_TYPE_LEVEL_LOW instead of IRQ_TYPE_EDGE_FALLING, I cannot bring up the can0 interface with the ip link set up command; the system freezes after this command. That is why I switched to IRQ_TYPE_EDGE_FALLING.
Edit 1:
With a low CAN bus load, there is no error after a 20-minute test. Here is the picture from the CAN analyzer tool with low bus load:
With a higher bus load, I get the "Disabling IRQ" error. Here is the picture with higher bus load:
Edit 2:
If I decrease the spi-max-frequency parameter in the device tree, the error happens earlier. For example, with a value of 1000000 the error happens after 10 seconds, and with 20000000 it happens after 10 minutes. I have tried to increase this frequency to 50000000, but with this value the MCP2515 is not recognized: dmesg reports that it cannot enter conf mode after reset.
I think the problem is related to the SPI frequency and the bus load, but I cannot find a clear correlation or a way to solve it.
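One way to look for the correlation is to compare how long one CAN frame occupies the bus with how long the SPI clock needs just to read that frame out of the MCP2515. The sketch below rests on assumptions of mine: a standard 11-bit-ID frame with 8 data bytes (111 bit times including the 3-bit interframe space, before bit stuffing), and one frame fetched with a single 14-byte SPI transfer (1 instruction byte plus 13 receive-buffer bytes). It ignores the extra register accesses the driver makes per interrupt, chip-select gaps, and scheduling latency, so the real SPI time will be higher. All names are hypothetical.

```python
CAN_BITRATE = 1_000_000    # bit/s, as used in my setup
FRAME_BITS = 111           # assumption: 11-bit ID, 8 data bytes, no stuff bits
SPI_BYTES_PER_FRAME = 14   # assumption: 1 instruction byte + 13 buffer bytes


def can_frame_time_us(bitrate: int = CAN_BITRATE, bits: int = FRAME_BITS) -> float:
    """Time one frame occupies the CAN bus, in microseconds."""
    return bits * 1e6 / bitrate


def spi_read_time_us(spi_hz: int, nbytes: int = SPI_BYTES_PER_FRAME) -> float:
    """Pure SPI clock time to shift `nbytes` bytes, in microseconds."""
    return nbytes * 8 * 1e6 / spi_hz


frame_us = can_frame_time_us()
for spi_hz in (1_000_000, 10_000_000, 20_000_000):
    read_us = spi_read_time_us(spi_hz)
    print(f"SPI {spi_hz / 1e6:.0f} MHz: read {read_us:.1f} us "
          f"vs frame {frame_us:.1f} us (ratio {read_us / frame_us:.2f})")
```

Under these assumptions, at an SPI clock of 1 MHz the clock time alone (112 us) is about as long as the frame itself (111 us), so a fully loaded 1 Mbit/s bus could not be drained in time, while at 10 MHz the read takes roughly a tenth of a frame time. That is at least consistent with the error appearing sooner at lower SPI frequencies.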
Thanks