AGX Orin spi-tegra114 c260000.spi crashes when sending large amounts of data in JetPack 6.0

Hello,
I am working with a Jetson AGX Orin 32GB on JetPack 6.0; the SDK version is 36.3.0.
I find that if I send a large amount of data to c260000.spi, the SPI driver crashes, but if I send just a few bytes, it works fine.

Here is the kernel crash log:

[  103.606163] slb9670@0 enforce active low on chipselect handle
[  103.606205] spi@1 enforce active low on chipselect handle
[  155.935999] ------------[ cut here ]------------
[  155.936011] WARNING: CPU: 6 PID: 2960 at drivers/spi/spi-tegra114.c:1104 tegra_spi_transfer_one_message+0x514/0x634 [spi_tegra114]
[  155.936041] Modules linked in: spi_tegra114 nvidia_modeset(OE) lzo_rle lzo_compress snd_soc_tegra210_mvc(O) snd_soc_tegra210_ope(O) snd_soc_tegra210_admaif(O) snd_soc_tegra210_mixer(O) snd_soc_tegra186_dspk(O) snd_soc_tegra210_afc(O) snd_soc_tegra_pcm snd_soc_tegra186_asrc(O) snd_soc_tegra186_arad(O) snd_soc_tegra210_dmic(O) zram snd_soc_tegra210_sfc(O) snd_soc_tegra210_i2s(O) zsmalloc snd_soc_tegra210_adx(O) snd_soc_tegra210_amx(O) snd_soc_tegra210_ahub(O) tegra210_adma tpm_tis_spi tpm_tis_core spidev nvvrs_pseq_rtc(O) joydev snd_soc_tegra_machine_driver(O) crct10dif_ce snd_soc_tegra_utils(O) snd_soc_simple_card_utils tegra234_oc_event(O) tegra23x_perf_uncore(O) mttcan(O) can_dev nvpmodel_clk_cap(O) tegra_mce(O) thermal_trip_event(O) tegra_cactmon_mc_all(O) tegra_aconnect tegra234_aon(O) ramoops reed_solomon snd_hda_codec_hdmi pwm_tegra_tachometer(O) at24 snd_hda_tegra snd_hda_codec snd_hda_core mc_hwpm(O) nvethernet(O) tegra_pcie_dma_test(O) nvpps(O) tegra_pcie_edma(O) nvidia(O)
[  155.936143]  host1x_fence(O) lm90 i2c_nvvrs11(O) nvidia_vrs_pseq(O) tegra_dce(O) rfkill nvhost_isp5(O) nvhost_vi5(O) nvhost_nvcsi_t194(O) bridge stp llc usb_f_ncm usb_f_mass_storage tegra_camera(O) v4l2_dv_timings nvhost_nvcsi(O) tegra_camera_platform(O) capture_ivc(O) usb_f_acm u_serial usb_f_rndis tegra_camera_rtcpu(O) u_ether ivc_bus(O) hsp_mailbox_client(O) libcomposite ivc_ext(O) v4l2_fwnode v4l2_async videobuf2_dma_contig videobuf2_memops governor_userspace videobuf2_v4l2 videobuf2_common videodev tegra_drm(O) nvhost_pva(O) nvhost_nvdla(O) tegra_wmark(O) mc tegra_se(O) nvhost_capture(O) crypto_engine nvhwpm(O) cec host1x_nvhost(O) tsecriscv(O) drm_kms_helper nvidia_p2p(O) ina3221 nvgpu(O) governor_pod_scaling(O) host1x(O) mc_utils(O) nvmap(O) nvsciipc(O) fuse drm ip_tables x_tables ipv6 pwm_fan(E) pwm_tegra(E) tegra_bpmp_thermal(E) tegra_xudc(E) ucsi_ccg(E) typec_ucsi(E) typec(E) nvme(E) nvme_core(E) phy_tegra194_p2u(E) pcie_tegra194(E) [last unloaded: spi_tegra114]
[  155.936256] CPU: 6 PID: 2960 Comm: python3 Tainted: G        W  OE     5.15.136-tegra #1
[  155.936262] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer Kit/Jetson, BIOS 36.3.0-gcid-36106755 04/25/2024
[  155.936265] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  155.936270] pc : tegra_spi_transfer_one_message+0x514/0x634 [spi_tegra114]
[  155.936276] lr : tegra_spi_transfer_one_message+0x1f8/0x634 [spi_tegra114]
[  155.936281] sp : ffff800018c3bac0
[  155.936283] x29: ffff800018c3bac0 x28: ffff0000b7fb1e00 x27: 0000000040400007
[  155.936289] x26: ffff0000b7fb1e00 x25: ffff0000ea55c000 x24: 000000000007a120
[  155.936293] x23: ffff0000b7fb1e00 x22: 00000000000003e8 x21: ffff0000b7fb1ee8
[  155.936297] x20: ffff800018c3bd00 x19: ffff0000c15d1000 x18: 0000000000000000
[  155.936302] x17: 000000040044ffff x16: 005000f5b5503510 x15: 0000000000000000
[  155.936306] x14: 0000000000000000 x13: 0000000000000020 x12: 0101010101010101
[  155.936311] x11: 7f7f7f7f7f7f7f7f x10: baf59b6e7fd92d3d x9 : 2e40b47ea6bd06e5
[  155.936315] x8 : ffff0000ea7d4d58 x7 : 0000000000000006 x6 : 0000000000000000
[  155.936318] x5 : 00000000410fd420 x4 : 0000000000c0000e x3 : 0000000000000000
[  155.936322] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[  155.936325] Call trace:
[  155.936327]  tegra_spi_transfer_one_message+0x514/0x634 [spi_tegra114]
[  155.936330]  __spi_pump_messages+0x384/0x7f0
[  155.936343]  __spi_sync+0x2d0/0x310
[  155.936347]  spi_sync+0x3c/0x60
[  155.936350]  spidev_message+0x390/0x520 [spidev]
[  155.936356]  spidev_ioctl+0x7b8/0xb30 [spidev]
[  155.936359]  __arm64_sys_ioctl+0xb4/0x100
[  155.936364]  invoke_syscall+0x5c/0x130
[  155.936371]  el0_svc_common.constprop.0+0x64/0x110
[  155.936374]  do_el0_svc+0x74/0xa0
[  155.936377]  el0_svc+0x28/0x80
[  155.936383]  el0t_64_sync_handler+0xa4/0x130
[  155.936386]  el0t_64_sync+0x1a4/0x1a8
[  155.936390] ---[ end trace 97fc80ace9ab14a4 ]---
[  156.941510] spi-tegra114 c260000.spi: spi transfer timeout
[  156.946530] tegra-gpcdma 2600000.dma-controller: DMA pause timed out
[  156.947346] spi_master spi2: failed to transfer one message from queue

And here is the relevant dts code:

		spi@c260000 {
			status = "okay";
			cs-gpios = <&gpio_aon TEGRA234_AON_GPIO(CC, 3) GPIO_ACTIVE_HIGH>;
			num-cs = <1>;
			spi@1 { /* chip select 1 */
				compatible = "tegra-spidev";
				reg = <0x1>;
				status = "okay";
				spi-max-frequency = <10000000>;
			};
		};

My test code is here:

import spidev
import time

# create SPI object
spi = spidev.SpiDev()

# open spi device (bus 2, device 1)
spi.open(2, 1)

# configure SPI parameters
spi.max_speed_hz = 500000  # maximum clock speed in Hz
spi.mode = 0b00  # SPI mode 0

# prepare a large block of data (4000 bytes)
data = [0x01, 0x02, 0x03, 0x04] * 1000

# send data
spi.xfer2(data)

# close SPI
spi.close()

I even measured the waveform of this SPI with a logic analyzer; the SPI clock speed is about 3.3 MHz. I also tried lowering the speed to 300 kHz by referring to this link, but the problem is still there.

I also tested this SPI with JetPack 5, and it works fine.

I compared the SPI driver code between JP5 and JP6; there are many changes between them. The dts compatible string also changed (JP5: nvidia,tegra186-spi, JP6: nvidia,tegra210-spi).

I am wondering if you have any ideas about this.

Looking forward to your reply!

Hi bbear953308023,

Are you using the devkit or a custom board for AGX Orin?

Could you compare the parent clock used for SPI2 in both releases?

Please share how you connect it and how you run this code.

Hi KevinFFF,
Yes, I am working with a custom board, but I believe this problem would be the same on the devkit.

I checked the parent clock in both releases and they are the same. Here are the details:

# JetPack 5.1.3
root@seeed-desktop:/sys/kernel/debug/bpmp/debug/clk/spi1# cat possible_parents
pllp_out0 clk_32k clk_m
root@seeed-desktop:/sys/kernel/debug/bpmp/debug/clk/spi1# cat parent
pllp_out0
# JetPack 6.0
root@seeed-desktop:/sys/kernel/debug/bpmp/debug/clk/spi2# cat possible_parents
pllp_out0 pll_c pll_aon clk_32k osc
root@seeed-desktop:/sys/kernel/debug/bpmp/debug/clk/spi2# cat parent
pllp_out0

And here are the detailed steps to run the test code (assuming we save it as testspi.py):

sudo pip3 install spidev
sudo python3 testspi.py

PS: we need to change the spidev numbers for each JetPack version:
In JetPack 5 the Python code is spi.open(1, 0)
In JetPack 6 the Python code is spi.open(2, 1)
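
To keep one script working on both releases, here is a minimal sketch that probes /dev for the right node (the open_spi helper name is just illustrative; the device paths follow the numbering above):

import os
import spidev

def open_spi():
    """Open the test SPI device, handling the JetPack 5/6 numbering change."""
    spi = spidev.SpiDev()
    if os.path.exists('/dev/spidev2.1'):
        spi.open(2, 1)  # JetPack 6 numbering
    else:
        spi.open(1, 0)  # JetPack 5 numbering
    return spi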

Could you print the max_rate in both cases?

Thanks for sharing the steps to run the test.
Could you also share how you connect the SPI? With an actual SPI device, or a loopback test with MOSI and MISO shorted?

Currently I can only give you the max_rate for JetPack 5, because the board with JetPack 6 is not at hand. Here are the details for JetPack 5:

root@seeed-desktop:~# cat /sys/kernel/debug/bpmp/debug/clk/spi1/max_rate
81600000
root@seeed-desktop:~# cat /sys/kernel/debug/bpmp/debug/clk/spi1/possible_parents
pllp_out0 clk_32k clk_m
root@seeed-desktop:~# cat /sys/kernel/debug/bpmp/debug/clk/spi1/parent
pllp_out0

I tested the SPI by connecting MOSI and MISO directly and running the spidev_test command, whose source code is here.
spidev_test command details:

spidev_test -D /dev/spidev1.0 -v -p hello

After the basic test passed, I disconnected MOSI and MISO, left the SPI interface unconnected, and tested it with the testspi.py mentioned before.

And here is the max_rate data on JetPack 6.0 (r36.3.0):

root@nvidia-desktop:/sys/kernel/debug/bpmp/debug/clk/spi2# uname -a
Linux nvidia-desktop 5.15.136-tegra #1 SMP PREEMPT Wed Apr 24 19:36:48 PDT 2024 aarch64 aarch64 aarch64 GNU/Linux
root@nvidia-desktop:/sys/kernel/debug/bpmp/debug/clk/spi2# cat max_rate
81600000

Hi KevinFFF,
Is there any update on this problem?

Sorry, it may take some time for me to verify your Python script.
I will update with the result from my local setup by next week.

Have you also verified with the spidev_test tool?

Since I can reproduce this problem with the Python script, I didn't try to reproduce it with spidev_test.

If you can't reproduce it with the Python script, I will give spidev_test a try.

Hi KevinFFF,
I also tried the steps mentioned in this thread, but the problem is still there.

Hi KevinFFF,
I found one more thing about this problem. Testing with the Python test code I mentioned before, I find that if I send 256 bytes or fewer, the program works fine and the logic analyzer waveform looks correct. But if I send 257 bytes, the SPI driver crashes.
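
For reference, a minimal sketch that reproduces the boundary (same setup as testspi.py on JetPack 6; only the transfer sizes differ):

import spidev

spi = spidev.SpiDev()
spi.open(2, 1)  # JetPack 6 numbering
spi.max_speed_hz = 500000
spi.mode = 0b00

# 256 bytes completes normally; 257 bytes triggers the kernel warning
# and "spi transfer timeout" shown above
for n in (256, 257):
    print(f"transferring {n} bytes")
    spi.xfer2([0xAA] * n)

spi.close()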

I hope this helps your debugging work!

I tried hard to fix this problem myself but ultimately failed :(
As I said before, the SPI driver works fine if I send 256 bytes or fewer. So I worked around the problem in my application by splitting my data into 256-byte chunks (see the sketch below), and it works.
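
Here is a minimal sketch of that workaround (chunk size and device numbers as used earlier in this thread):

import spidev

CHUNK = 256  # transfers of 256 bytes or fewer work on JetPack 6.0

spi = spidev.SpiDev()
spi.open(2, 1)  # JetPack 6 numbering
spi.max_speed_hz = 500000
spi.mode = 0b00

data = [0x01, 0x02, 0x03, 0x04] * 1000

# split the payload into 256-byte chunks to stay at or below the working size
for i in range(0, len(data), CHUNK):
    spi.xfer2(data[i:i + CHUNK])

spi.close()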

I hope this helps people who run into the same problem.

And I hope the Jetson team can fix the SPI driver soon.

Hi bbear953308023,

Thanks for pointing out the 256-byte issue.
There are some issues in the SPI driver of JetPack 6.0. We are still working on fixing them and hope they can be fixed in the next JetPack 6 release.
