Orin NX SPI delay between transfers

When running an SPI transfer in an infinite loop on an Orin NX, it looks like we can’t queue up transfers any faster than one every ~340 us.

The same test on a Xavier NX (an older JetPack release with Linux 4.9) runs a lot faster.

How can we reduce the ~300 us delay?

We are using the NVIDIA devkit device tree, with no changes to the default SPI nodes, for both the Xavier and the Orin. We are not intentionally setting any delays.

Hi akhil.veeraghanta,

Are you using the devkit or a custom board for the Orin NX and the Xavier NX, respectively?

Since they are using different kernel releases, could you verify the behavior on the Xavier NX with K5.10?

Which SPI device are you using in your test? Or can the issue also be reproduced with a loopback test?
What frequency do you configure for SPI?

They are both on the devkit

There is no SPI device attached; we are just transmitting and measuring chip select and clock. We configure SPI at 20 MHz.

Could you verify the behavior on the Xavier NX with the latest JP5.1.2 (R35.4.1), which uses K5.10?
That way we can tell whether the issue comes from the platform or from the kernel release.

It seems to be the kernel release.

I can’t use 35.4.1 yet, but at first glance there don’t seem to be any SPI changes.

Polling mode seems to crash in 35.3 as well:

[   69.825998] spi-tegra114 3210000.spi: interrupt raised in polling mode                                                                                                             
[   69.826258] spi-tegra114 3210000.spi: interrupt raised in polling mode                                                                                                             
[   69.826505] spi-tegra114 3210000.spi: interrupt raised in polling mode                                                                                                             
[   69.826775] spi-tegra114 3210000.spi: interrupt raised in polling mode                                                                                                             
[   69.892561] CPU:0, Error: cbb-fabric@0x13a00000, irq=25                                                                                                                            
[   69.892710] **************************************                                      
[   69.892852] CPU:0, Error:cbb-fabric, Errmon:2                                           
[   69.892982]    Error Code            : TIMEOUT_ERR                                      
[   69.893090]    Overflow              : Multiple TIMEOUT_ERR                                                                                                                        

[   69.893277]    Error Code            : TIMEOUT_ERR                                      
[   69.893398]    MASTER_ID             : CCPLEX                                           
[   69.893494]    Address               : 0x3210014                                        
[   69.893602]    Cache                 : 0x1 -- Bufferable                                                                                                                           
[   69.893716]    Protection            : 0x2 -- Unprivileged, Non-Secure, Data Access                                                                                                
[   69.893904]    Access_Type           : Read                                             
[   69.893997]    Access_ID             : 0x10                                             
[   69.893999]    Fabric                : cbb-fabric                                       
[   69.894203]    Slave_Id              : 0x3b                                             
[   69.894288]    Burst_length          : 0x0                                              
[   69.894382]    Burst_type            : 0x1                                              
[   69.894471]    Beat_size             : 0x2                                              
[   69.894552]    VQC                   : 0x0                                              
[   69.894712]    GRPSEC                : 0x7e                                             
[   69.895173]    FALCONSEC             : 0x0                                              
[   69.895633]  **************************************                                     
[   69.896373] ------------[ cut here ]------------                                        

We blindly copied spi-tegra114.c from the 4.9 kernel into 5.10, and that made the timing a lot better on the Orin.


From your result, it seems to be related to the driver differences between these two kernel releases.
Does your SPI work as expected on K5.10 with the SPI driver from K4.9?

Yes, it does. Can NVIDIA fix this in 5.10?

We don’t want to be limited in future upgrades once some of the 4.9 APIs are deprecated.

The difference comes from the kernel release.
We won’t revert the kernel driver to the old release.
I’m not sure whether the issue is caused by some new feature in K5.10 that makes the performance worse than in K4.9.

Yeah, I was hoping you could fix the newer release, given that it’s easily reproducible.

Okay, I’ll check this issue internally and update you once there are any results.

Hi, @akhil.veeraghanta, @KevinFFF
I followed your method on r35.4.1 for my AGX Xavier devkit with the PREEMPT_RT patch, but I get the following error while compiling the kernel:

/home/***/nvidia/r3541agx_rt/Linux_for_Tegra/source/public/kernel/kernel-5.10/drivers/spi/spi-tegra114.c: In function ‘tegra_spi_init_dma_param’:
/home/***/nvidia/r3541agx_rt/Linux_for_Tegra/source/public/kernel/kernel-5.10/drivers/spi/spi-tegra114.c:772:13: error: implicit declaration of function ‘dma_request_slave_channel_reason’; did you mean ‘dma_request_slave_channel_compat’? [-Werror=implicit-function-declaration]
  772 |  dma_chan = dma_request_slave_channel_reason(tspi->dev,
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |             dma_request_slave_channel_compat
/home/***/nvidia/r3541agx_rt/Linux_for_Tegra/source/public/kernel/kernel-5.10/drivers/spi/spi-tegra114.c:772:11: error: assignment to ‘struct dma_chan *’ from ‘int’ makes pointer from integer without a cast [-Werror=int-conversion]
  772 |  dma_chan = dma_request_slave_channel_reason(tspi->dev,
      |           ^
make[3]: *** [/home/***/nvidia/r3541agx_rt/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/Makefile.build:281:drivers/spi/spi-tegra114.o] Error 1
make[2]: *** [/home/***/nvidia/r3541agx_rt/Linux_for_Tegra/source/public/kernel/kernel-5.10/scripts/Makefile.build:498:drivers/spi] Error 2

So I would like to know which version you successfully applied this method on: r35.3.1 or r35.4.1? Did you encounter any of the above issues? Did you apply the PREEMPT_RT patch?
I found that dma_request_slave_channel_reason is defined in dmaengine.h in K4.9:
#define dma_request_slave_channel_reason(dev, name) dma_request_chan(dev, name)
But I didn’t find this definition in K5.10.
Can I copy this definition into dmaengine.h in K5.10?

We just took the corresponding line from the 5.10 driver and substituted it. That one line is the only thing you have to fix; the rest builds as-is.

Unfortunately I don’t have the file handy to send you, but if you compare the two files (4.9 and 5.10) side by side, you can take that line from the 5.10 file, copy it into the 4.9 file, and it works.
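
If you would rather not edit the shared include/linux/dmaengine.h, the 4.9 alias quoted earlier in the thread can be kept local to the driver instead. This is a sketch, assuming (as reported here) that this call is the only thing stopping the 4.9 spi-tegra114.c from building on 5.10. dma_request_chan() exists in 5.10 and, like the 4.9 alias, returns an ERR_PTR() on failure, so any existing IS_ERR() check should keep working. The equivalent one-line fix is to rename the call at line 772 to dma_request_chan, leaving its arguments unchanged.

```c
/*
 * Compatibility shim for building the K4.9 spi-tegra114.c on K5.10.
 * Place it after the driver's #include <linux/dmaengine.h>.
 * In K4.9 this name was only an alias for dma_request_chan(), which still
 * exists in K5.10 with the same (dev, name) arguments and ERR_PTR() returns.
 */
#ifndef dma_request_slave_channel_reason
#define dma_request_slave_channel_reason(dev, name) dma_request_chan(dev, name)
#endif
```

Keeping the alias in the driver means the rest of the kernel tree is untouched, so it does not need to be carried forward in future kernel upgrades.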


Hi @akhil.veeraghanta,

Do you still have the issue with the time gap between SPI packets on K5.10?

Hi @KevinFFF,
Following the above solution, I get more stable SPI transfer times on K5.10: about 350 us per transfer with 128 bytes of SPI data.
This basically meets our application needs, but I am not sure whether there is room for further optimization. Does your team have any new solutions?
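
A quick sanity check on that figure: this post does not state the SPI clock, but assuming the same 20 MHz used at the start of the thread, 128 bytes take 128 * 8 / 20 MHz = 51.2 us to clock out, so roughly 299 us of each ~350 us transfer is spent outside the data phase. The hypothetical helpers below just encode that arithmetic, assuming 8-bit words and ignoring CS setup/hold time.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Time SCK spends clocking `bytes` of payload at `sclk_hz`, in microseconds.
 * Assumes 8-bit words and one bit per SCK cycle; CS setup/hold time and any
 * inter-word gaps are ignored.
 */
double spi_wire_time_us(size_t bytes, double sclk_hz)
{
    assert(sclk_hz > 0.0);
    return (double)bytes * 8.0 / sclk_hz * 1e6;
}

/* Part of a measured per-transfer period that is not spent clocking data. */
double spi_overhead_us(double measured_period_us, size_t bytes, double sclk_hz)
{
    return measured_period_us - spi_wire_time_us(bytes, sclk_hz);
}
```

Because the wire time scales with payload while the overhead is roughly per transfer, batching more bytes into each transfer is the usual way to amortize a fixed gap of this size.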

Could you apply the following patch to the K5.10 driver and compare the results?

--- a/drivers/spi/spi-tegra114.c
+++ b/drivers/spi/spi-tegra114.c
@@ -174,6 +174,7 @@
 #define SPI_FATAL_INTR_ALL_EN_0			(0x1fUL << 25)
+#define AUTOSUSPEND_TIMEOUT			300 /* in millisec */
 struct tegra_spi_soc_data {
 	bool has_intr_mask_reg;
@@ -1348,7 +1349,8 @@ static int tegra_spi_setup(struct spi_device *spi)
 		tegra_spi_set_cmd2(spi, spi->max_speed_hz);
 	spin_unlock_irqrestore(&tspi->lock, flags);
-	pm_runtime_put(tspi->dev);
+	pm_runtime_mark_last_busy(tspi->dev);
+	pm_runtime_put_autosuspend(tspi->dev);
 	return 0;
@@ -1405,7 +1407,8 @@ static  int tegra_spi_cs_low(struct spi_device *spi, bool state)
 	spin_unlock_irqrestore(&tspi->lock, flags);
-	pm_runtime_put(tspi->dev);
+	pm_runtime_mark_last_busy(tspi->dev);
+	pm_runtime_put_autosuspend(tspi->dev);
 	return 0;
@@ -1959,7 +1962,8 @@ static int tegra_spi_probe(struct platform_device *pdev)
 			goto exit_tx_dma_free;
+	pm_runtime_set_autosuspend_delay(&pdev->dev, AUTOSUSPEND_TIMEOUT);
+	pm_runtime_use_autosuspend(&pdev->dev);
 	if (!pm_runtime_enabled(&pdev->dev)) {
 		ret = tegra_spi_runtime_resume(&pdev->dev);
@@ -1988,7 +1992,8 @@ static int tegra_spi_probe(struct platform_device *pdev)
 	tspi->spi_cs_timing1 = tegra_spi_readl(tspi, SPI_CS_TIMING1);
 	tspi->spi_cs_timing2 = tegra_spi_readl(tspi, SPI_CS_TIMING2);
 	tspi->command2_reg = tegra_spi_readl(tspi, SPI_COMMAND2);
-	pm_runtime_put(&pdev->dev);
+	pm_runtime_mark_last_busy(&pdev->dev);
+	pm_runtime_put_autosuspend(&pdev->dev);
 	ret = request_threaded_irq(tspi->irq, tegra_spi_isr,
 				   tegra_spi_isr_thread, IRQF_ONESHOT,
 				   dev_name(&pdev->dev), tspi);
@@ -2015,6 +2020,7 @@ static int tegra_spi_probe(struct platform_device *pdev)
 	if (tspi->clock_always_on)
+	pm_runtime_dont_use_autosuspend(&pdev->dev);
 	tegra_spi_deinit_dma_param(tspi, false);
@@ -2086,7 +2092,8 @@ static int tegra_spi_resume(struct device *dev)
 	tspi->last_used_cs = ctrl->num_chipselect + 1;
-	pm_runtime_put(dev);
+	pm_runtime_mark_last_busy(dev);
+	pm_runtime_put_autosuspend(dev);
 	return spi_controller_resume(ctrl);
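
One way to read the patch: with a plain pm_runtime_put(), the controller can runtime-suspend as soon as it goes idle, so every transfer in a tight loop may have to wait for a runtime resume first. With autosuspend enabled and a 300 ms delay (AUTOSUSPEND_TIMEOUT), only the first transfer of a burst should pay that cost. The toy model below (hypothetical function, not driver code) illustrates this under the assumption that each SPI message is wrapped in a runtime get/put, as the SPI core does for controllers that set auto_runtime_pm.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (not driver code) of runtime PM around a loop of SPI transfers.
 *   start_us[i]     : time transfer i is issued, in ascending order
 *   busy_us         : how long the controller is in use per transfer
 *   autosuspend_us  : delay between the final put and the actual suspend;
 *                     0.0 models a plain pm_runtime_put() with autosuspend off
 * Returns the number of transfers that found the controller runtime-suspended
 * and therefore had to wait for a runtime resume before starting.
 */
size_t count_runtime_resumes(const double *start_us, size_t n,
                             double busy_us, double autosuspend_us)
{
    assert(start_us != NULL || n == 0);

    size_t resumes = 0;
    int active = 0;              /* controller starts out runtime-suspended */
    double suspend_at = 0.0;     /* when the pending suspend would take effect */

    for (size_t i = 0; i < n; i++) {
        /* Nothing arrived before the deadline: the device has suspended. */
        if (active && start_us[i] > suspend_at)
            active = 0;

        /* Runtime get: a suspended controller must be resumed first. */
        if (!active) {
            resumes++;
            active = 1;
        }

        /* mark_last_busy + put at the end of the transfer re-arms the timer. */
        suspend_at = start_us[i] + busy_us + autosuspend_us;
    }
    return resumes;
}
```

With transfers every 350 us and a 300 ms delay, the model gives one resume for the whole loop instead of one per transfer. Whether the resume cost is what makes up the ~300 us gap is exactly what testing the patch would show; the model does not establish it.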

I will test this patch on my devkit.
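
Independently of the patch, the runtime-PM hypothesis can be checked without rebuilding, through the standard runtime-PM sysfs attributes: power/runtime_status reports whether the controller is "active" or "suspended", and writing "on" to power/control keeps it resumed ("auto" restores the default). A minimal sketch of helpers for that (hypothetical names; plain stdio) follows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `value` to a sysfs attribute. Returns 0 on success, -1 on error.
 * Errors from sysfs store callbacks surface when the buffer is flushed,
 * so the fclose() result is checked too. */
int sysfs_write(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) != EOF;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the first line of a sysfs attribute into `buf`, without the newline.
 * Returns 0 on success, -1 on error. */
int sysfs_read(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL && size > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)size, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

On the target (as root), read /sys/bus/platform/devices/3210000.spi/power/runtime_status between transfers, then write "on" to /sys/bus/platform/devices/3210000.spi/power/control and rerun the transfer loop. The 3210000.spi name is taken from the kernel log earlier in the thread; use whichever controller is actually under test. If the gap shrinks noticeably with "on", per-transfer runtime resume is a likely contributor.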
