spi-tegra114 patches for JetPack 6.x

I’m trying to port our application from JetPack 5 to JetPack 6 and ran into some issues related to SPI. In JP 5.x, spi-tegra114.c contains some patches to support extra configuration fields such as clk-delay-between-packets and cs-setup-clk-count. In JetPack 6, it looks as if those are no longer supported. I found this thread where somebody investigated the changes and found that many patches are missing from JP 6. @KevinFFF suggests that we port the driver from JP 5 to JP 6 ourselves. However, for someone not intimately familiar with kernel development and the SPI subsystem, this looks like a lot of effort.

Can we expect spi-tegra114 in JP 6 to be brought on par with the JP 5 version anytime soon? If not, are there other suggestions for how to make SPI communication work with JP 6 for our application? For reference, this is an excerpt of the JP 5 DTS:

                spi@0 {
                        compatible = "linux,spidev";
                        reg = <0x00>;
                        spi-max-frequency = <0x2faf080>;

                        controller-data {
                                nvidia,enable-hw-based-cs;
                                nvidia,clk-delay-between-packets = <2>;
                                nvidia,cs-setup-clk-count = <3>;
                        };
                };

Those settings work fine in JP 5. They obviously don’t in JP 6. clk-delay-between-packets and cs-setup-clk-count not being read is probably part of the issue, but there may be more.
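If those properties are simply ignored, I assume the effective JP 6 node reduces to the generic bindings only (a sketch; I have not confirmed whether the JP 6 kernel understands any upstream replacement such as the generic spi-cs-setup-delay-ns property):

```dts
                spi@0 {
                        compatible = "linux,spidev";
                        reg = <0x00>;
                        spi-max-frequency = <0x2faf080>;
                        /* controller-data with the nvidia,* properties
                         * is not read by the JP 6 driver */
                };
```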

Hi ahans123,

Are you using the devkit or custom board for Orin Nano?

The SPI driver in JP6 comes from upstream, so some custom features have been removed.
For the CS-related issue in the SPI driver of JP6, please apply the patch in Jetson orin nano SPI Speed not changing - #9 by KevinFFF for the fix.
If you have further questions about using SPI in JP6, please let us know and share detailed reproduction steps.

Hi Kevin, thanks for your reply!

Are you using the devkit or custom board for Orin Nano?

Eventually it is supposed to run with a custom board (and with JP 5 this already works fine), but for testing purposes I can use a devkit. I’ve started to look at the SPI pins using a logic analyzer to be better able to figure out what’s going on. The custom board uses the same pins for SPI as the devkit, so I can probe the pins on the 40-pin header for debugging.

For the CS related issue in SPI driver of JP6, please apply the patch in Jetson orin nano SPI Speed not changing - #9 by KevinFFF for the fix.

Indeed, the CS pin behaves very differently! It would be great if this were the only change necessary. I will give it a go! But from what I saw before, it looked as if the device we interface with via SPI requires very specific timing, hence the clk-delay-between-packets setting. But maybe it works with the stock settings of the JP 6 driver as well!

If you need features that exist in the JP5 driver but were removed in the JP6 driver, you can try to port them back manually. We have no plans to add those features back, since the driver is aligned with upstream and we only apply changes to fix errors.
So, if you hit any error with the JP6 SPI driver, please let us know.
If you have a requirement for specific SPI features/functions, please port them manually.

Hi KevinFFF,

the CS patch did help; with it I was able to successfully decode SPI data captured with the logic analyzer. Another issue was the byte order, which this change fixed. However, it looks like we’re not done yet. I am able to interact with our peripheral now, but the data that is sent back has the wrong byte order. It seems as if a patch similar to the one needed for the sending side is also required for the receiving side. Any hint on what needs to be changed? In the meantime, I will do some more digging myself.

Never mind my comment about the byte order of the received data; that turns out to be fine. However, I’m missing the first (16-bit) word of the received message. Everything that comes afterwards is as expected. Any idea what could be wrong?

Could you elaborate on this, or share the waveform around the missing word?
Are SCLK and CS asserted at that moment?

Have you tried to perform an SPI loopback test and check whether bytes are missing there as well?
Or is the issue specific to your SPI device?

Hi Kevin, sorry it took me so long to get back to you. Everything is working now. I still want to give some more details in case others face a similar issue, and also to give some feedback.

Looking at the waveform while connected to the SPI sub device was not easily possible, since the device is soldered onto a PCB and attaching probe wires is tricky. Only measuring “open loop”, without a device attached, was viable.

To test communication with the SPI sub device, I was using a version of spidev_test.c that a colleague had modified. I was about to do a loopback test with the unmodified spidev_test.c when I first compared the waveform generated by the original spidev_test.c with that of the modified one. The original produced what I would have expected, with very short delays between 16-bit words (a fraction of a µs), whereas the modified one had delays of more than 20 µs that were also quite variable.

It turns out the issue was that the modified version sent multiple messages with one word each, whereas the unmodified spidev_test.c sends a single multi-word message. I believe the long delays I saw were due to context switches, while a single (short) message is handed to the SPI controller in one go. In JP 5.x this worked by chance. The kernel in JP 6.x includes some scheduler changes that lead to somewhat different timing, causing the approach to fail. Once I adopted the one-message-with-multiple-words approach in my test application (a modified version of spidev_test.c) as well, things worked fine.

Some changes to the driver were still necessary, though. I even ported the JP 5.x version at some point. It’s actually quite simple: the only required adaptation is fixing the places where xfer->delay_usecs is read (in my case it was fine to just remove them, since xfer->delay_usecs was 0 anyway). Due to the timing issue outlined above, it still didn’t work, obviously.
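For illustration, the kind of change involved looks like this (a hypothetical hunk, not an exact diff against the JP 5 sources):

```diff
-	if (xfer->delay_usecs)
-		udelay(xfer->delay_usecs);
+	/* xfer->delay_usecs no longer exists in the JP 6 kernel; the
+	 * per-transfer delay moved into xfer->delay (a struct spi_delay).
+	 * Since it was always 0 for us, the read can simply be dropped,
+	 * or replaced with spi_delay_exec(&xfer->delay, xfer). */
```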

I needed the chip-select patch mentioned elsewhere:

@@ -871,6 +883,9 @@
 				command1 &= ~SPI_CS_SW_VAL;
 		}
 
+		tspi->use_hw_based_cs = true;
+		command1 &= ~(SPI_CS_SW_HW | SPI_CS_SW_VAL);
+
 		if (!tspi->prod_list) {
 			if (tspi->last_used_cs != spi->chip_select) {
 				if (cdata && cdata->tx_clk_tap_delay)

I also needed the patch to fix the byte order:

@@ -1448,7 +1468,7 @@
 	tspi->def_command1_reg  = tegra_spi_readl(tspi, SPI_COMMAND1);
 	tspi->def_command1_reg |= SPI_CS_SEL(tspi->def_chip_select);
 	tegra_spi_writel(tspi, tspi->def_command1_reg, SPI_COMMAND1);
-	tspi->def_command1_reg  = SPI_M_S;
+	tspi->def_command1_reg  = SPI_M_S | SPI_LSBYTE_FE;
 	tegra_spi_writel(tspi, tspi->def_command1_reg, SPI_COMMAND1);
 	tspi->spi_cs_timing1 = tegra_spi_readl(tspi, SPI_CS_TIMING1);
 	tspi->spi_cs_timing2 = tegra_spi_readl(tspi, SPI_CS_TIMING2);

But that was still not enough for successful communication, since the SPI sub device requires that CS be toggled after each 16-bit word. That functionality cannot be configured in the JP 6.x driver, although it is (at least partly) there: a function tegra_spi_set_hw_cs_timing is exposed to the rest of the SPI subsystem via master->set_cs_timing, but it is never actually called. I fixed it for now by just calling it myself:

@@ -932,6 +947,8 @@
 	dev_dbg(tspi->dev, "The def 0x%x and written 0x%x\n",
 		tspi->def_command1_reg, (unsigned)command1);
 
+	tegra_spi_set_hw_cs_timing(spi);
+
 	ret = tegra_spi_flush_fifos(tspi);
 	if (ret < 0)
 		return ret;

Unfortunately, tegra_spi_set_hw_cs_timing also needs some changes to behave as it did in JP 5.x:

@@ -765,15 +772,20 @@
 
 	setup_dly = min_t(u8, setup_dly, MAX_SETUP_HOLD_CYCLES);
 	hold_dly = min_t(u8, hold_dly, MAX_SETUP_HOLD_CYCLES);
-	if (setup_dly && hold_dly) {
-		setup_hold = SPI_SETUP_HOLD(setup_dly - 1, hold_dly - 1);
-		spi_cs_timing = SPI_CS_SETUP_HOLD(tspi->spi_cs_timing1,
-						  spi->chip_select,
-						  setup_hold);
-		if (tspi->spi_cs_timing1 != spi_cs_timing) {
-			tspi->spi_cs_timing1 = spi_cs_timing;
-			tegra_spi_writel(tspi, spi_cs_timing, SPI_CS_TIMING1);
-		}
+
+	if (setup_dly > 0)
+		setup_dly--;
+
+	if (hold_dly > 0)
+		hold_dly--;
+	
+	setup_hold = SPI_SETUP_HOLD(setup_dly, hold_dly);
+	spi_cs_timing = SPI_CS_SETUP_HOLD(tspi->spi_cs_timing1,
+					  spi->chip_select,
+					  setup_hold);
+	if (tspi->spi_cs_timing1 != spi_cs_timing) {
+		tspi->spi_cs_timing1 = spi_cs_timing;
+		tegra_spi_writel(tspi, spi_cs_timing, SPI_CS_TIMING1);
 	}
 
 	inactive_cycles = min_t(u8, inactive_dly, MAX_INACTIVE_CYCLES);

Additional code would also be necessary to make this configurable via the device tree, but I just hard-coded everything, which is good enough for our use case:

@@ -750,6 +750,13 @@
 	u32 inactive_cycles;
 	u8 cs_state;
 
+	setup->unit = 2;
+	setup->value = 2;
+	hold->unit = 2;
+	hold->value = 0;
+	inactive->unit = 2;
+	inactive->value = 2;
+
 	if ((setup && setup->unit != SPI_DELAY_UNIT_SCK) ||
 	    (hold && hold->unit != SPI_DELAY_UNIT_SCK) ||
 	    (inactive && inactive->unit != SPI_DELAY_UNIT_SCK)) {
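For completeness, parsing the JP 5 property names instead of hard coding could look roughly like this (an untested sketch; the exact node layout and error handling are left out):

```diff
+	struct device_node *cd;
+	u32 val;
+
+	/* Read the delays from the controller-data subnode, mirroring
+	 * the JP 5 property names, instead of hard coding them. */
+	cd = of_get_child_by_name(spi->dev.of_node, "controller-data");
+	if (cd) {
+		if (!of_property_read_u32(cd, "nvidia,cs-setup-clk-count", &val))
+			setup->value = val;
+		if (!of_property_read_u32(cd, "nvidia,clk-delay-between-packets", &val))
+			inactive->value = val;
+		of_node_put(cd);
+	}
```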

Also, thanks for your comment on that other topic regarding the GPIO pins. It helps me understand what was going on, even though I was eventually able to figure out what needed to be done there. Unfortunately that topic is closed, so I can’t reply there.