SPI CS issue

Hi Guys,

I’m experiencing a weird problem with SPI.0 using spidev.

Here’s a couple of points:

  1. My kernel is R24 (i cannot use the latest Jetpack as all my custom drivers break)
  2. My SPI clock rate is set to 9.5MHz (this works fine)
  3. I’m using hardware CS

What I observe is:

CS pulls low (enables) - as expected
data is clocked out @ 9.5MHz - as expected
CS stays low for additional 31uS before it goes high - not expected!

My device-tree is setup like this:

spi0_0 {
#address-cells = <0x1>;
#size-cells = <0x0>;
compatible = “spidev”;
reg = <0x0>;
spi-max-frequency = <0xf4240>;
nvidia,enable-hw-based-cs;
nvidia,cs-setup-clk-count = <0x1e>;
nvidia,cs-hold-clk-count = <0x1e>;
nvidia,rx-clk-tap-delay = <0x1f>;
nvidia,tx-clk-tap-delay = <0x0>;
};

Obviously the delays spring out as interesting so I tried changing all of them to 0x0 but this didn’t do anything.

Then I looked in the reference manual section 37.3.3 SPI_TIMING_REG1_0 and that says that cs-setup and cs-hold cannot be larger than 16 so the initial setting of 0x1E seems totally weird and must be a bug!

Further more in the SPI driver there’s this line of code:
hold_count = min(cdata->cs_hold_clk_cnt,16)

That corresponds well with the reference manual but then it’s still weird that default value is 0x1e

Regardless I have been unable to resolve this problem now for a while so any pointers would be appreciated!

Thanks
Lasse

@LasseRoedtnes

  1. Could you print the cs_setup_clk_count, cs_hold_clk_count in tegra_spi_start_transfer_one() in the spi driver to make sure it apply from your modification.

Hi Shane,
I’m working with Lasse on this issue. I added a printk to read those two values in tegra_spi_start_transfer_one (after correcting the DTS which was wrong in the example I used), and I read the values I put in. But regardless of what I write in the register (0 is my default), CS always lags.
Is this a hardware bug?

Hi Shane,

SPI_TIMING_REG_1_0 is read and confirmed to be zero always (it doesn’t change)

This is a typical transfer:
see attachment: normal_spidev.png

We tried changing the SPI_TIMING_REG_1_0 to be 0x1111111 (set 1 cycle delay on all CS’s) but notice how the delay goes crazy now and more than doubles:
see attachment: timingreg10.png

Obviously we didn’t want to continue down this path so we set it back to zero (as it was)

Then we noticed SPI_COMMAND_2 was set to 20h so we tried to set that to zero which helped:
see attachment: command2.png

But as you can see there’s still a huge delay…

We are at a loss here and have exhausted the different options in the reference manual we can think off…
Can you please advice what’s going on?
command2.png


@Lasse
Could you try this patch

diff --git a/drivers/spi/spi-tegra114.c b/drivers/spi/spi-tegra114.c
index 11f1338..2601cc9 100644
--- a/drivers/spi/spi-tegra114.c
+++ b/drivers/spi/spi-tegra114.c
@@ -84,8 +84,8 @@
 #define SPI_RX_TAP_DELAY(x)                    (((x) & 0x3F) << 0)

 #define SPI_CS_TIMING1                         0x008
-#define SPI_SETUP_HOLD(setup, hold)            (((setup - 1) << 4) |   \
-                                               (hold - 1))
+#define SPI_SETUP_HOLD(setup, hold)            (((setup) << 4) |       \
+                                               (hold))
 #define SPI_CS_SETUP_HOLD(reg, cs, val)                        \
                ((((val) & 0xFFu) << ((cs) * 8)) |      \
                ((reg) & ~(0xFFu << ((cs) * 8))))
@@ -911,7 +911,7 @@ static void tegra_spi_set_timing2(struct spi_device *spi)
 {
        struct tegra_spi_data *tspi = spi_master_get_devdata(spi->master);
        struct tegra_spi_client_ctl_data *cdata = spi->controller_data;
-       u32 spi_cs_timing2 = 0;
+       u32 spi_cs_timing2 = tspi->spi_cs_timing2;

        if (!cdata || tspi->prod_list)
                return;
@@ -1109,9 +1109,7 @@ static u32 tegra_spi_setup_transfer_one(struct spi_device *spi,
                                tegra_spi_writel(tspi, command1, SPI_COMMAND1);

                tspi->is_hw_based_cs = false;
-               if (cdata && cdata->is_hw_based_cs && is_single_xfer &&
-                   ((tspi->curr_dma_words * tspi->bytes_per_word) ==
-                    (t->len - tspi->cur_pos))) {
+               if (cdata && cdata->is_hw_based_cs && is_single_xfer) {
                        tegra_spi_set_timing1(spi);
                        tspi->is_hw_based_cs = true;
                }
@@ -2016,6 +2014,8 @@ static int tegra_spi_probe(struct platform_device *pdev)
        tspi->def_command1_reg |= SPI_CS_SEL(tspi->def_chip_select);
        tegra_spi_writel(tspi, tspi->def_command1_reg, SPI_COMMAND1);
        tspi->command2_reg = tegra_spi_readl(tspi, SPI_COMMAND2);
+       tspi->spi_cs_timing = tegra_spi_readl(tspi, SPI_CS_TIMING1);
+       tspi->spi_cs_timing2 = tegra_spi_readl(tspi, SPI_CS_TIMING2);
        tegra_spi_set_slcg(tspi);
        pm_runtime_put(&pdev->dev);

Hi LasseRoedtnes,

Have you tried the provided patch? Does it work? Any result can be shared?

Thanks

Sorry, I was out of the office and I could only test the patch this morning. But it will not apply to kernel 3.10, so I really cannot test it. Also the driver in 3.10 seems to already contain some of the modifications.

one workaround you might consider with SPI is clocking polarities.

if it ramps up too slowly, it may ramp down much more quickly.

in other words if you did this:
CS pulls HIGH(enables) - as expected
data is clocked out @ 9.5MHz - as expected
CS switches LOW for additional X uS

in other words, SPI has 4 modes, sclk can be low at idle or high at idle depending on the mode you use.

(sorry i’m coming from STM32 world, i’m just starting with TX1’s so i’m not sure yet if they support the 4 standard spi modes. just trying to give you an alternative idea)
see below:
http://www.rosseeld.be/DRO/PIC/SPI_Timing.htm

Hi Randy,

What you’re describing is SPI polarity and phase of data related to clock edge and sadly it has nothing to do with the problem I’m seeing.

My SPI slave device dictates the mode I have to use and regardless of what mode that may have been the behavior of the CS pin is not related to this as it’s only phase and polarity of clock vs. data.

So the problem remains I’m afraid. :-/

Best regards
Lasse

@LasseRoedtnes
Please have a try this patch for r24.2.1

diff --git a/drivers/spi/spi-tegra114.c b/drivers/spi/spi-tegra114.c
index 015e943..c7debce 100644
--- a/drivers/spi/spi-tegra114.c
+++ b/drivers/spi/spi-tegra114.c
@@ -242,6 +242,7 @@ struct tegra_spi_data {
        u32                                     def_command1_reg;
        u32                                     def_command2_reg;
        u32                                     spi_cs_timing;
+       u32                                     spi_cs_timing2;
        u8                                      def_chip_select;
 
        struct completion                       xfer_completion;
@@ -974,9 +975,7 @@ static int tegra_spi_start_transfer_one(struct spi_device *spi,
 
                /* possibly use the hw based chip select */
                tspi->is_hw_based_cs = false;
-               if (cdata && cdata->is_hw_based_cs && is_single_xfer &&
-                       ((tspi->curr_dma_words * tspi->bytes_per_word) ==
-                                               (t->len - tspi->cur_pos))) {
+               if (cdata && cdata->is_hw_based_cs && is_single_xfer) {
                        u32 set_count;
                        u32 hold_count;
                        u32 spi_cs_timing;
@@ -990,36 +989,43 @@ static int tegra_spi_start_transfer_one(struct spi_device *spi,
                        if (hold_count)
                                hold_count--;
 
-                       spi_cs_setup = SPI_SETUP_HOLD(set_count,
-                                       hold_count);
-                       spi_cs_timing = tspi->spi_cs_timing;
-                       spi_cs_timing = SPI_CS_SETUP_HOLD(spi_cs_timing,
-                                               spi->chip_select,
-                                               spi_cs_setup);
-                       tspi->spi_cs_timing = spi_cs_timing;
-                       tegra_spi_writel(tspi, spi_cs_timing,
-                                               SPI_CS_TIMING1);
+                       spi_cs_setup = SPI_SETUP_HOLD(set_count, hold_count);
+                       spi_cs_timing = SPI_CS_SETUP_HOLD(tspi->spi_cs_timing,
+                                                         spi->chip_select,
+                                                         spi_cs_setup);
+                       if (tspi->spi_cs_timing != spi_cs_timing) {
+                               tspi->spi_cs_timing = spi_cs_timing;
+                               tegra_spi_writel(tspi, spi_cs_timing, SPI_CS_TIMING1);
+                       }
                        tspi->is_hw_based_cs = true;
                }
 
-               if (cdata && cdata->cs_inactive_cycles) {
+               if (cdata->cs_inactive_cycles) {
                        u32 inactive_cycles;
 
                        SPI_SET_CS_ACTIVE_BETWEEN_PACKETS(spi_cs_timing2,
-                                               spi->chip_select,
-                                               0);
+                                                         spi->chip_select,
+                                                         0);
                        inactive_cycles = min(cdata->cs_inactive_cycles, 32);
                        SPI_SET_CYCLES_BETWEEN_PACKETS(spi_cs_timing2,
-                                               spi->chip_select,
-                                               inactive_cycles);
-                       tegra_spi_writel(tspi, spi_cs_timing2, SPI_CS_TIMING2);
+                                                      spi->chip_select,
+                                                      inactive_cycles);
+                       if (tspi->spi_cs_timing2 != spi_cs_timing2) {
+                               tspi->spi_cs_timing2 = spi_cs_timing2;
+                               tegra_spi_writel(tspi, spi_cs_timing2,
+                                                SPI_CS_TIMING2);
+                       }
                        tspi->is_hw_based_cs = true;
                } else {
                        SPI_SET_CS_ACTIVE_BETWEEN_PACKETS(spi_cs_timing2,
-                                               spi->chip_select, 1);
+                                                         spi->chip_select, 1);
                        SPI_SET_CYCLES_BETWEEN_PACKETS(spi_cs_timing2,
-                                               spi->chip_select, 0);
-                       tegra_spi_writel(tspi, spi_cs_timing2, SPI_CS_TIMING2);
+                                                      spi->chip_select, 0);
+                       if (tspi->spi_cs_timing2 != spi_cs_timing2) {
+                               tspi->spi_cs_timing2 = spi_cs_timing2;
+                               tegra_spi_writel(tspi, spi_cs_timing2,
+                                                SPI_CS_TIMING2);
+                       }
                }
 
                if (!tspi->is_hw_based_cs) {
@@ -1972,6 +1978,8 @@ static int tegra_spi_probe(struct platform_device *pdev)
        tspi->def_command1_reg |= SPI_CS_SEL(tspi->def_chip_select);
        tegra_spi_writel(tspi, tspi->def_command1_reg, SPI_COMMAND1);
        tspi->def_command2_reg = tegra_spi_readl(tspi, SPI_COMMAND2);
+       tspi->spi_cs_timing = tegra_spi_readl(tspi, SPI_COMMAND1);
+       tspi->spi_cs_timing2 = tegra_spi_readl(tspi, SPI_COMMAND2);
        tegra_spi_runtime_put(tspi);
 
        ret = devm_request_irq(&pdev->dev, tspi->irq, tegra_spi_isr, 0,

The patch does not apply on Kernel 3.10, which version do you use? (I applied the hunks by hand). Also, should it read COMMAND1 and 2 into cs_timing?

tegra_spi_writel(tspi, tspi->def_command1_reg, SPI_COMMAND1);
        tspi->def_command2_reg = tegra_spi_readl(tspi, SPI_COMMAND2);
+       tspi->spi_cs_timing = tegra_spi_readl(tspi, SPI_COMMAND1);
+       tspi->spi_cs_timing2 = tegra_spi_readl(tspi, SPI_COMMAND2);

In the other patch it is like this

+       tspi->spi_cs_timing = tegra_spi_readl(tspi, SPI_CS_TIMING1);
+       tspi->spi_cs_timing2 = tegra_spi_readl(tspi, SPI_CS_TIMING2);

Hi Shane,

I think we are getting in the right direction!

https://logmar.dk/picture1.jpg

As you can see on the traces above the CS is now functional, however, there’s now a 81.96uS delay from the last data bit is clocked out (top yellow trace) to the data signal goes high (you can also see this on the bottom histogram) - I cannot understand how this is possible as the data line is not supposed to change after CS goes high.

The biggest problem though is the massive 850uS delay between SPI transactions/packets and I’m transmitting thousands of packets across the SPI to my LCD so you see it updating each line which is obviously not acceptable.

Here’s a zoom out that shows the packet (and time to next packet) better:
https://www.logmar.dk/zoomout.png

Hoping we can get the final mile :-)

Thanks
Lasse

@LaseRoedtnes
Could you help to add this to you DT to check.

spi1@3210000 {
                prod-settings {
                        prod {
                                prod = <
                                        0x0 0x73fff83f 0x43c01807
                                        0xc 0x0000003f 0x00000020>;
                        };
                        prod_c_cs1 {
                                prod = <
                                        0x4 0x00000fff 0x00000008
                                        0x8 0x0000ff00 0x00000000>;
                        };
                };
        };

you can try use this dtb settings, the width of cs cut to 28.63us in my tx2.

spi@0 {
			reg = <0x0>;
			nvidia,tx-clk-tap-delau = <0x0>;
			compatible = "spidev";
			nvidia,cs-setup-clk-count = <0x1e>;
			nvidia,cs-hold-clk-count = <0x1e>;
			spi-max-frequency = <0x1312d00>;
			nvidia,enable-hw-based-cs;
			nvidia,cs-inactive-cycles = <0x01>;
			nvidia,rx-clk-tap-delay = <0x1f>;
		};