Low Latency SPI Driver

I am currently attempting to drive an SPI slave peripheral from Jetson TX2.

The issue is that the SPI slave has some particular requirements that neither the available spidev driver nor direct control of the tegra186-spi driver seems to satisfy.

In particular, the bus is a Motorola SPI bus with 32-bit words, CS must go high for at least 3 SPI clock cycles between words, and the bus is to be driven at 25 MHz, as in the diagram at SPI Timing - Album on Imgur.

In addition, the use case for this peripheral is a full-duplex transfer of around 20 words, repeated at a rate of 30 kHz. The received data is to be buffered in memory for later processing in larger batches in userspace.
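To put numbers on that requirement, here is a rough timing-budget sketch. It assumes exactly 20 words, the minimum 3-clock CS gap between adjacent words only, and no other overhead; the function names are made up for illustration and are not part of any driver.

```c
/* Time in nanoseconds spent on the wire for one burst: `words` words of
 * `bits_per_word` bits at `clk_hz`, with CS high for `gap_clks` clock
 * cycles between adjacent words (assumed: no gap after the last word). */
static double burst_time_ns(unsigned words, unsigned bits_per_word,
                            unsigned gap_clks, double clk_hz)
{
    double clk_ns = 1e9 / clk_hz;
    unsigned total_clks = words * bits_per_word + (words - 1) * gap_clks;
    return total_clks * clk_ns;
}

/* Period in nanoseconds of a burst repeated at `rate_hz`. */
static double period_ns(double rate_hz)
{
    return 1e9 / rate_hz;
}
```

With these assumptions, burst_time_ns(20, 32, 3, 25e6) comes to 27.88 us against a period_ns(30e3) of about 33.33 us, so only around 5.45 us per cycle is left for all software and driver latency.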

So far I have tried spidev (which is called from userspace, so is far too slow) and a new driver that calls the ‘spi_sync’ function of the spi driver layer sitting above the tegra186-spi driver. The spi driver seems to ignore hardware-controlled CS and to require a separate call for each word, despite a DTS as follows:

spi@3240000 {
    status = "okay";
    spi@0 {
        compatible = "spidev";
        reg = <0x0>;
        spi-max-frequency = <0x1312D00>;
        nvidia,enable-hw-based-cs;
        nvidia,cs-setup-clk-count = <0x2>;
        nvidia,cs-hold-clk-count = <0x3>;
        nvidia,rx-clk-tap-delay = <0x0>;
        nvidia,tx-clk-tap-delay = <0x0>;
    };
};
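One detail in this node that may be worth double-checking: the spi-max-frequency value is given in hex, and decoding it gives 20 MHz rather than the 25 MHz target. The small sketch below just does that arithmetic; the macro and function names are illustrative only, and whether this cap actually limits the bus clock on this board is an assumption that would need to be confirmed on a scope.

```c
#include <stdint.h>

/* spi-max-frequency value copied from the device tree node above. */
#define DTS_SPI_MAX_FREQUENCY_HZ 0x1312D00u
/* Target bus clock stated in the question (25 MHz). */
#define TARGET_SPI_CLOCK_HZ 25000000u

/* Returns nonzero when the device tree cap is below the target clock. */
static int dts_cap_below_target(void)
{
    return DTS_SPI_MAX_FREQUENCY_HZ < TARGET_SPI_CLOCK_HZ;
}

/* Length of one clock period in whole nanoseconds at `hz`. */
static uint32_t clk_period_ns(uint32_t hz)
{
    return 1000000000u / hz;
}
```

0x1312D00 is 20000000, so each clock period would be 50 ns instead of 40 ns. For 25 MHz the property would be <0x17D7840>, or simply <25000000>.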

The gap between each word transmitted by the driver is then 100’s of ms, instead of the required handful of 25 MHz clock cycles between adjacent words. Even if the lack of hardware CS were to be resolved, these 100’s of ms break the microsecond-scale maximum response time between bursts.

Does anyone have or know of a driver solution or otherwise that would better meet the latency requirements?

Hello samuel_g,

Could you please refer to Topic 1043203 to update your device tree? Please try adding the polling-mode and disable-runtime-pm options.
Thanks

I have updated the device tree, which has resulted in a significant improvement: the latency between CS going low and the data being transferred is now on the order of 20 us, instead of 10’s of ms.

However, there are still issues achieving the results desired in the question’s original description. A major culprit may be that H/W CS control is not being correctly driven.

Specifically, for the example of 3 x 32-bit words, we can attempt to utilise the driver in two ways. These are demonstrated below with links to logic analyser recordings:

  1. One call of spi_sync with a 12 byte transfer. This is missing the inter-packet CS pulses.

  2. Three calls of spi_sync with a 4 byte transfer. This gives us the inter-packet CS pulses, but is even slower.

Our requirement is a fast transfer, with CS pulses between the packets.

Is it possible there is a known issue with the nvidia,enable-hw-based-cs option? Or is another explanation known?
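A third variant that might be worth trying sits between the two above: one message made of several 4 byte transfers, with cs_change set on every transfer except the last, so the SPI core is asked to toggle CS between words within a single call. The sketch below builds such a message with the userspace spidev struct spi_ioc_transfer, only because that is easy to compile standalone; the in-kernel struct spi_transfer has the same cs_change field and can be submitted with one spi_sync or spi_sync_transfer call. The helper name is made up, and whether the tegra186-spi driver honours cs_change together with nvidia,enable-hw-based-cs, or how wide the resulting CS pulse is, is not something I can confirm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/spi/spidev.h>

/* Fill `xfers` with one 32-bit transfer per word (hypothetical helper).
 * cs_change = 1 on every transfer except the last asks the SPI core to
 * deassert CS briefly between transfers of the same message. */
static void build_word_transfers(struct spi_ioc_transfer *xfers,
                                 const uint32_t *tx, uint32_t *rx,
                                 size_t n_words, uint32_t speed_hz)
{
    memset(xfers, 0, n_words * sizeof(*xfers));
    for (size_t i = 0; i < n_words; i++) {
        xfers[i].tx_buf = (uintptr_t)&tx[i];   /* word to send */
        xfers[i].rx_buf = (uintptr_t)&rx[i];   /* word received */
        xfers[i].len = sizeof(uint32_t);       /* 4 bytes per transfer */
        xfers[i].speed_hz = speed_hz;
        xfers[i].bits_per_word = 32;
        xfers[i].cs_change = (i + 1 < n_words) ? 1 : 0;
    }
}
```

From userspace the array would be submitted in one go with ioctl(fd, SPI_IOC_MESSAGE(3), xfers) for the 3 word example. If the CS pulses then appear on the logic analyser, the same layout could be moved into the kernel driver to avoid the userspace overhead.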