SPI slave can only receive the second packet from Nano dev kit SPI master

Hi guys,

The MCU serving as an SPI slave was tested with an stm32 board. With an SPI chip-select-to-clock delay (CS delay for short) of about 10 ms on the SPI master before transferring a packet, the slave can receive data and ack it as required. But when the SPI master is switched to the Nano dev kit, the MCU's SPI receives nothing except the second packet.
For example,

void transfer(int fd, unchar const *tx, unchar const *rx, size_t len)
{
    int ret = 0;
    static unchar bits = 8;
    static uint speed = 1000000;   /* 1 MHz */
    static ushort delay = 1000;    /* delay_usecs, in microseconds */

    struct spi_ioc_transfer tr = {
        .tx_buf = (unsigned long)tx,
        .rx_buf = (unsigned long)rx,
        .len = len,
        .delay_usecs = delay,
        .speed_hz = speed,
        .bits_per_word = bits,
    };

    ret = ioctl(fd, SPI_IOC_MESSAGE(1), &tr);
    if (ret < 1) {
        pabort("can't send spi message");
    }
}

unchar def_tx[6] = {0x80, 0x80, 0x80, 0x80, 0x80, 0x80};
unchar def_rx[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
transfer(fd, def_tx, def_rx, 6);
/* def_tx becomes {0x81, 0x81, 0x80, 0x80, 0x80, 0x80} */
def_tx[0] = def_tx[1] = 0x81;
transfer(fd, def_tx, def_rx, 6);
/* def_tx becomes {0x82, 0x82, 0x82, 0x82, 0x80, 0x80} */
def_tx[0] = def_tx[1] = def_tx[2] = def_tx[3] = 0x82;
transfer(fd, def_tx, def_rx, 6);

The slave only receives the data of the second transfer: 0x81, 0x81, 0x80, 0x80, 0x80, 0x80. We want to use the same CS delay as on the stm32 setup to track down the problem. How can we set the CS delay before the start of every transfer?

We searched the SPI device tree binding documents and got the results below related to delay.

garret:~/nano/l4t32.3.1/kernel_src/kernel/kernel-4.9/Documentation/devicetree/bindings/spi$ ack delay
22:- fsl,spi-cs-sck-delay: a delay in nanoseconds between activating chip
24:- fsl,spi-sck-cs-delay: a delay in nanoseconds between stopping the clock
54:             fsl,spi-cs-sck-delay = <100>;
55:             fsl,spi-sck-cs-delay = <50>;
43:- nvidia,tx-clk-tap-delay: Delays the clock going out to the external device
45:- nvidia,rx-clk-tap-delay : Delays the clock coming in from the external
51:- nvidia,clk-delay-between-packets : Clock delay  between packets by keeping

But it seems that nvidia,tegra114-spi.txt has no CS delay property like the one we need.

How can we solve this problem? Do we need to change the SPI driver code?

nvidia,clk-delay-between-packets sets the delay between “words” which are, in your case, 8 bits. So it would set the delay between each byte in your transfer. It defaults to 0.

delay_usecs sets the delay between “messages” which is a “transfer” in the case of spidev. You currently have it set to 1000 (1ms) so increasing it to 10000 (10ms) would duplicate the stm32 board’s setup.

Hi gtj,


Actually, we want the SPI chip-select-to-clock delay even on the first transfer; it is not really a delay between transfers.

I have tried setting delay_usecs to 10000 (10 ms), but it does not work.

In our case, even the first transfer can’t be received by the stm32 SPI slave.

I think I may have misunderstood…

The delay between chip select and the start of the clock is controlled by nvidia,cs-setup-clk-count and the delay after the clock stops to the raising of CS is controlled by nvidia,cs-hold-clk-count.

If you want to try my new super-duper DTB Overlay Creator to add those parameters to the device tree, check this out. It’s not documented yet but all you have to do is modify SPI_Custom.py and run “make install”.
DTBOverlayCreator.zip (25.1 KB)

Hi gtj,

Thanks, I have not tried the tool you attached above.
FYI, I attach a pic here.

Thanks again.

Yep, that’s controlled by nvidia,cs-setup-clk-count.
It’s the number of clock cycles, not absolute time, so it changes with the clock frequency.
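
Since the count is in clock cycles, the resulting delay can be estimated from the bus speed. Below is a minimal sketch of that arithmetic. `cycles_to_usecs` is a hypothetical helper, not part of the driver, and it assumes the count is measured in cycles of the configured SPI clock (speed_hz).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a number of SPI clock cycles into microseconds at the given
 * bus speed. Assumption: the count refers to cycles of the SPI clock
 * itself (speed_hz), as described in the reply above. */
double cycles_to_usecs(uint32_t clk_count, uint32_t speed_hz)
{
    assert(speed_hz > 0);
    return (double)clk_count * 1e6 / (double)speed_hz;
}
```

For example, 5 cycles at 1 MHz work out to about 5 µs, while the same 5 cycles at 20 MHz are only about 0.25 µs.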

Hi gtj,

I found that the DTBOverlayCreator creates tegra210-p3448-0000-p3449-0000-b00-user-custom.dtb, but my board is tegra210-p3448-0000-p3449-0000-a02.

I booted my p3448-0000-p3449-0000-a02 board with the tegra210-p3448-0000-p3449-0000-b00-user-custom.dtb file and checked the related information on the running system.

sercomm:/sys/devices/7000d400.spi/of_node/spi@0/controller-data$ xxd nvidia,cs-inactive-cycles
00000000: 0000 0005                                ....
sercomm:/sys/devices/7000d400.spi/of_node/spi@0/controller-data$ xxd nvidia,cs-setup-clk-count
00000000: 0000 03e8                                ....
sercomm:/sys/devices/7000d400.spi/of_node/spi@0/controller-data$ xxd nvidia,cs-hold-clk-count
00000000: 0000 0050

I set speed_hz=1000000 (1 MHz), so cs-setup-clk-count=0x03e8 (1000 clock cycles) should give a delay of about 1 ms. But the logic analyzer captures only about 18 µs (pic. below), far less than that.
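
Device tree property cells are stored as big-endian 32-bit values, so the bytes `0000 03e8` printed by xxd decode to 0x3e8 = 1000. A small sketch of that decoding follows; `dt_cell_to_u32` is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 32-bit device tree cell (stored big-endian) from the raw
 * bytes of a property file such as .../nvidia,cs-setup-clk-count. */
uint32_t dt_cell_to_u32(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL && len >= 4);
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

Decoding by hand this way avoids misreading the value on a little-endian host, where reading the file directly into a `uint32_t` would give the bytes in the wrong order.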

I also tried changing the dtb entries for spi0, which I use as the master, and got the result below.

root@localhost:/sys/devices/7000d400.spi/of_node/spi@0/controller-data# xxd nvidia,cs-setup-clk-count
00000000: 0000 03e8                                ....
root@localhost:/sys/devices/7000d400.spi/of_node/spi@0/controller-data# xxd nvidia,cs-inactive-cycles
00000000: 0000 0005                                ....
root@localhost:/sys/devices/7000d400.spi/of_node/spi@0/controller-data# xxd nvidia,cs-hold-clk-count
00000000: 0000 0005

Could you help me change the dts of 7000d400.spi so that it targets my p3448-0000-p3449-0000-a02 dev kit?

Hi gtj,

Next Monday we will fly to Taipei to test the Nano module, which is different from the Nano dev kit, so we need to know how to change that dtb without using the tool, which targets p3448-0000-p3449-0000-b00, right? The Nvidia SPI driver binding document does not seem to detail its usage. Is there any other Nvidia document, like the device tree bindings, that we can reference?

Hi gtj,

The CS-to-clock delay still can’t be changed, even when I give “nvidia,cs-setup-clk-count” a huge value.

speed_hz = 1000000;(1M Hz)
sercomm:~$ xxd /sys/devices/7000d400.spi/of_node/spi@0/controller-data/nvidia,cs-setup-clk-count
00000000: 00ff f3e8

There’s a limit to the number of clock counts. IIRC it’s 16. I forget what the driver does if it’s over 16 but try 10 and see what you get.

Also, you have to disable the prod-settings and you have to use hardware chip select.

Your dts fragment should look something like…

fragment@spi0 {
	target = < &spi0 >;
	__overlay__ {
		status = "okay";
		prod-settings {
			status = "disabled";
		};
		spi@0 {
			compatible = "spidev";
			status = "okay";
			reg = < 0x0 >;
			spi-max-frequency = < 20000000 >;
			controller-data {
				nvidia,cs-setup-clk-count = < 0x5 >;
				nvidia,cs-hold-clk-count = < 0x5 >;
			};
		};
	};
};

Hi gtj,

Thanks for your comment.

I changed the dts following your post above, and the spi@7000d400 node now looks like this:

spi@7000d400 {
                compatible = "nvidia,tegra210-spi";
                reg = <0x0 0x7000d400 0x0 0x200>;
                interrupts = <0x0 0x3b 0x4>;
                iommus = <0x2b 0xe>;
                #address-cells = <0x1>;
                #size-cells = <0x0>;
                dmas = <0x4c 0xf 0x4c 0xf>;
                dma-names = "rx", "tx";
                nvidia,clk-parents = "pll_p", "clk_m";
                clocks = <0x21 0x29 0x21 0xf3 0x21 0xe9>;
                clock-names = "spi", "pll_p", "clk_m";
                resets = <0x21 0x29>;
                reset-names = "spi";
                status = "okay";
                linux,phandle = <0x10c>;
                phandle = <0x10c>;

                prod-settings {
                        status = "disabled";
                        #prod-cells = <0x3>;

                        prod {
                                prod = <0x4 0xfff 0x0>;
                        };

                        prod_c_flash {
                                status = "disabled";
                                prod = <0x4 0x3f 0x7>;
                        };

                        prod_c_loop {
                                status = "disabled";
                                prod = <0x4 0xfff 0x44b>;
                        };
                };

                spi@0 {
                        compatible = "spidev";
                        status = "okay";
                        reg = <0x0>;
                        spi-max-frequency = <0x1312d00>;
                        nvidia,rx-clk-tap-delay = <0x7>;

                        controller-data {
                                nvidia,cs-inactive-cycles = <0x5>;
                                nvidia,cs-hold-clk-count = <0x5>;
                                nvidia,cs-setup-clk-count = <0xa>;
                        };
                };
};

I tested the “nvidia,cs-setup-clk-count” property with the values 0xa, 0x10 and 0x01, and all of them give the same CS-to-clock delay, as shown below.

Evidently nvidia,cs-setup-clk-count does not work on my dev kit board. Please refer to the attachments for the whole dts file and the dmesg log.
tegra210-p3448-0000-p3449-0000-a02-user-custom_0xa_origin.dts.txt (283 KB)
nano_dmesg.log (60.8 KB)

I’ll try and reproduce it.

Hi gtj,

Thanks a million.
If the limit to the number of clock counts is below 16, then it is not possible to set the CS to clock delay to 10ms.
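
To put numbers on this, here is a rough sketch that computes how many clock cycles a given delay would need. `cycles_for_delay` and `delay_fits` are hypothetical helpers, and the 16-cycle limit is the figure recalled earlier in the thread, passed in as a parameter rather than taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clock cycles needed to cover delay_us at speed_hz, rounded up. */
uint64_t cycles_for_delay(uint64_t delay_us, uint64_t speed_hz)
{
    assert(speed_hz > 0);
    return (delay_us * speed_hz + 999999u) / 1000000u;
}

/* True if the requested delay fits within max_cycles clock cycles. */
bool delay_fits(uint64_t delay_us, uint64_t speed_hz, uint64_t max_cycles)
{
    return cycles_for_delay(delay_us, speed_hz) <= max_cycles;
}
```

At 1 MHz a 10 ms delay would need 10000 cycles, so `delay_fits(10000, 1000000, 16)` is false, while a 16 µs delay just fits.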

Would you mind telling me what your time zone is? Thanks.

Here are some samples of what I got…

With nvidia,cs-setup-clk-count = 5

With nvidia,cs-setup-clk-count = 1

With nvidia,cs-setup-clk-count = 16

Attached is the exact dts file I used for the 16 clock count as well as the dtbo and dtb.
I’m trying to remember if I had to do a kernel patch. I’m checking on that now.
spi.zip (44 KB)

Yeah, sorry about that. Here’s the nvidia patch that enables setting the setup and hold parameters.
It’s from 2017 so why it’s not already included I don’t know.

Apply it to the kernel and rebuild it.
spi.patch.txt (2.32 KB)

Hi gtj,

I appreciate your reply.

Following your guide, I applied spi.patch to the kernel source of l4t 32.3.1 and 32.2.3, rebuilt the kernel, and copied arch/arm64/boot/Image to /boot on the Nano dev kit. But after the Nano boots, it can’t detect /dev/spi*, and cat /proc/devices | grep spi returns nothing. The spi of_node can still be found under /sys/devices. After switching from the patched Image back to the old Image, /dev/spidev0.0 and /dev/spidev1.1 are detected again.

garret:~/src_share/l4t32.3.1/kernel_src/kernel/kernel-4.9$ patch -p1 < ../spi.patch
patching file drivers/spi/spi-tegra114.c
Hunk #2 succeeded at 873 (offset -2 lines).
Hunk #3 succeeded at 900 (offset -2 lines).
Hunk #4 succeeded at 1094 (offset -2 lines).
Hunk #5 succeeded at 1994 with fuzz 2 (offset -6 lines).
garret:~/src_share/l4t32.3.1/kernel_src/kernel/kernel-4.9$ export ARCH=arm64
garret:~/src_share/l4t32.3.1/kernel_src/kernel/kernel-4.9$ export CROSS_COMPILE=aarch64-linux-gnu-
garret:~/src_share/l4t32.3.1/kernel_src/kernel/kernel-4.9$ make tegra_defconfig
garret:~/src_share/l4t32.3.1/kernel_src/kernel/kernel-4.9$ make zImage
  GEN     .version
  LD      vmlinux.o
  MODPOST vmlinux.o
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  LD      init/built-in.o
  KSYM    .tmp_kallsyms1.o
  KSYM    .tmp_kallsyms2.o
  LD      vmlinux
  SORTEX  vmlinux
  SYSMAP  System.map
  OBJCOPY arch/arm64/boot/Image
  GZIP    arch/arm64/boot/zImage

The steps above are how I rebuilt the kernel. Could you tell me which Linux-for-Tegra version you tested with?

Hi gtj,

I found that an unpatched Image built from the l4t 32.3.1 kernel source can’t detect /dev/spi* either, while the l4t/kernel/Image can. Does make tegra_defconfig have a problem, so that I need to change the .config file?


The default CONFIG_SPI_SPIDEV=m in .config needs to be changed to CONFIG_SPI_SPIDEV=y.

The /dev/spi* can be detected now.

sercomm:~$ ls /dev/s
shm/       snd/       spidev0.0  spidev1.0  stderr     stdin      stdout

Hi gtj,

Thanks for your kind help, the CS delay can be set now.
Below is the result of “nvidia,cs-setup-clk-count = <0x5>”

But “nvidia,cs-setup-clk-count” is now limited to a clock count of 16.

Why does the driver need that limit? Could you raise the limit as high as possible?

Hi gtj,

How can I set the interval between bytes within one transfer, as in the pic. below?

I don’t know why the intervals between bytes differ (please refer to the pic. below) when I use the ioctl(fd, SPI_IOC_MESSAGE(6), xfer) API to send 6 bytes with the same configuration for every spi_ioc_transfer, as follows.

struct spi_ioc_transfer	xfer[6];
memset(xfer, 0, sizeof(xfer));   /* fields that are not set must be zero */
for (i = 0; i < 6; i++) {
   xfer[i].tx_buf = (unsigned long) buf;
   xfer[i].len = 1;
   //xfer[2].delay_usecs = 0;
   xfer[i].speed_hz = 1000000;
}
status = ioctl(fd, SPI_IOC_MESSAGE(6), xfer);


You can’t change that limit. The control register passed to the hardware is only 4 bits wide so it’s limited to 1-16. Actually 0-15 but there’s a minimum of 1 clock cycle.
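
The mapping described here could be sketched as follows. The helper names and the exact encoding (storing the cycle count minus one in the 4-bit field) are assumptions for illustration only and are not taken from the driver source.

```c
#include <assert.h>
#include <stdint.h>

#define CS_CLK_COUNT_MIN 1u    /* at least one clock cycle */
#define CS_CLK_COUNT_MAX 16u   /* most a 4-bit field can express */

/* Clamp a requested cycle count to the range 1..16. */
uint32_t clamp_cs_clk_count(uint32_t requested)
{
    if (requested < CS_CLK_COUNT_MIN)
        return CS_CLK_COUNT_MIN;
    if (requested > CS_CLK_COUNT_MAX)
        return CS_CLK_COUNT_MAX;
    return requested;
}

/* Hypothetical encoding: the 4-bit field holds (cycles - 1), so the
 * field values 0..15 stand for 1..16 clock cycles. */
uint32_t encode_cs_clk_field(uint32_t requested)
{
    uint32_t field = clamp_cs_clk_count(requested) - 1u;
    assert(field <= 0xFu);
    return field;
}
```

Under this assumption, a requested count of 1000 would be treated the same as 16, which matches the observation that larger values make no further difference.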