Touchscreen stops working after a couple of touches

Hello,
I’ve got a Jetson Nano 2GB Developer Kit and a 4-inch resistive Waveshare touchscreen, and I followed this tutorial. Everything was working very nicely until I had to update to a newer version of JetPack for another part of the project, so I followed the tutorial again. With some changes I finally got the touchscreen to work, but after a few touches it stopped working. A restart fixes it, but after a couple of touches it stops working again. dmesg --follow shows this when the problem occurs:

[   77.112793] Unable to handle kernel read from unreadable memory at virtual address 00000000
[   77.121177] Mem abort info:
[   77.123961]   ESR = 0x96000005
[   77.127006]   Exception class = DABT (current EL), IL = 32 bits
[   77.132910]   SET = 0, FnV = 0
[   77.135955]   EA = 0, S1PTW = 0
[   77.139084] Data abort info:
[   77.141955]   ISV = 0, ISS = 0x00000005
[   77.145777]   CM = 0, WnR = 0
[   77.148736] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffc0d1282000
[   77.155247] [0000000000000000] *pgd=0000000000000000, *pud=0000000000000000
[   77.162210] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[   77.167768] Modules linked in: bnep fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram r8188eu(C) overlay cfg80211 cdc_acm userspace_alert nvgpu ip_tables x_tables
[   77.196256] CPU: 0 PID: 1334 Comm: irq/66-7000d400 Tainted: G         C      4.9.253-tegra #1
[   77.204761] Hardware name: NVIDIA Jetson Nano Developer Kit (DT)
[   77.210753] task: ffffffc0f96b5400 task.stack: ffffffc0f7b9c000
[   77.216663] PC is at tegra_spi_start_cpu_based_transfer+0x1d0/0x220
[   77.222917] LR is at tegra_spi_start_cpu_based_transfer+0x24/0x220
[   77.229083] pc : [<ffffff800886a068>] lr : [<ffffff8008869ebc>] pstate: 204000c5
[   77.236459] sp : ffffffc0f7b9fd10
[   77.239762] x29: ffffffc0f7b9fd10 x28: 0000000000000000 
[   77.245076] x27: 0000000000000000 x26: 0000000000000000 
[   77.250387] x25: ffffff800a0fb0af x24: ffffff8008122000 
[   77.255699] x23: ffffff8008122d78 x22: 0000000000000040 
[   77.261012] x21: ffffffc0f4c52318 x20: ffffffc0f4c52318 
[   77.266325] x19: ffffffc0f7b12de8 x18: 0000000000000000 
[   77.271638] x17: 0000000000000000 x16: 0000000000000000 
[   77.276950] x15: 000000000000006a x14: 0000000000080791 
[   77.282261] x13: 0000000000000043 x12: 071c71c71c71c71c 
[   77.287575] x11: 00000000000009df x10: 0000000000000000 
[   77.292887] x9 : 0000000000000001 x8 : 0000000000000001 
[   77.298199] x7 : 0000000000000000 x6 : 0000000000000000 
[   77.303511] x5 : 0000000000000000 x4 : 0000000000000001 
[   77.308823] x3 : 0000000000000000 x2 : 0000000000000001 
[   77.314135] x1 : 0000000000000000 x0 : 0000000000000000 

[   77.320933] Process irq/66-7000d400 (pid: 1334, stack limit = 0xffffffc0f7b9c000)
[   77.328396] Call trace:
[   77.330838] [<ffffff800886a068>] tegra_spi_start_cpu_based_transfer+0x1d0/0x220
[   77.338131] [<ffffff800886a130>] handle_cpu_based_xfer+0x78/0x268
[   77.344210] [<ffffff800886ac44>] tegra_spi_isr_thread+0x3c/0x48
[   77.350118] [<ffffff8008122da8>] irq_thread_fn+0x30/0x80
[   77.355416] [<ffffff8008123134>] irq_thread+0x11c/0x1a8
[   77.360627] [<ffffff80080db0c4>] kthread+0xec/0xf0
[   77.365407] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[   77.370706] ---[ end trace aa5f35a3df4cbfeb ]---
[   77.383538] note: irq/66-7000d400[1334] exited with preempt_count 1
[   77.390034] Unable to handle kernel paging request at virtual address ffffffffffffffd8
[   77.398068] Mem abort info:
[   77.400913]   ESR = 0x96000005
[   77.404019]   Exception class = DABT (current EL), IL = 32 bits
[   77.409998]   SET = 0, FnV = 0
[   77.413108]   EA = 0, S1PTW = 0
[   77.416301] Data abort info:
[   77.419219]   ISV = 0, ISS = 0x00000005
[   77.423067]   CM = 0, WnR = 0
[   77.426108] swapper pgtable: 4k pages, 39-bit VAs, pgd = ffffff800a240000
[   77.433032] [ffffffffffffffd8] *pgd=0000000000000000, *pud=0000000000000000
[   77.440127] Internal error: Oops: 96000005 [#2] PREEMPT SMP
[   77.445686] Modules linked in: bnep fuse xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter zram r8188eu(C) overlay cfg80211 cdc_acm userspace_alert nvgpu ip_tables x_tables
[   77.474158] CPU: 0 PID: 1334 Comm: irq/66-7000d400 Tainted: G      D  C      4.9.253-tegra #1
[   77.482662] Hardware name: NVIDIA Jetson Nano Developer Kit (DT)
[   77.488653] task: ffffffc0f96b5400 task.stack: ffffffc0f7b9c000
[   77.494563] PC is at kthread_data+0x24/0x30
[   77.498736] LR is at irq_thread_dtor+0x2c/0xd8
[   77.503167] pc : [<ffffff80080dbcbc>] lr : [<ffffff8008122f6c>] pstate: 60400045
[   77.510544] sp : ffffffc0f7b9f960
[   77.513847] x29: ffffffc0f7b9f960 x28: ffffffc0f96b5400 
[   77.519158] x27: 0000000000000000 x26: 0000000000000000 
[   77.524469] x25: ffffff800a0fb0af x24: ffffff8009ec5000 
[   77.529780] x23: 00000000000001c0 x22: ffffff800a1aa0a0 
[   77.535092] x21: 0000000000000000 x20: ffffffc0f96b5400 
[   77.540402] x19: ffffffc0f96b5400 x18: 0000000000000010 
[   77.545712] x17: 0000000000000000 x16: ffffffc0f7b9fe10 
[   77.551022] x15: ffffffffffffffff x14: ffffff808a1ae5b7 
[   77.556333] x13: ffffff800a1ae5c5 x12: 0000000000000000 
[   77.561646] x11: 0000000005f5e0ff x10: 00000000000003a2 
[   77.566955] x9 : 00000000ffffffd0 x8 : ffffffc0f96b5898 
[   77.572265] x7 : ffffffc0f96b58a8 x6 : ffffffc0fefbe5e0 
[   77.577575] x5 : 000000000000000f x4 : ffffffc0f96b5c24 
[   77.582887] x3 : ffffffc0f7b9fe10 x2 : 0000000000000000 
[   77.588197] x1 : ffffff8008122f40 x0 : 0000000000000000 

[   77.594993] Process irq/66-7000d400 (pid: 1334, stack limit = 0xffffffc0f7b9c000)
[   77.602457] Call trace:
[   77.604896] [<ffffff80080dbcbc>] kthread_data+0x24/0x30
[   77.610108] [<ffffff80080d8ddc>] task_work_run+0xbc/0xd8
[   77.615407] [<ffffff80080b8438>] do_exit+0x2e0/0xa88
[   77.620359] [<ffffff800808c1a4>] die+0x194/0x198
[   77.624964] [<ffffff80080a207c>] __do_kernel_fault+0x144/0x218
[   77.630781] [<ffffff80080a2288>] do_page_fault+0x60/0x480
[   77.636165] [<ffffff80080a2714>] do_translation_fault+0x6c/0x80
[   77.642070] [<ffffff8008080954>] do_mem_abort+0x54/0xb0
[   77.647281] [<ffffff8008082904>] el1_da+0x24/0xbc
[   77.651974] [<ffffff800886a130>] handle_cpu_based_xfer+0x78/0x268
[   77.658052] [<ffffff800886ac44>] tegra_spi_isr_thread+0x3c/0x48
[   77.663955] [<ffffff8008122da8>] irq_thread_fn+0x30/0x80
[   77.669252] [<ffffff8008123134>] irq_thread+0x11c/0x1a8
[   77.674462] [<ffffff80080db0c4>] kthread+0xec/0xf0
[   77.679240] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[   77.684540] ---[ end trace aa5f35a3df4cbfec ]---
[   77.696575] Fixing recursive fault but reboot is needed!
[   87.162711] spi-tegra114 7000d400.spi: spi transfer timeout
[   87.168349] spi-tegra114 7000d400.spi: SPI_ERR: CMD_0: 0x47e00807, FIFO_STS: 0x00400005
[   87.176398] spi-tegra114 7000d400.spi: SPI_ERR: DMA_CTL: 0x00000000, TRANS_STS: 0x40ff0001
[   87.184743] spi_master spi0: failed to transfer one message from queue
[   87.191286] ads7846 spi0.1: spi_sync --> -5

What could this be? I found 2 similar topics, the first one was solved by buying a new touchscreen and the other one had a solution where they removed fwupd. The latter did not solve my issue and buying a new touchscreen is not really an option either.
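For reading the log above: the ESR value printed in both oopses encodes what kind of fault occurred. The following is a minimal Python sketch that decodes it, assuming the standard Armv8-A ESR_EL1 field layout; the function name decode_esr and the small lookup table are illustrative only and cover just the fault types relevant here.

```python
# Decode the "ESR = 0x96000005" value from the oops.
# Field positions follow the Armv8-A ESR_EL1 layout for data aborts.

FAULT_KINDS = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR_EL1 value into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF            # bits [24:0], instruction specific syndrome
    dfsc = iss & 0x3F                # bits [5:0], data fault status code
    kind = FAULT_KINDS.get(dfsc >> 2)
    fault = f"{kind}, level {dfsc & 0x3}" if kind else "other"
    return {
        "ec": (esr >> 26) & 0x3F,            # exception class, bits [31:26]
        "il_32bit": bool((esr >> 25) & 1),   # instruction length bit
        "iss": iss,
        "write": bool((iss >> 6) & 1),       # WnR: True for a write, False for a read
        "fault": fault,
    }


info = decode_esr(0x96000005)
print(hex(info["ec"]), info["fault"], "write" if info["write"] else "read")
```

Exception class 0x25 is a data abort taken at the current exception level, which matches the "DABT (current EL)" line, and the fault is a read that failed address translation. Combined with the faulting address 00000000, this is the signature of a NULL pointer read inside the kernel.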

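The fact that this is a NULL dereference in the kernel suggests it is most likely a software bug in the driver. The part of the stack trace most likely to pinpoint where the issue starts is the first call trace, which runs from tegra_spi_isr_thread down to tegra_spi_start_cpu_based_transfer.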

This is NVIDIA’s SPI driver, and although other drivers can in theory interfere, I don’t see anything related to fwupd (it could be related indirectly, but there is no evidence here that it is doing so). My guess is that somehow NVIDIA will need to reproduce this to find the exact location of the NULL dereference, but that may not be easy unless they have the same touchscreen. Perhaps another touchscreen which uses the same SPI driver could do the job.
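Each call trace entry in the oops has the form symbol+offset/size, where offset is how far into the function the address lies and size is the length of the function. As a rough sketch of how to read these entries, the Python below parses one and recovers the start address of the function; parse_entry and the returned key names are made up for illustration.

```python
import re

# Matches entries such as:
# [<ffffff800886a068>] tegra_spi_start_cpu_based_transfer+0x1d0/0x220
ENTRY = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_entry(line: str):
    """Return address, symbol, offset, size and function start, or None."""
    match = ENTRY.search(line)
    if match is None:
        return None
    address = int(match.group(1), 16)
    offset = int(match.group(3), 16)
    return {
        "symbol": match.group(2),
        "address": address,
        "offset": offset,
        "size": int(match.group(4), 16),
        "start": address - offset,   # address of the first instruction
    }


entry = parse_entry(
    "[   77.330838] [<ffffff800886a068>] "
    "tegra_spi_start_cpu_based_transfer+0x1d0/0x220"
)
print(entry["symbol"], hex(entry["offset"]), "of", hex(entry["size"]))
```

The faulting instruction sits at offset 0x1d0 of a 0x220-byte function, so it is near the end of tegra_spi_start_cpu_based_transfer. The address is only useful for a source lookup if a kernel image with debug symbols from exactly the same build is available.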

Hm, thank you for your input. Switching touchscreens really isn’t an option for me; I’ve got to somehow make it work. Do you think it’s possible to read the SPI data lines manually from a Python script, using something like spidev? I only need the touchscreen to work when my program is running.
Edit: Maybe this is useful as well. This is how I configured the touchscreen in tegra.dts:

spi@1 {
			compatible = "ti,ads7846";
			reg = <0x0>;
			spi-max-frequency = <0x7a120>;
			nvidia,enable-hw-based-cs;
			nvidia,rx-clk-tap-delay = <0x7>;
			interrupt-parent = <0x5b>;
			interrupts = <0xd 0x1>;
			pendown-gpio = <0x5b 0xd 0x0>;
			vcc-supply = <0x4c>;
			ti,x-min = [00 00];
			ti,x-max = [1f 40];
			ti,y-min = [00 00];
			ti,y-max = [12 c0];
			ti,x-plate-ohms = [00 28];
			ti,pressure-max = [00 ff];
			wakeup-source;

			controller-data {
				nvidia,cs-setup-clk-count = <0x1e>;
				nvidia,cs-hold-clk-count = <0x1e>;
				nvidia,rx-clk-tap-delay = <0x1f>;
				nvidia,tx-clk-tap-delay = <0x0>;
			};
		};
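The values in the node above are in the hexadecimal form that a decompiled device tree uses, which makes them hard to sanity-check by eye. A small Python sketch that converts them to decimal follows; the helper names cell and byte_string are illustrative, and the byte strings are assumed to be big-endian 16-bit values, as is usual for device tree data.

```python
def cell(value: str) -> int:
    """Decode a single 32-bit cell such as '<0x7a120>'."""
    return int(value.strip("<> "), 16)


def byte_string(value: str) -> int:
    """Decode a byte string such as '[1f 40]' as a big-endian integer."""
    return int.from_bytes(bytes.fromhex(value.strip("[] ")), "big")


# Values copied from the posted node.
decoded = {
    "spi-max-frequency": cell("<0x7a120>"),              # Hz
    "nvidia,cs-setup-clk-count": cell("<0x1e>"),
    "nvidia,cs-hold-clk-count": cell("<0x1e>"),
    "ti,x-max": byte_string("[1f 40]"),
    "ti,y-max": byte_string("[12 c0]"),
    "ti,x-plate-ohms": byte_string("[00 28]"),
    "ti,pressure-max": byte_string("[00 ff]"),
}

for name, number in decoded.items():
    print(f"{name} = {number}")
```

So the bus is limited to 500000 Hz (500 kHz), the reported coordinate range is 8000 by 4800, the plate resistance is 40 ohms, and the maximum pressure is 255.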

I can’t answer specific SPI questions. You can probably read SPI with another application to demonstrate that it works, but this most likely would not tell you what the actual fault is in the tegra_spi_isr function. One really needs to know which line failed.
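For the question about reading the controller from Python, here is a minimal sketch, not a tested solution. It assumes the third-party spidev package is installed, that the device tree binds spidev (rather than ads7846) to this chip select so that a /dev/spidev0.1 node exists, and that the controller follows the ADS7846 command format (start bit plus a 3-bit channel select, 12-bit result). The bus and device numbers are guessed from the "spi0.1" in the log, and the function names are made up.

```python
import time

X_CHANNEL = 0b101   # ADS7846 channel select A2..A0 for the X position
Y_CHANNEL = 0b001   # ADS7846 channel select A2..A0 for the Y position


def control_byte(channel: int) -> int:
    """Start bit set, 12-bit differential mode, power-down between reads."""
    return 0x80 | ((channel & 0x7) << 4)


def decode_sample(rx) -> int:
    """The 12-bit result arrives in bytes 1 and 2, left-aligned by one busy bit."""
    return (((rx[1] << 8) | rx[2]) >> 3) & 0xFFF


def read_channel(spi, channel: int) -> int:
    """Send one command byte and clock out two more bytes for the reply."""
    return decode_sample(spi.xfer2([control_byte(channel), 0x00, 0x00]))


def poll_touch(bus: int = 0, device: int = 1, interval: float = 0.05) -> None:
    """Print raw X/Y samples until interrupted (needs real hardware)."""
    import spidev                      # imported here so the helpers work without it

    spi = spidev.SpiDev()
    spi.open(bus, device)              # assumption: /dev/spidev0.1
    spi.max_speed_hz = 500_000         # same limit as spi-max-frequency in the node
    spi.mode = 0
    try:
        while True:
            print(read_channel(spi, X_CHANNEL), read_channel(spi, Y_CHANNEL))
            time.sleep(interval)
    finally:
        spi.close()
```

Calling poll_touch() on the board would print raw samples. This sketch ignores the pen-down interrupt line, so values read while the panel is not being touched are meaningless, and it goes through the same spi-tegra114 controller driver, so it may hit the same fault.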

I understand. The only thing that is odd to me is that when I used an older version of L4T and JetPack it was working without a problem. My Linux knowledge is pretty limited, so let me know if I’m thinking too “simple”. Do you think it might be possible to take the SPI drivers from an older JetPack/L4T version and put them into my current JetPack image?

Among the R32.x kernels it is unlikely one would work when another does not. Possibly it is related to some quirk of timing or call arguments. If you tried a driver from a completely different release, it would likely fail outright.

Is this touchscreen configured by device tree? If it is USB, then it is probably detected automatically. If it uses some non-USB connection, then something might have changed with respect to the device tree. If NVIDIA is unable to reproduce this, then it might be necessary to add printk() statements in the driver to find the exact line of failure and what value failed.

Hello, thanks for your response. The touchscreen is indeed configured by device tree; it is not USB. I had to configure the device tree manually because the touchscreen uses SPI for the touch input. Could you explain more about the print statements? Is this something I can do myself? You’re talking about placing those in the driver; do you mean this driver: [<ffffff800886ac44>] tegra_spi_isr_thread+0x3c/0x48? Also, if it is unlikely among the R32.x kernels that one would work when another does not, then I might try to make a new image with the same image and kernel version and see if that one works.

This might end up as a case of a change being needed in the device tree, but perhaps not (it works a couple of times, so it can’t be completely wrong). If there is an error in SPI, then it’ll need a bug fix and using a different kernel (which isn’t really practical) is unlikely to fix anything (a newer release tends to fix bugs, but if nobody has reported this, then it won’t be fixed…and you said this is from an update).

It looks like the kernel config producing the function is “CONFIG_SPI_TEGRA114” (the module is “spi-tegra114.ko”).

Before I suggest printk statements, I see there is a “SPI_DEBUG” (I don’t know if this is allowed as a module, a config editor would know). This might add information.

If you were to modify “drivers/spi/tegra-114.c”, then you’d basically just add printk statements at various locations which name the line number (locations within function “tegra_spi_isr”). During a failure one line number would print, followed by one which does not; the error is between the two printk’s. If you know the exact line you could even change the printk just before the error to print any arguments about to pass to the failing call.

Hello,

I’ve spent all day today working on this problem again, trying various things starting with some clean installs, but still no success. I’ve looked at your suggestions, but I have no idea how to enable SPI_DEBUG. I read something about it in /usr/src/linux-headers-4.9.253-tegra-ubuntu18.04_aarch64/kernel-4.9/drivers/spi/Kconfig, but that just implied its existence to me. I also found tegra-114.c (mine is called spi-tegra114.c, but I think it’s the one you’re talking about). However, there is only a small section about tegra_spi_isr, which I have put below.

Lines 1723-1750 from spi-tegra114.c:

static irqreturn_t tegra_spi_isr_thread(int irq, void *context_data)
{
	struct tegra_spi_data *tspi = context_data;

	if (!tspi->is_curr_dma_xfer)
		return handle_cpu_based_xfer(tspi);
	return handle_dma_based_xfer(tspi);
}

static irqreturn_t tegra_spi_isr(int irq, void *context_data)
{
	struct tegra_spi_data *tspi = context_data;

	if (tspi->polling_mode)
		dev_warn(tspi->dev, "interrupt raised in polling mode\n");

	tspi->status_reg = tegra_spi_readl(tspi, SPI_FIFO_STATUS);
	if (tspi->cur_direction & DATA_DIR_TX)
		tspi->tx_status = tspi->status_reg &
					(SPI_TX_FIFO_UNF | SPI_TX_FIFO_OVF);

	if (tspi->cur_direction & DATA_DIR_RX)
		tspi->rx_status = tspi->status_reg &
					(SPI_RX_FIFO_OVF | SPI_RX_FIFO_UNF);
	tegra_spi_clear_status(tspi);

	return IRQ_WAKE_THREAD;
}

Would it help if I added the printk statements in here?

Just so you know, any time you see “headers” in compile instructions, those instructions assume you are building out of tree: the headers are expected to be configured to match your running system, and the build is done against them. You have no need to do this, since you have the full source and are not building something that is missing from the main source.

Just to make life easier for you, consider using the tool “nconfig” for kernel configuration modifications. You’d still start with something like “make O=something tegra21_defconfig”, but when you go to alter that base configuration, normally you would be told to make the adjustment via “make O=something menuconfig”. Instead: “make O=something nconfig”. The two tools look almost the same, and except for “nconfig” having a “symbol search”, they are. You could search for things like “CONFIG_LOCALVERSION” (or just “LOCALVERSION”), or “CONFIG_SPI_DEBUG” (or just “SPI_DEBUG”). This would tell you where to go to find those. From that location you’d be offered a chance to change that item’s current configuration. I don’t know if SPI_DEBUG can be built as a module, but if it could, then the “m” key would select it as a module (otherwise you could only say “y” or “n”). These tools have a very important property: They know when there are dependencies on other features, and keep those features updated to match the one you are changing (if you were to directly edit a feature in the “.config” file, then something might break when you change a feature that requires something else, and fail to change the “something else”).

In all cases, if you can, stick to just module changes once the kernel has its initial configuration.
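After changing the configuration with nconfig, it can help to confirm what actually ended up in the resulting .config file. A minimal Python sketch follows; it relies only on the standard .config syntax ("CONFIG_X=y", "CONFIG_X=m", or "# CONFIG_X is not set"). The function name option_state is illustrative, and SAMPLE_CONFIG is invented for the example rather than taken from a real Jetson configuration.

```python
import re

# Invented sample in the standard kernel .config syntax.
SAMPLE_CONFIG = """\
CONFIG_SPI=y
# CONFIG_SPI_DEBUG is not set
CONFIG_SPI_TEGRA114=y
CONFIG_SPI_SPIDEV=m
"""


def option_state(config_text: str, name: str) -> str:
    """Return 'y', 'm', another assigned value, or 'n' if unset or absent."""
    if not name.startswith("CONFIG_"):
        name = "CONFIG_" + name
    match = re.search(rf"^{re.escape(name)}=(.*)$", config_text, re.MULTILINE)
    return match.group(1) if match else "n"


for option in ("SPI_DEBUG", "SPI_TEGRA114", "SPI_SPIDEV"):
    print(option, option_state(SAMPLE_CONFIG, option))
```

To check a real build, read the .config from the output directory passed as O= instead of the sample string. If the running kernel exposes /proc/config.gz (an assumption about how it was built), gzip.open("/proc/config.gz", "rt").read() gives the configuration of the kernel that is currently booted.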

FYI, as a recap, the stack frame shows this:

[   77.320933] Process irq/66-7000d400 (pid: 1334, stack limit = 0xffffffc0f7b9c000)
[   77.328396] Call trace:
[   77.330838] [<ffffff800886a068>] tegra_spi_start_cpu_based_transfer+0x1d0/0x220
[   77.338131] [<ffffff800886a130>] handle_cpu_based_xfer+0x78/0x268
[   77.344210] [<ffffff800886ac44>] tegra_spi_isr_thread+0x3c/0x48
[   77.350118] [<ffffff8008122da8>] irq_thread_fn+0x30/0x80
[   77.355416] [<ffffff8008123134>] irq_thread+0x11c/0x1a8
[   77.360627] [<ffffff80080db0c4>] kthread+0xec/0xf0
[   77.365407] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[   77.370706] ---[ end trace aa5f35a3df4cbfeb ]---

Within that, the function where something goes wrong, and which needs detailed tracing, is:

[   77.330838] [<ffffff800886a068>] tegra_spi_start_cpu_based_transfer+0x1d0/0x220

However, I am guessing that whatever was passed from this earlier function might be the real culprit:

[   77.344210] [<ffffff800886ac44>] tegra_spi_isr_thread+0x3c/0x48

What this means is that if you add printk statements (which print to the log) at strategic points in those functions, then you’ll know more than “somewhere in that function”. Here’s a good article on using printk’s like this:
https://elinux.org/Debugging_by_printing

Also:
https://www.kernel.org/doc/html/latest/core-api/printk-basics.html

Note that the printk format string can start with a log level, e.g., “KERN_ALERT”. All this does is allow the logging tool to filter by priority. If logging is set to some level, and the message has a lower priority than that, then it won’t print. There is a default, plus you can specify a priority which always gets printed (it is possible to tell logging to include lower priority messages, and they will be added to the logs). The second URL I gave tells you how to examine current log levels if interested.
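The filtering rule can be made concrete with a short Python sketch. It assumes the usual layout of /proc/sys/kernel/printk (four numbers: current console level, default message level, minimum console level, boot-time default console level) and the standard numeric values of the KERN_* levels; the function names are illustrative.

```python
# Numeric values of the standard kernel log levels (lower means more urgent).
LEVELS = {
    "KERN_EMERG": 0, "KERN_ALERT": 1, "KERN_CRIT": 2, "KERN_ERR": 3,
    "KERN_WARNING": 4, "KERN_NOTICE": 5, "KERN_INFO": 6, "KERN_DEBUG": 7,
}


def parse_printk_sysctl(text: str) -> dict:
    """Split the four whitespace-separated numbers into named fields."""
    console, default_message, minimum, boot_default = (int(x) for x in text.split())
    return {
        "console_loglevel": console,
        "default_message_loglevel": default_message,
        "minimum_console_loglevel": minimum,
        "default_console_loglevel": boot_default,
    }


def reaches_console(level_name: str, console_loglevel: int) -> bool:
    """A message reaches the console only if it is more urgent than the threshold."""
    return LEVELS[level_name] < console_loglevel


settings = parse_printk_sysctl("4\t4\t1\t7\n")   # example contents, not from this board
print(reaches_console("KERN_ALERT", settings["console_loglevel"]))
```

On the board itself the string would come from open("/proc/sys/kernel/printk").read(). This threshold only governs what is echoed to the console; messages are still stored in the kernel ring buffer, which is what dmesg reads.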

Note that a log line like this has all the basics you’d need, but placed in multiple locations of the above two functions (taken from the elinux.org article, but I customized it a bit):

printk(KERN_ALERT "SPI DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);

As an example, you could then do this to find just those messages:
dmesg | grep 'SPI DEBUG'

Once you have reproduced the error, you could view this output, find the last line which succeeded in each of those two functions, and treat the line that follows it as the likely failure point. Depending on those failure points, it might also be possible to show the arguments about to be passed. It just narrows down on the point in the stack frame where the error is (“smoking gun” evidence), and the argument can tell you a bit about why it failed when it otherwise works most of the time. This can be used to create a bug fix.
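Finding the last marker per function can be automated. The Python sketch below assumes the markers use exactly the "SPI DEBUG: Passed <function> <line>" format of the printk shown earlier; last_passed is an illustrative name, and SAMPLE_LOG, including its timestamps and line numbers, is invented to show the idea.

```python
import re

# Matches the marker format "SPI DEBUG: Passed <function> <line>".
MARKER = re.compile(r"SPI DEBUG: Passed (\S+) (\d+)")

# Invented log excerpt; the line numbers are hypothetical.
SAMPLE_LOG = """\
[   70.000001] SPI DEBUG: Passed tegra_spi_isr_thread 1727
[   70.000002] SPI DEBUG: Passed handle_cpu_based_xfer 1601
[   70.000003] SPI DEBUG: Passed handle_cpu_based_xfer 1615
[   70.000004] Unable to handle kernel read from unreadable memory
"""


def last_passed(log_text: str) -> dict:
    """Map each function name to the source line of its final marker."""
    last = {}
    for match in MARKER.finditer(log_text):
        last[match.group(1)] = int(match.group(2))   # later markers overwrite earlier
    return last


print(last_passed(SAMPLE_LOG))
```

With the sample, the failure would lie somewhere after line 1615 of handle_cpu_based_xfer. For a real run, pass in only the part of the dmesg output that precedes the oops, since markers from earlier, successful transfers would otherwise be mixed in.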

If the functions are part of a module, then you simply need to initially preserve a copy of the unmodified module, and then copy the new module file in place. Note that you might need to first rmmod the module (which might in turn require an rmmod of other modules), followed by an insmod.

If this is a change in the base kernel, and not just in a module, then it becomes a more delicate situation because you basically would want a new module directory with all of the modules in it built against that kernel. Stick to modules if you can.
