Possible bug in CryptoAPI due to tegra_xxx functions

david.fernandez · August 15, 2022, 4:05pm

Running MPPE over a PPP link (using MS-CHAP, not V2) triggers a kernel BUG and panic, probably because some tegra_xxx functions are not interrupt-safe in the way the kernel expects.

This is with BSP 32.5.1 (kernel 4.9.201-tegra) on a Jetson AGX Xavier devkit.

Could an NVIDIA developer check this and confirm whether there is a simple patch to avoid the problem?

Regards

This is the bug:

[  114.224871] kernel BUG at ../mm/vmalloc.c:1390!
[  114.224997] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
...
[  114.226190] Hardware name: Jetson-AGX (DT)
[  114.226273] task: ffffffc795404600 task.stack: ffffffc7cff78000
[  114.226400] PC is at __get_vm_area_node.isra.10+0x178/0x190
[  114.226509] LR is at get_vm_area_caller+0x54/0x68
[  114.226600] pc : [<ffffff8008212a58>] lr : [<ffffff8008212cbc>] pstate: 00400145
[  114.227031] sp : ffffffc7cff7b590
[  114.227306] x29: ffffffc7cff7b590 x28: ffffffc7da0fb018 
[  114.229984] x27: 0000000000000007 x26: ffffffbebfff0000 
[  114.235568] x25: ffffff8008000000 x24: 0000000000000001 
[  114.240820] x23: ffffff8008bffd9c x22: 0000000000000008 
[  114.245891] x21: 00000000024000c0 x20: 0000000000000008 
[  114.250965] x19: 0000000000001000 x18: 0000007fcb8092a4 
[  114.256739] x17: 0000007f7a5de960 x16: ffffff800825d3f8 
[  114.262201] x15: 000000000079baf5 x14: 0000000000000028 
[  114.268114] x13: ffffffc7cd692e80 x12: 0000000000000001 
[  114.273473] x11: ffffff8009531000 x10: ffffffbf00000000 
[  114.279494] x9 : 00000000fff7f000 x8 : ffffffc7d1c46000 
[  114.285264] x7 : 0000000000851c46 x6 : ffffff8008bffd9c 
[  114.290776] x5 : 00000000024000c0 x4 : ffffffbebfff0000 
[  114.296113] x3 : ffffff8008000000 x2 : 0000000000000008 
[  114.301454] x1 : 0000000000000001 x0 : 0000000000000401 
[  114.306543] 
[  114.308199] Process pppd (pid: 8163, stack limit = 0xffffffc7cff78000)
[  114.314061] Call trace:
[  114.316699] [<ffffff8008212a58>] __get_vm_area_node.isra.10+0x178/0x190
[  114.322815] [<ffffff8008212cbc>] get_vm_area_caller+0x54/0x68
[  114.328160] [<ffffff800879e4f8>] dma_common_pages_remap+0x40/0x90
[  114.334019] [<ffffff80080a10b0>] __iommu_alloc_attrs+0xd8/0x478
[  114.339879] [<ffffff8008bffd9c>] tegra_se_sha_process_buf+0x5ec/0x848
[  114.345821] [<ffffff8008c00104>] tegra_se_sha_op+0x10c/0x1e0
[  114.350903] [<ffffff8008c00234>] tegra_se_sha_digest+0x5c/0x98
[  114.356515] [<ffffff8008404a08>] crypto_ahash_op+0x40/0xa8
[  114.361579] [<ffffff8008404b10>] crypto_ahash_digest+0x30/0x48
[  114.367180] [<ffffff80089eb0cc>] get_new_key_from_sha+0x11c/0x148
[  114.373384] [<ffffff80089eb154>] mppe_rekey+0x5c/0x190
[  114.378634] [<ffffff80089eba58>] mppe_init.part.1+0xd8/0x228
[  114.383977] [<ffffff80089ebcc0>] mppe_comp_init+0x88/0x90
[  114.389318] [<ffffff80089e3b00>] ppp_ccp_peek+0x188/0x240
[  114.394828] [<ffffff80089e4460>] __ppp_xmit_process+0xe8/0x550
[  114.400947] [<ffffff80089e4d48>] ppp_xmit_process+0x50/0xb8
[  114.406634] [<ffffff80089e65fc>] ppp_write+0x11c/0x158
[  114.411798] [<ffffff800825add8>] __vfs_write+0x48/0x118
[  114.416703] [<ffffff800825bdcc>] vfs_write+0xac/0x1b0
[  114.422035] [<ffffff800825d454>] SyS_write+0x5c/0xc8
[  114.427024] [<ffffff8008083900>] el0_svc_naked+0x34/0x38
[  114.432292] ---[ end trace 5d5f03683597ef6b ]---
[  114.453003] Kernel panic - not syncing: Fatal exception in interrupt
[  114.453146] SMP: stopping secondary CPUs
[  114.453239] Kernel Offset: disabled
[  114.453313] Memory Limit: none
[  114.454055] trusty-log panic notifier - trusty version Built: 08:40:57 Feb 19 2021
[  114.477398] Rebooting in 5 seconds..

JerryChang · August 16, 2022, 2:42am

hello david.fernandez,

I don’t have experience with MPPE. However, L4T R32.5.1 is quite an old release; could you please try moving to the latest R32 release, i.e. L4T R32.7.2, to confirm whether the issue is still present?
thanks

david.fernandez · August 16, 2022, 11:30am

Right, MPPE is just link encryption for PPP (it works like a compression protocol under CCP), and PPP works as a line discipline for a TTY device, be that a modem, a serial device, or anything similar.

If you look at the stack trace (at the end of the panic dump), the problem arises because the call chain originates from ppp_write.

From the Documentation/serial/tty.txt:

write() - Called to write bytes to the device. May not
sleep. May occur in parallel in special cases.
Because this includes panic paths drivers generally
shouldn’t try and do clever locking here.

This shows that the TTY write path must not sleep, mainly because it acquires IRQ-level locks.

And from include/linux/tty_ldisc.h:

ssize_t (*write)(struct tty_struct * tty, struct file * file,
       const unsigned char * buf, size_t nr);
This function is called when the user requests to write to the
tty. The line discipline will deliver the characters to the
low-level tty device for transmission, optionally performing
some processing on the characters first. If this function is
not defined, the user will receive an EIO error.

This just confirms that ppp_write sits in the middle of the path to the TTY write.

What happens here is that, after the MS-CHAP authentication, MPPE is ready to derive its encryption key from the credentials and initialize its cipher.

So it does that as part of transferring its first data frame on the link.

BUT… the crypto_ahash_digest call that calculates a SHA-1 hash goes through tegra_se_sha_digest, which I take to be an NVIDIA optimization that offloads some cryptographic operations to the Jetson hardware. Those functions end up trying to allocate memory in a way that can sleep, and the BUG check in vmalloc is there to make sure it is not being called from interrupt context.
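
The failure mode described above can be sketched in userspace. This is a model only, not kernel code: every model_* name is hypothetical, the depth counter is a stand-in for the kernel's own context accounting, and returning NULL stands in for the BUG check that the real kernel hits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only. Nesting depth of "atomic" sections; a hypothetical
 * stand-in for the kernel's context accounting. */
static unsigned int model_atomic_depth;

static void model_enter_atomic(void) { model_atomic_depth++; }

static void model_leave_atomic(void)
{
    assert(model_atomic_depth > 0);   /* an unbalanced leave is a caller bug */
    model_atomic_depth--;
}

static bool model_in_atomic(void) { return model_atomic_depth > 0; }

/* Stand-in for an allocation that may sleep. Where the real kernel would
 * hit a BUG check, this model simply refuses and returns NULL. */
static void *model_sleeping_alloc(size_t size)
{
    if (model_in_atomic())
        return NULL;
    return malloc(size);
}

/* Stand-in for a digest call issued from a write path that is already in
 * an atomic section. Returns 0 on success, -1 if the allocation was refused. */
static int model_digest_under_lock(void)
{
    int ret = 0;
    void *buf;

    model_enter_atomic();
    buf = model_sleeping_alloc(4096);
    if (buf == NULL)
        ret = -1;
    free(buf);                        /* free(NULL) is a no-op */
    model_leave_atomic();
    return ret;
}
```

In this model the same allocation succeeds when called from ordinary context and is refused when called through model_digest_under_lock, which mirrors the shape of the call trace: the allocator is fine, the calling context is not.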

Regarding our version of L4T: unfortunately, those Jetsons are in a satellite and we have no way to reflash them… We tried every way we could think of to run the flash from one Jetson to another, but it seems that some of the flash utility binaries are Intel 32-bit executables with no ARM versions, and qemu is not in good shape when running an Intel guest on an ARM host, so I am stuck with that version for a while…

I’ll see if I can try the latest version you mentioned on my devkit to check whether there is a fix for that. But I wonder if, with the information I have provided, you could check whether the same function path would still try to allocate memory in the same way, and whether there could be an easy way to patch that… We could try a patch on the running kernel and see.

Regards
David

JerryChang · August 17, 2022, 6:58am

hello david.fernandez,

could you please modify the tegra_se driver,
i.e. $TOP/public_sources/Linux_for_Tegra/source/public/kernel/nvidia/drivers/crypto/tegra-se-nvhost.c
please try changing dma_alloc_attrs/dma_free_attrs to dma_alloc_coherent/dma_free_coherent,
for example,

--- a/drivers/crypto/tegra-se-nvhost.c
+++ b/drivers/crypto/tegra-se-nvhost.c
@@ -1254,7 +1254,7 @@ static int tegra_se_send_sha_data(struct tegra_se_dev *se_dev,
        unsigned int total = count, val;
        u64 msg_len;

-       cmdbuf_cpuvaddr = dma_alloc_attrs(se_dev->dev->parent, SZ_4K,
+       cmdbuf_cpuvaddr = dma_alloc_coherent(se_dev->dev->parent, SZ_4K,
                                          &cmdbuf_iova, GFP_KERNEL,
                                          __DMA_ATTR(attrs));
        if (!cmdbuf_cpuvaddr) {
@@ -1264,7 +1264,7 @@ static int tegra_se_send_sha_data(struct tegra_se_dev *se_dev,

        while (total) {
                if (src_ll->data_len & SE_BUFF_SIZE_MASK) {
-                       dma_free_attrs(se_dev->dev->parent, SZ_4K,
+                       dma_free_coherent(se_dev->dev->parent, SZ_4K,
                                       cmdbuf_cpuvaddr, cmdbuf_iova,
                                       __DMA_ATTR(attrs));
                        return -EINVAL;
@@ -1347,7 +1347,7 @@ static int tegra_se_send_sha_data(struct tegra_se_dev *se_dev,
        err = tegra_se_channel_submit_gather(se_dev, cmdbuf_cpuvaddr,
                                             cmdbuf_iova, 0, cmdbuf_num_words,
                                             SHA_CB);
-       dma_free_attrs(se_dev->dev->parent, SZ_4K, cmdbuf_cpuvaddr,
+       dma_free_coherent(se_dev->dev->parent, SZ_4K, cmdbuf_cpuvaddr,
                       cmdbuf_iova, __DMA_ATTR(attrs));

if the above doesn’t work,
please lower the priority, which bypasses the hardware implementation of SHA1.
for example,

static struct ahash_alg hash_algs[] = {
        {
                        ...
                        .cra_name = "sha1",
                        .cra_driver_name = "tegra-se-sha1",
                        .cra_priority = 300,
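
As a rough illustration of why lowering .cra_priority sidesteps the hardware path: the crypto API generally resolves an algorithm name such as "sha1" to the registered implementation with the highest priority. The sketch below is a userspace model, not kernel code. struct model_alg and model_pick are hypothetical names, and a software entry such as "sha1-generic" at priority 100 is an assumption about the fallback available on this kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the three fields shown in the hash_algs[] excerpt. */
struct model_alg {
    const char *cra_name;         /* generic algorithm name, e.g. "sha1" */
    const char *cra_driver_name;  /* specific implementation name */
    int cra_priority;             /* higher value wins the lookup */
};

/* Return the highest-priority entry registered under `name`, or NULL if
 * no entry carries that name. Ties keep the earlier entry. */
static const struct model_alg *model_pick(const struct model_alg *algs,
                                          size_t count, const char *name)
{
    const struct model_alg *best = NULL;

    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(algs[i].cra_name, name) != 0)
            continue;
        if (best == NULL || algs[i].cra_priority > best->cra_priority)
            best = &algs[i];
    }
    return best;
}
```

With a software entry at 100 and the tegra-se-sha1 entry at 300, the model returns the hardware entry; once the hardware entry's priority is dropped below the software one, the same lookup returns the software entry. On a real kernel the new value would presumably have to be lower than whatever priority the software SHA-1 registers with.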

david.fernandez · August 18, 2022, 2:17pm

Thanks Jerry,

We tried the first method, but unfortunately all patching facilities in the kernel were disabled by default, which we had not realized.

The second worked!!!
We used addresses from System.map (there is no kallsyms info by default), and it worked once we realized that there were two structure arrays to patch… I am not sure whether one is some sort of backup, but both had the same driver name and so on.
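
For anyone attempting the same approach, locating the arrays might look roughly like this. It is a minimal sketch, assuming the usual System.map line layout of hex address, one-character type, and symbol name. model_match_symbol is a hypothetical helper, and it only finds addresses; it does not patch anything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one System.map-style line ("<hex address> <type> <symbol>").
 * If the symbol equals `wanted`, store its address in *addr and return 1.
 * Return 0 for non-matching or malformed lines, leaving *addr untouched.
 */
static int model_match_symbol(const char *line, const char *wanted,
                              unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char symbol[256];

    assert(line != NULL && wanted != NULL && addr != NULL);

    if (sscanf(line, "%llx %c %255s", &value, &type, symbol) != 3)
        return 0;
    if (strcmp(symbol, wanted) != 0)
        return 0;

    *addr = value;
    return 1;
}
```

A caller would feed it every line of System.map and keep scanning after the first hit rather than stopping, since, as noted above, the same name turned up for two arrays.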

At least that will keep us going until we can prepare some way of flashing the Jetsons again.

Cheers
David

system · September 7, 2022, 5:04am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
