R32.7.1 / 4.9.253-rt168 : BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:987

Hello,

I am seeing the following BUG message on boot when running a preemptable kernel with LOCKDEP & etc enabled.
Could you please let me know if this is a real issue and whether there is a fix for it?
Thank you

My configuration:

  • Jetson AGX Xavier
  • JetPack 4.6.1
  • custom built r32.7.1 with rt-patches applied (4.9.253-rt168)
  • CONFIG_PREEMPT_RT_FULL=y
  • CONFIG_DEBUG_KERNEL=y
  • CONFIG_LOCKDEP=y
  • CONFIG_PROVE_LOCKING=y
  • CONFIG_DEBUG_SPINLOCK=y
[    3.597425] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:987
[    3.597431] in_atomic(): 1, irqs_disabled(): 0, pid: 106, name: kworker/u16:3
[    3.597436] 6 locks held by kworker/u16:3/106:
[    3.597463]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080dcef0>] process_one_work+0x1c0/0x7b0
[    3.597481]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffff80080dcef0>] process_one_work+0x1c0/0x7b0
[    3.597501]  #2:  (&dev->mutex){......}, at: [<ffffff8008889aa4>] __driver_attach+0x5c/0x128
[    3.597518]  #3:  (&dev->mutex){......}, at: [<ffffff8008889ab0>] __driver_attach+0x68/0x128
[    3.597539]  #4:  (tegra_dc_registration_lock){+.+.+.}, at: [<ffffff80086695c8>] tegra_dc_probe+0x58/0x1aa0
[    3.597558]  #5:  (&i2c_dev->xfer_lock){+.+...}, at: [<ffffff8008c11048>] tegra_i2c_xfer_msg+0x1d8/0xc38
[    3.597567] Preemption disabled at:
[    3.597568] [<ffffff8008c11048>] tegra_i2c_xfer_msg+0x1d8/0xc38
[    3.597579] CPU: 5 PID: 106 Comm: kworker/u16:3 Not tainted 4.9.253-rt168 #2
[    3.597582] Hardware name: Jetson-AGX (DT)
[    3.597602] Workqueue: events_unbound async_run_entry_fn
[    3.597606] Call trace:
[    3.597615] [<ffffff800808c2b8>] dump_backtrace+0x0/0x1a8
[    3.597622] [<ffffff800808c964>] show_stack+0x24/0x30
[    3.597631] [<ffffff80090ea3c0>] dump_stack+0xa0/0xd0
[    3.597637] [<ffffff80080f0794>] ___might_sleep+0x15c/0x230
[    3.597644] [<ffffff80090f6610>] rt_spin_lock+0x30/0x70
[    3.597652] [<ffffff8008210438>] get_page_from_freelist+0x180/0xd40
[    3.597658] [<ffffff80082118c4>] __alloc_pages_nodemask+0x124/0xef8
[    3.597666] [<ffffff8008276f88>] allocate_slab+0x368/0x570
[    3.597671] [<ffffff80082771d8>] new_slab+0x48/0x88
[    3.597678] [<ffffff8008279074>] ___slab_alloc.constprop.33+0x334/0x560
[    3.597684] [<ffffff8008279308>] __slab_alloc.isra.29.constprop.32+0x68/0xc0
[    3.597690] [<ffffff800827a4f8>] __kmalloc_track_caller+0x3f8/0x4a8
[    3.597697] [<ffffff800888e650>] devm_kmalloc+0x38/0x88
[    3.597706] [<ffffff80087b13dc>] tegra_dma_desc_alloc+0x3c/0xa8
[    3.597712] [<ffffff80087b14c0>] tegra_dma_desc_get+0x78/0xd8
[    3.597721] [<ffffff80087b1e74>] tegra_dma_prep_slave_sg+0x10c/0x498
[    3.597727] [<ffffff8008c110a8>] tegra_i2c_xfer_msg+0x238/0xc38
[    3.597733] [<ffffff8008c1218c>] tegra_i2c_xfer+0x5e4/0x7a8
[    3.597741] [<ffffff8008c08690>] __i2c_transfer+0x128/0x9e0
[    3.597747] [<ffffff8008c08fe0>] i2c_transfer+0x98/0x108
[    3.597755] [<ffffff800870e9a0>] tegra_hdmi_ddc_i2c_xfer+0x70/0x158
[    3.597762] [<ffffff800869faf8>] tegra_edid_read_block+0xc0/0x260
[    3.597768] [<ffffff80086a023c>] tegra_edid_get_monspecs+0x74/0xbf8
[    3.597775] [<ffffff800870f240>] tegra_hdmi_controller_enable+0x2e8/0xf88
[    3.597781] [<ffffff800870ff28>] tegra_dc_hdmi_enable+0x48/0xb8
[    3.597789] [<ffffff800875df60>] tegra_nvdisp_head_enable+0x4a0/0x13b8
[    3.597795] [<ffffff8008661b50>] _tegra_dc_enable+0xf0/0x110
[    3.597802] [<ffffff800866a754>] tegra_dc_probe+0x11e4/0x1aa0
[    3.597808] [<ffffff800888c558>] platform_drv_probe+0x60/0xc0
[    3.597814] [<ffffff8008889898>] driver_probe_device+0x298/0x448
[    3.597820] [<ffffff8008889b24>] __driver_attach+0xdc/0x128
[    3.597828] [<ffffff80088870c4>] bus_for_each_dev+0x5c/0xa8
[    3.597835] [<ffffff8008888ed8>] driver_attach+0x30/0x40
[    3.597840] [<ffffff8008887680>] driver_attach_async+0x20/0x60
[    3.597847] [<ffffff80080e9510>] async_run_entry_fn+0x48/0x150
[    3.597853] [<ffffff80080dcf98>] process_one_work+0x268/0x7b0
[    3.597859] [<ffffff80080dd52c>] worker_thread+0x4c/0x518
[    3.597865] [<ffffff80080e4e6c>] kthread+0xf4/0xf8
[    3.597871] [<ffffff80080830c0>] ret_from_fork+0x10/0x50

@Bibek
Could you help to check this topic.

Thanks

@SiarheiLiakh
Please apply attached patch to verify the problem.

Thanks
9b5e1a1.diff.zip (2.1 KB)

The patch does not resolve the issue (although, it might be addressing a different problem with rt-spinlocks).

This specific problem is not with the locks in i2c-tegra-vi.c, but with tegra_i2c_xfer_msg() disabling preemption and then proceeding to call other functions which allocate memory [ tegra_dma_desc_alloc() ] and, therefore, may sleep [ rt_spin_lock() in get_page_from_freelist() ].

Thus, a proper solution would be to refactor the code such that all memory allocations happen outside of non-preemptable region.

Thank you.

@SiarheiLiakh
Please apply blow patch for this issue.

commit 10a9b92122d64c9aa5d9b20b920cf0e47694c3e5 (HEAD)
Author: Petlozu Pravareshwar <petlozup@nvidia.com>
Date:   Mon Aug 1 11:41:54 2022 +0000

    i2c: tegra-vi: Fix sleeping API in atomic context

    dmaengine_prep_slave_single API has calls to allocate memory which can
    sleep. Hence avoid calling this API from an atomic context and fix below
    issue.

    BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:987
    in_atomic(): 1, irqs_disabled(): 0, pid: 106, name: kworker/u16:3
    6 locks held by kworker/u16:3/106:
     #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080dcef0>] process_one_work+0x1c0/0x7b0
     #1:  ((&entry->work)){+.+.+.}, at: [<ffffff80080dcef0>] process_one_work+0x1c0/0x7b0
     #2:  (&dev->mutex){......}, at: [<ffffff8008889aa4>] __driver_attach+0x5c/0x128
     #3:  (&dev->mutex){......}, at: [<ffffff8008889ab0>] __driver_attach+0x68/0x128
     #4:  (tegra_dc_registration_lock){+.+.+.}, at: [<ffffff80086695c8>] tegra_dc_probe+0x58/0x1aa0
     #5:  (&i2c_dev->xfer_lock){+.+...}, at: [<ffffff8008c11048>] tegra_i2c_xfer_msg+0x1d8/0xc38
    Preemption disabled at:
    [<ffffff8008c11048>] tegra_i2c_xfer_msg+0x1d8/0xc38

    Bug 3734457

    Change-Id: I34bcab14c9ca83e030368897846b92876ebcf7c9
    Signed-off-by: Petlozu Pravareshwar <petlozup@nvidia.com>

diff --git a/drivers/i2c/busses/i2c-tegra-vi.c b/drivers/i2c/busses/i2c-tegra-vi.c
index ed7ab7f93..4f22cee99 100644
--- a/drivers/i2c/busses/i2c-tegra-vi.c
+++ b/drivers/i2c/busses/i2c-tegra-vi.c
@@ -7,7 +7,7 @@
  * Copyright (C) 2015 Google, Inc.
  * Author: Tomasz Figa <tfiga@chromium.org>
  *
- * Copyright (C) 2010-2019 NVIDIA Corporation. All rights reserved.
+ * Copyright (C) 2010-2022 NVIDIA Corporation. All rights reserved.
  *
  * This software is licensed under the terms of the GNU General Public
  * License version 2, as published by the Free Software Foundation, and
@@ -499,24 +499,13 @@ static void tegra_i2c_tx_dma_complete(void *args)
        complete(&i2c_dev->tx_dma_complete);
 }

-static int tegra_i2c_start_tx_dma(struct tegra_i2c_dev *i2c_dev, int len)
+static void tegra_i2c_start_tx_dma(struct tegra_i2c_dev *i2c_dev, int len)
 {
-       reinit_completion(&i2c_dev->tx_dma_complete);
-       dev_dbg(i2c_dev->dev, "Starting tx dma for len:%d\n", len);
-       i2c_dev->tx_dma_desc = dmaengine_prep_slave_single(i2c_dev->tx_dma_chan,
-                               i2c_dev->tx_dma_phys, len, DMA_MEM_TO_DEV,
-                               DMA_PREP_INTERRUPT |  DMA_CTRL_ACK);
-       if (!i2c_dev->tx_dma_desc) {
-               dev_err(i2c_dev->dev, "Not able to get desc for Tx\n");
-               return -EIO;
-       }
-
        i2c_dev->tx_dma_desc->callback = tegra_i2c_tx_dma_complete;
        i2c_dev->tx_dma_desc->callback_param = i2c_dev;

        dmaengine_submit(i2c_dev->tx_dma_desc);
        dma_async_issue_pending(i2c_dev->tx_dma_chan);
-       return 0;
 }

 static int tegra_i2c_start_rx_dma(struct tegra_i2c_dev *i2c_dev, int len)
@@ -1382,15 +1371,21 @@ static int tegra_i2c_start_dma_based_xfer(struct tegra_i2c_dev *i2c_dev,

                tegra_i2c_config_fifo_trig(i2c_dev, dma_xfer_len);

+               reinit_completion(&i2c_dev->tx_dma_complete);
+               dev_dbg(i2c_dev->dev, "Starting tx dma for len:%d\n", len);
+               i2c_dev->tx_dma_desc = dmaengine_prep_slave_single(
+                               i2c_dev->tx_dma_chan, i2c_dev->tx_dma_phys,
+                               len, DMA_MEM_TO_DEV,
+                               DMA_PREP_INTERRUPT |  DMA_CTRL_ACK);
+               if (!i2c_dev->tx_dma_desc) {
+                       dev_err(i2c_dev->dev, "Not able to get desc for Tx\n");
+                       return -EIO;
+               }
+
                /* Acquire the lock before posting the data to FIFO */
                spin_lock_irqsave(&i2c_dev->xfer_lock, flags);

-               ret = tegra_i2c_start_tx_dma(i2c_dev, dma_xfer_len);
-               if (ret < 0) {
-                       dev_err(i2c_dev->dev,
-                               "Starting tx dma failed, err %d\n", ret);
-                       goto exit;
-               }
+               tegra_i2c_start_tx_dma(i2c_dev, dma_xfer_len);
        } else {
                i2c_dev->curr_direction = DATA_DMA_DIR_RX;
                /* Round up final length for DMA xfer */
@@ -1421,7 +1416,6 @@ static int tegra_i2c_start_dma_based_xfer(struct tegra_i2c_dev *i2c_dev,

        tegra_i2c_unmask_irq(i2c_dev, int_mask);

-exit:
        spin_unlock_irqrestore(&i2c_dev->xfer_lock, flags);
        return ret;
 }

I have two issues with the patch posted above:

  1. The patch was posted as quoted text which converted all tabs to spaces making it impossible to apply as-is
  2. Once the spacing issues fixed by hand, the patch applies cleanly but the code does not compile due to undeclared variable (see error message below). Note that I am using JetPack 4.6.1 custom built r32.7.1 with rt-patches applied (4.9.253-rt168).

Thank you.

  CC      drivers/i2c/busses/i2c-tegra.o
  CC      drivers/i2c/busses/i2c-tegra-vi.o
  CC [M]  drivers/gpu/nvgpu/os/linux/module.o
  CC      drivers/hid/hid-prodikeys.o
In file included from ./include/linux/printk.h:317:0,
                 from ./include/linux/kernel.h:13,
                 from /ssd/xavier/release/4.9-jetson/a/nvidia/drivers/i2c/busses/i2c-tegra-vi.c:23:
/ssd/xavier/release/4.9-jetson/a/nvidia/drivers/i2c/busses/i2c-tegra-vi.c: In function 'tegra_i2c_start_dma_based_xfer':
/ssd/xavier/release/4.9-jetson/a/nvidia/drivers/i2c/busses/i2c-tegra-vi.c:1375:57: error: 'len' undeclared (first use in this function)
   dev_dbg(i2c_dev->dev, "Starting tx dma for len:%d\n", len);
                                                         ^
./include/linux/dynamic_debug.h:135:9: note: in definition of macro 'dynamic_dev_dbg'
       ##__VA_ARGS__);  \
         ^~~~~~~~~~~
/ssd/xavier/release/4.9-jetson/a/nvidia/drivers/i2c/busses/i2c-tegra-vi.c:1375:3: note: in expansion of macro 'dev_dbg'
   dev_dbg(i2c_dev->dev, "Starting tx dma for len:%d\n", len);
   ^~~~~~~
/ssd/xavier/release/4.9-jetson/a/nvidia/drivers/i2c/busses/i2c-tegra-vi.c:1375:57: note: each undeclared identifier is reported only once for each function it appears in
   dev_dbg(i2c_dev->dev, "Starting tx dma for len:%d\n", len);
                                                         ^
./include/linux/dynamic_debug.h:135:9: note: in definition of macro 'dynamic_dev_dbg'
       ##__VA_ARGS__);  \
         ^~~~~~~~~~~
/ssd/xavier/release/4.9-jetson/a/nvidia/drivers/i2c/busses/i2c-tegra-vi.c:1375:3: note: in expansion of macro 'dev_dbg'
   dev_dbg(i2c_dev->dev, "Starting tx dma for len:%d\n", len);
   ^~~~~~~
scripts/Makefile.build:335: recipe for target 'drivers/i2c/busses/i2c-tegra-vi.o' failed
make[3]: *** [drivers/i2c/busses/i2c-tegra-vi.o] Error 1
make[3]: *** Waiting for unfinished jobs....
  CC      net/ipv6/netfilter.o
  CC      fs/kernfs/symlink.o
  CC      fs/jbd2/journal.o
  CC      drivers/hid/hid-pl.o
  LD      fs/kernfs/built-in.o
  CC      drivers/iio/industrialio-core.o
  CC      fs/lockd/clntlock.o
  CC      net/ipv4/af_inet.o
  CC      drivers/hid/hid-petalynx.o
scripts/Makefile.build:649: recipe for target 'drivers/i2c/busses' failed
make[2]: *** [drivers/i2c/busses] Error 2
scripts/Makefile.build:649: recipe for target 'drivers/i2c' failed
make[1]: *** [drivers/i2c] Error 2
make[1]: *** Waiting for unfinished jobs....

I’ve replaced “len” with “dma_xfer_len” where appropriate and it appears to compile fine.
However, the issue persists:

[    3.449247] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:987
[    3.449252] in_atomic(): 1, irqs_disabled(): 128, pid: 6, name: kworker/u16:0
[    3.449255] 6 locks held by kworker/u16:0/6:
[    3.449277]  #0:  ("events_unbound"){.+.+.+}, at: [<ffffff80080f9c7c>] process_one_work+0x1b4/0x730
[    3.449290]  #1:  ((&entry->work)){+.+.+.}, at: [<ffffff80080f9c7c>] process_one_work+0x1b4/0x730
[    3.449306]  #2:  (&dev->mutex){......}, at: [<ffffff8008856b7c>] __driver_attach+0x5c/0x128
[    3.449318]  #3:  (&dev->mutex){......}, at: [<ffffff8008856b88>] __driver_attach+0x68/0x128
[    3.449335]  #4:  (tegra_dc_registration_lock){+.+.+.}, at: [<ffffff800864c658>] tegra_dc_probe+0x70/0x19e8
[    3.449350]  #5:  (&i2c_dev->xfer_lock){......}, at: [<ffffff8008bd1940>] tegra_i2c_xfer_msg+0x220/0xcc0
[    3.449354] irq event stamp: 50140
[    3.449363] hardirqs last  enabled at (50139): [<ffffff800905b81c>] _raw_spin_unlock_irqrestore+0x6c/0xb0
[    3.449370] hardirqs last disabled at (50140): [<ffffff800905b5cc>] _raw_spin_lock_irqsave+0x2c/0x70
[    3.449378] softirqs last  enabled at (0): [<ffffff80080d18a4>] copy_process.isra.7.part.8+0x2ac/0x1990
[    3.449383] softirqs last disabled at (0): [<          (null)>]           (null)
[    3.449391] Preemption disabled at:
[    3.449391] [<ffffff8008bd1940>] tegra_i2c_xfer_msg+0x220/0xcc0
[    3.449406] CPU: 1 PID: 6 Comm: kworker/u16:0 Tainted: G        W       4.9.253-rt168-r32.7.1-tegra-RedHawk-7.5.8-r653-rt-sign...d-fix #1
[    3.449409] Hardware name: Jetson-AGX (DT)
[    3.449427] Workqueue: events_unbound async_run_entry_fn
[    3.449429] Call trace:
[    3.449437] [<ffffff800808c708>] dump_backtrace+0x0/0x1b0
[    3.449443] [<ffffff800808c9f4>] show_stack+0x24/0x30
[    3.449450] [<ffffff800904fe2c>] dump_stack+0xa0/0xd0
[    3.449455] [<ffffff800810d77c>] ___might_sleep+0x144/0x1a8
[    3.449461] [<ffffff800905bca0>] rt_spin_lock+0x30/0x68
[    3.449466] [<ffffff800885b474>] devres_add+0x2c/0x50
[    3.449470] [<ffffff800885b57c>] devm_kmalloc+0x5c/0x88
[    3.449478] [<ffffff800877b914>] tegra_dma_desc_alloc+0x3c/0xa8
[    3.449483] [<ffffff800877ba04>] tegra_dma_desc_get+0x84/0xc8
[    3.449489] [<ffffff800877c37c>] tegra_dma_prep_slave_sg+0x10c/0x488
[    3.449493] [<ffffff8008bd19a0>] tegra_i2c_xfer_msg+0x280/0xcc0
[    3.449498] [<ffffff8008bd2ac4>] tegra_i2c_xfer+0x5e4/0x7a8
[    3.449507] [<ffffff8008bc8e68>] __i2c_transfer+0x128/0x8d0
[    3.449512] [<ffffff8008bc96a8>] i2c_transfer+0x98/0x108
[    3.449520] [<ffffff80086e5728>] tegra_hdmi_ddc_i2c_xfer+0x70/0x158
[    3.449526] [<ffffff800867f178>] tegra_edid_read_block+0xc0/0x260
[    3.449531] [<ffffff800867f8bc>] tegra_edid_get_monspecs+0x74/0xbf8
[    3.449537] [<ffffff80086e6000>] tegra_hdmi_controller_enable+0x320/0xe08
[    3.449542] [<ffffff80086e6b30>] tegra_dc_hdmi_enable+0x48/0xb8
[    3.449549] [<ffffff800872cb30>] tegra_nvdisp_head_enable+0x4d0/0x1250
[    3.449555] [<ffffff8008645458>] _tegra_dc_enable+0xf0/0x110
[    3.449561] [<ffffff800864d7dc>] tegra_dc_probe+0x11f4/0x19e8
[    3.449566] [<ffffff8008859530>] platform_drv_probe+0x60/0xc0
[    3.449572] [<ffffff8008856970>] driver_probe_device+0x298/0x448
[    3.449578] [<ffffff8008856bfc>] __driver_attach+0xdc/0x128
[    3.449583] [<ffffff800885406c>] bus_for_each_dev+0x5c/0xa8
[    3.449588] [<ffffff8008855fb0>] driver_attach+0x30/0x40
[    3.449594] [<ffffff8008854488>] driver_attach_async+0x20/0x60
[    3.449599] [<ffffff80081052b0>] async_run_entry_fn+0x48/0x148
[    3.449603] [<ffffff80080f9d84>] process_one_work+0x2bc/0x730
[    3.449608] [<ffffff80080fa250>] worker_thread+0x58/0x4c0
[    3.449614] [<ffffff80081010f4>] kthread+0xf4/0xf8
[    3.449619] [<ffffff8008083330>] ret_from_fork+0x10/0x20

Perhaps, the patch suggested above addressed a similar issue but in i2c-tegra-vi.c, while we see it in i2c-tegra.c:

$ ./scripts/faddr2line vmlinux tegra_hdmi_ddc_i2c_xfer+0x70/0x158 i2c_transfer+0x98/0x108 __i2c_transfer+0x128/0x8d0 tegra_i2c_xfer+0x5e4/0x7a8 tegra_i2c_xfer_msg+0x280/0xcc0 tegra_dma_prep_slave_sg+0x10c/0x488

tegra_hdmi_ddc_i2c_xfer+0x70/0x158:
tegra_hdmi_ddc_i2c_xfer at nvidia/drivers/video/tegra/dc/hdmi2.0.c:205

i2c_transfer+0x98/0x108:
i2c_transfer at drivers/i2c/i2c-core.c:2657

__i2c_transfer+0x128/0x8d0:
__i2c_transfer at drivers/i2c/i2c-core.c:2584

tegra_i2c_xfer+0x5e4/0x7a8:
tegra_i2c_multi_pkt_xfer at drivers/i2c/busses/i2c-tegra.c:1968
 (inlined by) tegra_i2c_xfer at drivers/i2c/busses/i2c-tegra.c:2146

tegra_i2c_xfer_msg+0x280/0xcc0:
tegra_i2c_start_tx_dma at drivers/i2c/busses/i2c-tegra.c:511
 (inlined by) tegra_i2c_start_dma_xfer at drivers/i2c/busses/i2c-tegra.c:1480
 (inlined by) tegra_i2c_xfer_msg at drivers/i2c/busses/i2c-tegra.c:1659

tegra_dma_prep_slave_sg+0x10c/0x488:
tegra_dma_prep_slave_sg at nvidia/drivers/dma/tegra186-gpc-dma.c:1395

Please apply the attached patch for i2c-tegra.c

0001-i2c-tegra-remove-sleeping-apis-in-atomic-context.patch (5.2 KB)