455.23.04: Page allocation failure in kernel module at random points

My issue shows up in syslog as "nvidia-modeset/: page allocation failure".

  • Linux Mint 20 Cinnamon (4.6.7)
  • Kernel: 5.4.0-51-generic
  • Video: NVIDIA Corporation GK107 [GeForce GTX 650]
  • I am not certain, but I believe this problem has occurred with NVIDIA driver versions 455, 450.66 and 450.80.02.

I can confirm that CTRL+ALT+F1 switching to console and back (CTRL+ALT+F7) gets my desktop unfrozen.

Here is the relevant excerpt from the syslog file. Please ask if you need a longer version.

Oct 22 11:17:41 MIXER-desktop kernel: [109687.350444] nvidia-modeset/: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350448] CPU: 3 PID: 923 Comm: nvidia-modeset/ Tainted: P OE 5.4.0-51-generic #56-Ubuntu
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350449] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.60 03/06/2018
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350449] Call Trace:
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350456] dump_stack+0x6d/0x9a
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350460] warn_alloc.cold+0x7b/0xdf
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350462] __alloc_pages_slowpath+0xe07/0xe50
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350464] ? get_page_from_freelist+0x6b/0x390
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350465] __alloc_pages_nodemask+0x2d0/0x320
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350467] alloc_pages_current+0x87/0xe0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350470] kmalloc_order+0x1f/0x80
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350471] kmalloc_order_trace+0x24/0xa0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350472] __kmalloc+0x220/0x280
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350494] nvkms_alloc+0x24/0x60 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350505] _nv002714kms+0x16/0x30 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350514] ? _nv002589kms+0x4e/0x1610 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350524] ? _nv002422kms+0x40/0x40 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350533] ? _nv000550kms+0x365/0x3c0 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350535] ? load_balance+0x199/0xb00
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350541] ? nvkms_memset+0x12/0x20 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350549] ? _nv002676kms+0x309/0x3c0 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350558] ? _nv002694kms+0x29f/0x540 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350569] ? schedule+0x42/0xb0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350570] ? schedule_timeout+0x10e/0x160
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350571] ? __down_interruptible+0x91/0xf0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350580] ? _nv000528kms+0x71/0x80 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350585] ? nvkms_kthread_q_callback+0x81/0xe0 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350589] ? nvkms_kthread_q_callback+0x8a/0xe0 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350594] ? _main_loop+0x8c/0x140 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350596] ? kthread+0x104/0x140
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350600] ? _raw_q_schedule+0x70/0x70 [nvidia_modeset]
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350601] ? kthread_park+0x90/0x90
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350602] ? ret_from_fork+0x35/0x40
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350603] Mem-Info:
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350605] active_anon:2660468 inactive_anon:427104 isolated_anon:0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350605] active_file:1526786 inactive_file:558926 isolated_file:0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350605] unevictable:33 dirty:8752 writeback:3 unstable:0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350605] slab_reclaimable:641208 slab_unreclaimable:79815
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350605] mapped:282480 shmem:176270 pagetables:20555 bounce:0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350605] free:56068 free_pcp:37 free_cma:0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350607] Node 0 active_anon:10641872kB inactive_anon:1708416kB active_file:6107144kB inactive_file:2235704kB unevictable:132kB isolated(anon):0kB isolated(file):0kB mapped:1129920kB dirty:35008kB writeback:12kB shmem:705080kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350607] Node 0 DMA free:15896kB min:40kB low:52kB high:64kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350609] lowmem_reserve: 0 3354 23896 23896 23896
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350610] Node 0 DMA32 free:109608kB min:9480kB low:12912kB high:16344kB active_anon:1375552kB inactive_anon:132676kB active_file:756172kB inactive_file:187812kB unevictable:0kB writepending:3384kB present:3617328kB managed:3526796kB mlocked:0kB kernel_stack:732kB pagetables:1988kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350611] lowmem_reserve: 0 0 20541 20541 20541
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350612] Node 0 Normal free:98768kB min:58056kB low:79088kB high:100120kB active_anon:9266028kB inactive_anon:1575548kB active_file:5351684kB inactive_file:2048248kB unevictable:132kB writepending:31344kB present:21479424kB managed:21034700kB mlocked:132kB kernel_stack:25156kB pagetables:80232kB bounce:0kB free_pcp:160kB local_pcp:0kB free_cma:0kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350613] lowmem_reserve: 0 0 0 0 0
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350614] Node 0 DMA: 2*4kB (U) 2*8kB (U) 2*16kB (U) 1*32kB (U) 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350617] Node 0 DMA32: 6528*4kB (UME) 3673*8kB (UME) 3382*16kB (UME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 109608kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350619] Node 0 Normal: 3860*4kB (UMEH) 6257*8kB (UMEH) 2027*16kB (UMEH) 4*32kB (UH) 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 100040kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350623] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350624] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350624] 2260780 total pagecache pages
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350629] 1350 pages in swap cache
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350629] Swap cache stats: add 194665, delete 193316, find 292409/298008
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350630] Free swap = 30075120kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350630] Total swap = 30719996kB
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350630] 6278185 pages RAM
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350631] 0 pages HighMem/MovableOnly
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350631] 133837 pages reserved
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350631] 0 pages cma reserved
Oct 22 11:17:41 MIXER-desktop kernel: [109687.350631] 0 pages hwpoisoned
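
For anyone reading a trace like this for the first time, the first line can be decoded mechanically. Below is a minimal Python sketch, not anything taken from the driver. It assumes 4 KiB pages (the usual x86_64 default), the helper names are made up for illustration, and the free-list sample is the DMA32 line from the log above written in the kernel's count*size form. It works out how much physically contiguous memory an order:4 request needs and counts the free blocks of at least that size in one zone.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 KiB pages, the usual x86_64 default

# First line of the failure report, copied from the log above.
HEADER = ("nvidia-modeset/: page allocation failure: order:4, "
          "mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),"
          "cpuset=/,mems_allowed=0")

# DMA32 free list from the same log, in count*size form.
DMA32 = ("Node 0 DMA32: 6528*4kB (UME) 3673*8kB (UME) 3382*16kB (UME) "
         "0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
         "0*4096kB = 109608kB")


def parse_failure_header(line):
    """Extract order, GFP mode and flag names from a failure header line."""
    m = re.search(
        r"page allocation failure: order:(\d+), mode:(0x[0-9a-f]+)\(([^)]*)\)",
        line,
    )
    if m is None:
        return None
    order = int(m.group(1))
    return {
        "order": order,
        "mode": int(m.group(2), 16),
        "flags": m.group(3).split("|"),
        # An order-n request asks for 2**n physically contiguous pages.
        "contiguous_kb": (1 << order) * PAGE_SIZE_KB,
    }


def parse_free_list(line):
    """Return {block_size_kb: free_block_count} for one zone's free list."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def blocks_at_least(free_list, needed_kb):
    """Count free blocks that are large enough for the request."""
    return sum(count for size, count in free_list.items() if size >= needed_kb)


info = parse_failure_header(HEADER)
free_list = parse_free_list(DMA32)
print("contiguous memory requested:", info["contiguous_kb"], "kB")
print("DMA32 blocks large enough:", blocks_at_least(free_list, info["contiguous_kb"]))
```

For this DMA32 line the sketch finds no free block of 64 kB or larger, even though the zone reports 109608 kB free in total. That is only a reading of one zone in one log and says nothing by itself about why the driver requests a block of that size.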

I did a test and can confirm this. While copying large numbers of files from one thumbdrive to another, the faults occur much more frequently.
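
To check whether large copies coincide with a shortage of big free blocks, one option is to sample /proc/buddyinfo while the copy runs. The following is a rough Python sketch under stated assumptions: the function names are hypothetical, the sample text only imitates the /proc/buddyinfo layout (one column of free-block counts per order, starting at order 0) using counts borrowed from the DMA32 and Normal free lists in the first log, and whether fragmentation is actually the cause of these failures is itself an assumption.

```python
# Illustrative text in /proc/buddyinfo layout; the counts are borrowed from
# the first log in this thread, not read from a live system.
SAMPLE = """\
Node 0, zone    DMA32   6528   3673   3382      0      0      0      0      0      0      0      0
Node 0, zone   Normal   3860   6257   2027      4      1      1      1      1      1      0      0
"""


def parse_buddyinfo(text):
    """Map (node, zone) to a list of free-block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node <n>, zone <name> <count order 0> <count order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(value) for value in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Number of free blocks that could satisfy a request of the given order."""
    return sum(counts[order:])


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_buddyinfo(handle.read())


for key, counts in parse_buddyinfo(SAMPLE).items():
    print(key, "blocks of order >= 4:", free_blocks_at_or_above(counts, 4))
```

Calling read_buddyinfo() every few seconds during a copy and logging the order >= 4 totals next to the syslog timestamps would show whether the two move together.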

I can confirm exactly the same problems:

  • triggers more often during large file transfers, but also during quieter use such as watching a YouTube video or typing in a code editor;

  • can be unfrozen by switching to console 1 and back to X11/Xorg

    [1195345.237478] nvidia-modeset/: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
    [1195345.237489] CPU: 0 PID: 656 Comm: nvidia-modeset/ Tainted: P OE 5.4.70-1-lts #1
    [1195345.237491] Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 2606 08/13/2020
    [1195345.237493] Call Trace:
    [1195345.237502] dump_stack+0x64/0x88
    [1195345.237506] warn_alloc.cold+0x78/0xdc
    [1195345.237509] ? __alloc_pages_direct_compact+0x168/0x170
    [1195345.237511] __alloc_pages_slowpath+0xd3c/0xd70
    [1195345.237536] ? _nv000151kms+0x7e0/0x7e0 [nvidia_modeset]
    [1195345.237539] __alloc_pages_nodemask+0x2d5/0x310
    [1195345.237542] kmalloc_order+0x1b/0x80
    [1195345.237545] kmalloc_order_trace+0x1d/0xa0
    [1195345.237563] nvkms_alloc+0x20/0x50 [nvidia_modeset]
    [1195345.237587] _nv002714kms+0x16/0x30 [nvidia_modeset]
    [1195345.237608] ? _nv002589kms+0x4e/0x1610 [nvidia_modeset]
    [1195345.237630] ? _nv000550kms+0x365/0x3c0 [nvidia_modeset]
    [1195345.237656] ? _nv002676kms+0x309/0x3c0 [nvidia_modeset]
    [1195345.237677] ? _nv002694kms+0x29f/0x540 [nvidia_modeset]
    [1195345.237681] ? schedule+0x39/0xa0
    [1195345.237684] ? schedule_timeout+0x111/0x150
    [1195345.237686] ? __down_interruptible+0x9c/0x100
    [1195345.237707] ? _nv000528kms+0x71/0x80 [nvidia_modeset]
    [1195345.237725] ? nvkms_kthread_q_callback+0x7c/0xd0 [nvidia_modeset]
    [1195345.237743] ? _main_loop+0x83/0x130 [nvidia_modeset]
    [1195345.237761] ? nvkms_sema_up+0x10/0x10 [nvidia_modeset]
    [1195345.237764] ? kthread+0x117/0x130
    [1195345.237766] ? __kthread_bind_mask+0x60/0x60
    [1195345.237768] ? ret_from_fork+0x22/0x40
    [1195345.237770] Mem-Info:
    [1195345.237776] active_anon:11174116 inactive_anon:699281 isolated_anon:0
    active_file:1189778 inactive_file:2625572 isolated_file:0
    unevictable:8 dirty:721344 writeback:2920 unstable:0
    slab_reclaimable:250630 slab_unreclaimable:167512
    mapped:511550 shmem:1007525 pagetables:55748 bounce:0
    free:165299 free_pcp:450 free_cma:0
    [1195345.237781] Node 0 active_anon:44696464kB inactive_anon:2797124kB active_file:4759112kB inactive_file:10502288kB unevictable:32kB isolated(anon):0kB isolated(file):0kB mapped:2046200kB dirty:2885376kB writeback:11680kB shmem:4030100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1832960kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
    [1195345.237784] Node 0 DMA free:15892kB min:16kB low:28kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
    [1195345.237788] lowmem_reserve: 0 2645 64197 64197 64197
    [1195345.237791] Node 0 DMA32 free:248988kB min:2784kB low:5492kB high:8200kB active_anon:572808kB inactive_anon:200568kB active_file:185980kB inactive_file:1396468kB unevictable:0kB writepending:506776kB present:2788656kB managed:2788648kB mlocked:0kB kernel_stack:808kB pagetables:2580kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
    [1195345.237796] lowmem_reserve: 0 0 61551 61551 61551
    [1195345.237798] Node 0 Normal free:396316kB min:322828kB low:385856kB high:448884kB active_anon:44123964kB inactive_anon:2596556kB active_file:4573372kB inactive_file:9106508kB unevictable:32kB writepending:2391280kB present:64211968kB managed:63034968kB mlocked:32kB kernel_stack:46888kB pagetables:220412kB bounce:0kB free_pcp:2048kB local_pcp:0kB free_cma:0kB
    [1195345.237802] lowmem_reserve: 0 0 0 0 0
    [1195345.237805] Node 0 DMA: 1*4kB (U) 2*8kB (U) 2*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
    [1195345.237813] Node 0 DMA32: 4146*4kB (UME) 4401*8kB (UME) 2248*16kB (UME) 1543*32kB (UME) 910*64kB (UME) 307*128kB (UME) 60*256kB (UME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 250032kB
    [1195345.237820] Node 0 Normal: 69042*4kB (UMEH) 2547*8kB (UMEH) 3855*16kB (UMEH) 1308*32kB (UMEH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 400080kB
    [1195345.237828] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
    [1195345.237830] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    [1195345.237831] 4820754 total pagecache pages
    [1195345.237835] 175 pages in swap cache
    [1195345.237836] Swap cache stats: add 31034, delete 30859, find 4984/5807
    [1195345.237837] Free swap = 9920252kB
    [1195345.237838] Total swap = 10000380kB
    [1195345.237839] 16754154 pages RAM
    [1195345.237840] 0 pages HighMem/MovableOnly
    [1195345.237841] 294276 pages reserved
    [1195345.237842] 0 pages hwpoisoned
    [1195359.710017] snd_hda_codec_hdmi hdaudioC1D0: HDMI: invalid ELD data byte 16

  • my setup
    [jaap@jaap ~ ]$ nvidia-smi -L
    GPU 0: GeForce RTX 2070 SUPER (UUID: GPU-395ce04f-f18a-d842-fb69-808764409ddf)
    GPU 1: GeForce GTX 1650 (UUID: GPU-d5fff140-5b8b-9f86-099e-b6cdcc7fc8f8)
    Displays connect to GPU 1.

Oh and BTW NVIDIA, please open-source your drivers so these kinds of issues can be prevented by the power of the crowd, because this is not the first time…

I have the same problem, see my forum post.

Does anyone have an idea when the update that fixes this bug will be released?

I haven't been able to reproduce it yet on the 450 driver series, so that seems to be the way to go until Nvidia can release a patch.

I believe I've found a way to reproduce this bug.

In a directory containing nothing but a large number of .jpeg images, run the command:

gm convert * output.pdf

This command belongs to the graphicsmagick package on Arch, and it combines the input images (in this case all of them, via *) into a PDF file. If there are enough images in your directory, this command should reproduce the bug, freeze the screen, and output:

gm convert: abort due to signal 7 (SIGBUS) "Bus Error"...
Aborted (core dumped)

If the command succeeds and produces the PDF, just add more images to the directory (preferably higher resolution?) and try again. At some point the bug should reproduce. Once I pass a certain threshold number of images, the bug happens every time I run the command. The threshold for me was around 200 images of various sizes that I quickly mixed together to trigger the bug.
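
To find that threshold more systematically, the image list can be fed to gm convert in growing batches. This is only a sketch under assumptions: the helper names and the batch sizes (start at 50, grow by 50) are invented for illustration, and the sketch only builds the argument lists without running anything, since running them is expected to freeze the screen.

```python
import os


def collect_images(directory, extensions=(".jpg", ".jpeg")):
    """Return the image paths in a directory, sorted by file name."""
    return [os.path.join(directory, name)
            for name in sorted(os.listdir(directory))
            if name.lower().endswith(extensions)]


def build_gm_command(images, output="output.pdf"):
    """Argument list equivalent to 'gm convert <images...> output.pdf'."""
    return ["gm", "convert", *images, output]


def growing_batches(images, start=50, step=50):
    """Yield progressively larger prefixes of the list, ending with all of it."""
    size = start
    while size < len(images):
        yield images[:size]
        size += step
    if images:
        yield images


# Demonstration with made-up file names; nothing is executed here.
demo_images = ["img%03d.jpg" % i for i in range(120)]
for batch in growing_batches(demo_images):
    command = build_gm_command(batch)
    print(len(batch), "images ->", len(command), "arguments")
```

Passing each list to subprocess.run() in turn, with the real paths from collect_images(), and noting the batch size at which the freeze first occurs would give a repeatable threshold. Listing the images explicitly also avoids the shell's * picking up an output.pdf left over from an earlier run.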

My resulting log, caused by the above command:

Oct 24 22:15:34 userABC kernel: nvidia-modeset/: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
Oct 24 22:15:35 userABC kernel: CPU: 7 PID: 169 Comm: nvidia-modeset/ Tainted: P OE 5.4.72-1-lts #1
Oct 24 22:15:35 userABC kernel: Call Trace:
Oct 24 22:15:35 userABC kernel: dump_stack+0x64/0x88
Oct 24 22:15:35 userABC kernel: warn_alloc.cold+0x78/0xdc
Oct 24 22:15:35 userABC kernel: ? __alloc_pages_direct_compact+0x168/0x170
Oct 24 22:15:35 userABC kernel: __alloc_pages_slowpath+0xd3c/0xd70
Oct 24 22:15:35 userABC kernel: __alloc_pages_nodemask+0x2d5/0x310
Oct 24 22:15:35 userABC kernel: kmalloc_order+0x1b/0x80
Oct 24 22:15:35 userABC kernel: kmalloc_order_trace+0x1d/0xa0
Oct 24 22:15:35 userABC kernel: nvkms_alloc+0x20/0x50 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: _nv002714kms+0x16/0x30 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? _nv002589kms+0x4e/0x1610 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? _nv002422kms+0x40/0x40 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? _nv000550kms+0x365/0x3c0 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? _nv002676kms+0x309/0x3c0 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? _nv002694kms+0x29f/0x540 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? schedule+0x39/0xa0
Oct 24 22:15:35 userABC kernel: ? schedule_timeout+0x111/0x150
Oct 24 22:15:35 userABC kernel: ? __down_interruptible+0x9c/0x100
Oct 24 22:15:35 userABC kernel: ? _nv000528kms+0x71/0x80 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? nvkms_kthread_q_callback+0x7c/0xd0 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? _main_loop+0x83/0x130 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? nvkms_sema_up+0x10/0x10 [nvidia_modeset]
Oct 24 22:15:35 userABC kernel: ? kthread+0x117/0x130
Oct 24 22:15:35 userABC kernel: ? __kthread_bind_mask+0x60/0x60
Oct 24 22:15:35 userABC kernel: ? ret_from_fork+0x35/0x40
Oct 24 22:15:35 userABC kernel: Mem-Info:
Oct 24 22:15:35 userABC kernel: active_anon:1519899 inactive_anon:218618 isolated_anon:32
active_file:1263 inactive_file:946 isolated_file:0
unevictable:2 dirty:0 writeback:563 unstable:0
slab_reclaimable:14340 slab_unreclaimable:28123
mapped:546399 shmem:488812 pagetables:6534 bounce:0
free:25406 free_pcp:173 free_cma:0
Oct 24 22:15:35 userABC kernel: Node 0 active_anon:6079596kB inactive_anon:874472kB active_file:5052kB inactive_file:3784kB unevictable:8kB isolated(anon):128kB isolated(file):0kB mapped:2185596kB dirty:0kB writeback:2252kB shmem:1955248kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Oct 24 22:15:35 userABC kernel: Node 0 DMA free:15884kB min:140kB low:172kB high:204kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15892kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct 24 22:15:35 userABC kernel: lowmem_reserve: 0 2669 7357 7357 7357
Oct 24 22:15:35 userABC kernel: Node 0 DMA32 free:43144kB min:24472kB low:30588kB high:36704kB active_anon:2550992kB inactive_anon:8kB active_file:8kB inactive_file:24kB unevictable:0kB writepending:0kB present:2830268kB managed:2764732kB mlocked:0kB kernel_stack:368kB pagetables:6664kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Oct 24 22:15:35 userABC kernel: lowmem_reserve: 0 0 4687 4687 4687
Oct 24 22:15:35 userABC kernel: Node 0 Normal free:42596kB min:42968kB low:53708kB high:64448kB active_anon:3528656kB inactive_anon:874760kB active_file:4672kB inactive_file:3908kB unevictable:8kB writepending:1592kB present:4966400kB managed:4806176kB mlocked:0kB kernel_stack:4528kB pagetables:19472kB bounce:0kB free_pcp:692kB local_pcp:692kB free_cma:0kB
Oct 24 22:15:35 userABC kernel: lowmem_reserve: 0 0 0 0 0
Oct 24 22:15:35 userABC kernel: Node 0 DMA: 1*4kB (U) 3*8kB (U) 1*16kB (U) 1*32kB (U) 3*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
Oct 24 22:15:35 userABC kernel: Node 0 DMA32: 342*4kB (UME) 570*8kB (UME) 560*16kB (UME) 369*32kB (UME) 167*64kB (UME) 45*128kB (UME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43144kB
Oct 24 22:15:35 userABC kernel: Node 0 Normal: 1447*4kB (UME) 1161*8kB (UME) 497*16kB (UME) 364*32kB (UME) 79*64kB (UM) 1*128kB (M) 0*256kB 7*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 43444kB
Oct 24 22:15:35 userABC kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 24 22:15:35 userABC kernel: 498803 total pagecache pages
Oct 24 22:15:35 userABC kernel: 7439 pages in swap cache
Oct 24 22:15:35 userABC kernel: Swap cache stats: add 2604868, delete 2597580, find 649408/764049
Oct 24 22:15:35 userABC kernel: Free swap = 13808380kB
Oct 24 22:15:35 userABC kernel: Total swap = 16777212kB
Oct 24 22:15:35 userABC kernel: 1953165 pages RAM
Oct 24 22:15:35 userABC kernel: 0 pages HighMem/MovableOnly
Oct 24 22:15:35 userABC kernel: 56465 pages reserved
Oct 24 22:15:35 userABC kernel: 0 pages hwpoisoned
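
The per-zone lines in a report like this also show how close each zone is to its watermarks. The following Python sketch is illustrative only: the helper names are hypothetical, and the two sample strings are truncated copies of the DMA32 and Normal zone lines from the log above (cut after the high: field). It classifies each zone by comparing free against min, low and high.

```python
import re

ZONE_RE = re.compile(
    r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB low:(\d+)kB high:(\d+)kB"
)

# Truncated copies of two zone lines from the log above.
DMA32_LINE = "Node 0 DMA32 free:43144kB min:24472kB low:30588kB high:36704kB"
NORMAL_LINE = "Node 0 Normal free:42596kB min:42968kB low:53708kB high:64448kB"


def parse_zone(line):
    """Pull node, zone name and the free/min/low/high values (kB) from a line."""
    m = ZONE_RE.search(line)
    if m is None:
        return None
    free, wmin, low, high = (int(value) for value in m.groups()[2:])
    return {"node": int(m.group(1)), "zone": m.group(2),
            "free": free, "min": wmin, "low": low, "high": high}


def watermark_state(zone):
    """Describe where the zone's free memory sits relative to its watermarks."""
    if zone["free"] < zone["min"]:
        return "below min"
    if zone["free"] < zone["low"]:
        return "below low"
    if zone["free"] < zone["high"]:
        return "below high"
    return "above high"


for line in (DMA32_LINE, NORMAL_LINE):
    zone = parse_zone(line)
    print(zone["zone"], watermark_state(zone))
```

In this log the Normal zone comes out below its min watermark while DMA32 is above high. As far as I understand the allocator, a zone below min cannot serve ordinary requests without reclaiming memory first, which would fit a system under heavy memory pressure from the convert job, but that reading is an interpretation rather than something the log states.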

Are there any plans to backport CUDA 11.1 support to the stable 450 series? We don't actually have any stable CUDA setup with the RTX 3 series right now.

And all this time I was beginning to think my card or motherboard was damaged by recent power spikes. I've been running memory checkers like crazy!

I have a GTX 1060 3GB (by MSI) card running on a fully patched Fedora 32 (kernel-5.8.15-201.fc32.x86_64). Frequently, but not every time, after Gnome has been locked, a kerneloops "Bug" occurs in the kernel according to ABRT, but there is not enough information for a backtrace.

My screen will lock as is (i.e. off, or showing whatever image was on screen at the time), and the keyboard and mouse are no longer responsive at all (I can't even use CTRL-ALT-F4 to switch to a CLI console). CAPS and NUM lock don't work.

I can ssh into the PC and it's otherwise fully operational. All server-like services are working 100%.

It did it when I was running driver NVIDIA-Linux-x86_64-450.66, so I upgraded to NVIDIA-Linux-x86_64-455.28 and it did it again.

The "BUG" listed in the crash output refers to [nvidia_modeset].

See my post on it: https://forums.developer.nvidia.com/t/bug-on-fedora-32-and-gtx-1060-driver-450-and-455/157320

Of course not. I don't see why a kernel update would even help, as it's clearly a problem with the Nvidia driver that showed up after the update to 455. I also reported above that I have those problems with 5.9.x.

Anyways, the Nvidia people said they fixed it, so I guess all we can do is hope they really fixed it and push it asap.

Did they say that? I see no mention of the fix in the new 455.38 release notes and wouldn't expect another release for a month.

Do you know if the fix made it into the 455 release series driver today?

I suggest you read the posts in this thread, especially if they're more or less right above your own.

No idea how my post from 20 hours ago could say anything about the release from 2 hours ago. I'm afraid I can't see into the future. Thus, no idea if the fix went into this release.

It didn't.

Thanks for the update! Would you please share a timeline/schedule for when the fix is going to be released, and in which branch/driver?

I think I uploaded my nvidia 455 bug log to the wrong thread; here is the link if the devs need it.

I can't comment on future release schedules, sorry.

To be clear, any driver release that does not fix this and the other problem (Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference) is basically useless for us. We can't be running a crashing driver. Please release the fixed driver ASAP.

@aplattner Does the 455.34.01 driver contain a fix?

I can confirm that 455.38 still contains this bug.

I was running 450.80.02 (2020.9.30) and upgraded to the 455.38 (2020.10.29) driver. I have this exact issue with both, but it is almost immediate. It happens upon login; switching VTs to the text console and back to X gets things rolling for a few seconds, then it locks again. After a couple of switches, X hangs to the point that most keyboard control is lost, and the only recovery is SysRq b. This turned my workhorse workstation into a space heater that is taking up all of my desk space.