Swiotlb full; prevents copying of vbios: Gentoo 5.4.60 / driver 450.66

When I boot this system, the drivers fail to load, evidently because the swiotlb is full. I don’t really understand the dmesg output, but it reports 0 of 32768 slots in use, so the buffer shouldn’t actually be full (or is the requested size too large?).

Reading similar issues, I’ve tried booting with a variety of kernel params, including iommu=off, iommu=force, iommu=soft, and increasing the swiotlb to a much larger size. None seemed to have any effect.

Not sure if it’s relevant, but this system has two NVMe SSDs. One contains the EFI system partition and Gentoo’s root partition, which uses btrfs. The other contains a Windows installation that I can boot into using GRUB. I’ve been using the Windows installation without issue for months, and this GPU works great on that side. The second SSD is a recent addition.

dmesg output:

[    3.982791] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[    3.982838] nvidia 0000:10:00.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 0 (slots)
[    3.982841] nvidia 0000:10:00.0: overflow 0x00008007eac00000+327680 of DMA mask 7fffffffffff bus mask 0
[    3.982845] ------------[ cut here ]------------
[    3.982850] WARNING: CPU: 13 PID: 910 at kernel/dma/direct.c:35 report_addr+0x2e/0x50
[    3.982850] Modules linked in: nvidia_modeset(PO+) nvidia(PO) efivarfs
[    3.982855] CPU: 13 PID: 910 Comm: nvidia-smi Tainted: P           O      5.4.60-gentoo #4
[    3.982856] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Steel Legend WiFi ax, BIOS P1.90 09/10/2019
[    3.982858] RIP: 0010:report_addr+0x2e/0x50
[    3.982860] Code: 48 8b 87 28 02 00 00 48 89 34 24 48 85 c0 74 2d 4c 8b 00 b8 fe ff ff ff 49 39 c0 76 14 80 3d 14 b6 40 01 00 0f 84 25 07 00 00 <0f> 0b 48 83 c4 08 c3 48 83 bf 38 02 00 00 00 74 ef eb e0 80 3d f5
[    3.982861] RSP: 0018:ffffa5fe8049f898 EFLAGS: 00010246
[    3.982862] RAX: 0000000000000000 RBX: ffff9a42b93fb800 RCX: 0000000000000000
[    3.982862] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffffa8ac54ac
[    3.982863] RBP: 0000000000000050 R08: 0000000000000001 R09: 00000000000003d8
[    3.982864] R10: 00000000000155c0 R11: 0000000000000001 R12: 0000000000050000
[    3.982865] R13: ffff9a42b9f295c8 R14: 0000000000000001 R15: ffff9a42b9f29630
[    3.982866] FS:  00007f4326a01b80(0000) GS:ffff9a42beb40000(0000) knlGS:0000000000000000
[    3.982867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.982868] CR2: 000055fbe2ed54b0 CR3: 00008007f797e000 CR4: 0000000000340ee0
[    3.982868] Call Trace:
[    3.982872]  dma_direct_map_page+0xdd/0xf0
[    3.983019]  nv_dma_map_pages+0x184/0x3f0 [nvidia]
[    3.983162]  nv_dma_map_alloc+0xd9/0x270 [nvidia]
[    3.983314]  _nv030433rm+0x357/0x480 [nvidia]
[    3.983442]  ? _nv025145rm+0x2b5/0x470 [nvidia]
[    3.983564]  ? _nv026175rm+0x76/0x270 [nvidia]
[    3.983683]  ? _nv026133rm+0x381/0x430 [nvidia]
[    3.983801]  ? _nv026126rm+0xd8/0x610 [nvidia]
[    3.983914]  ? _nv037389rm+0x104/0x180 [nvidia]
[    3.984022]  ? _nv037434rm+0x93a/0x11d0 [nvidia]
[    3.984089]  ? _nv000738rm+0xd06/0x2030 [nvidia]
[    3.984156]  ? rm_init_adapter+0xc5/0xe0 [nvidia]
[    3.984221]  ? nv_request_soc_irq+0x200/0xe60 [nvidia]
[    3.984223]  ? _cond_resched+0x10/0x20
[    3.984287]  ? nv_request_soc_irq+0xc02/0xe60 [nvidia]
[    3.984289]  ? exact_lock+0x8/0x20
[    3.984353]  ? nvidia_frontend_open+0x4e/0x90 [nvidia]
[    3.984353]  ? chrdev_open+0x98/0x1a0
[    3.984354]  ? cdev_put.part.0+0x20/0x20
[    3.984355]  ? do_dentry_open+0x137/0x380
[    3.984357]  ? path_openat+0x58c/0x1560
[    3.984359]  ? security_capable+0x31/0x50
[    3.984361]  ? capable_wrt_inode_uidgid+0x12/0x30
[    3.984362]  ? do_filp_open+0x8c/0x100
[    3.984363]  ? chown_common.isra.0+0x9a/0x150
[    3.984364]  ? do_sys_open+0x17f/0x220
[    3.984365]  ? do_syscall_64+0x43/0x110
[    3.984366]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    3.984367] ---[ end trace bcfec63070b43e43 ]---
[    3.984405] NVRM: GPU 0000:10:00.0: Failed to copy vbios to system memory.
[    3.984541] NVRM: GPU 0000:10:00.0: RmInitAdapter failed! (0x30:0xffff:794)
[    3.984560] NVRM: GPU 0000:10:00.0: rm_init_adapter failed, device minor number 0
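
A note on the numbers in the swiotlb line, as a rough sketch rather than a definitive reading: assuming the 5.4-era constants IO_TLB_SHIFT = 11 (2 KiB slots) and IO_TLB_SEGSIZE = 128 from include/linux/swiotlb.h, a single bounce-buffer mapping cannot exceed 256 KiB, so a 327680-byte request would fail even with every slot free. That would also be consistent with a larger swiotlb= value making no difference. The constants and the helper name analyze_swiotlb_line are assumptions for illustration, not something taken from the driver.

```python
import re

# Assumed constants, as defined in include/linux/swiotlb.h for 5.4-era kernels.
IO_TLB_SHIFT = 11      # each slot is 1 << 11 = 2048 bytes
IO_TLB_SEGSIZE = 128   # largest run of contiguous slots one mapping may use

# The message copied from the dmesg output above (timestamp removed).
LINE = ("nvidia 0000:10:00.0: swiotlb buffer is full (sz: 327680 bytes), "
        "total 32768 (slots), used 0 (slots)")


def analyze_swiotlb_line(line):
    """Hypothetical helper: turn the 'buffer is full' message into sizes."""
    match = re.search(
        r"sz: (\d+) bytes\), total (\d+) \(slots\), used (\d+) \(slots\)", line
    )
    if match is None:
        raise ValueError("not a swiotlb 'buffer is full' line")
    size, total_slots, used_slots = map(int, match.groups())
    slot_bytes = 1 << IO_TLB_SHIFT
    slots_needed = -(-size // slot_bytes)  # ceiling division
    return {
        "pool_mib": total_slots * slot_bytes // (1 << 20),
        "used_slots": used_slots,
        "slots_needed": slots_needed,
        "max_mapping_bytes": IO_TLB_SEGSIZE * slot_bytes,
        "fits_in_one_segment": slots_needed <= IO_TLB_SEGSIZE,
    }


result = analyze_swiotlb_line(LINE)
for key, value in result.items():
    print(f"{key}: {value}")
```

Under those assumptions the request needs 160 contiguous slots against a per-mapping limit of 128, which would explain a "full" report with 0 slots used.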

CPU & Kernel:

$ uname -a
Linux 5.4.60-gentoo #4 SMP Thu Sep 17 18:16:51 EDT 2020 x86_64 AMD Ryzen 7 3800X 8-Core Processor AuthenticAMD GNU/Linux

Motherboard:

$ cat /sys/devices/virtual/dmi/id/board_{vendor,name}
ASRock
X570 Steel Legend WiFi ax

Nvidia hardware, including GPU:

$ lspci | grep -i nvidia
10:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] (rev a1)
10:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
10:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
10:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

Driver package settings:

x11-drivers/nvidia-drivers-450.66::gentoo was built with the following:
USE="X driver kms (libglvnd) multilib tools -compat -gtk3 -static-libs -uvm -wayland" ABI_X86="32 (64) (-x32)"

nvidia-bug-report.log.gz (514.1 KB)

OK, some progress: turns out IOMMU was turned off in EFI settings, so I enabled it and the driver loads. Now startx runs, but after a few seconds it fails and returns me to a terminal with fonts displaying in a warped and almost unreadable way. The Xorg log says:

[    95.299] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[    95.299] (EE) NVIDIA(0): Failed to allocate push buffer
[    95.477] (EE) Fatal server error:
[    95.477] (EE) AddScreen/ScreenInit failed for driver 0

Attaching new bug report log and Xorg log.

Xorg.0.log (9.1 KB)
nvidia-bug-report.log.gz (1.0 MB)

Looks like the culprit was ultimately AMD Secure Memory Encryption. Turned it off in the kernel and now I’m able to startx.
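
A small sketch of why SME would lead to the swiotlb error, assuming the memory-encryption bit on this CPU is physical address bit 47 (the value commonly reported for Zen parts, but not confirmed by the logs in this thread): the address in the "overflow" dmesg line has that bit set, which pushes it past the device's 47-bit DMA mask, so the kernel falls back to the bounce buffer. The function name fits_dma_mask and the constant SME_C_BIT are illustrative assumptions.

```python
# Values copied from the "overflow" line in the first dmesg output.
ADDR = 0x00008007EAC00000
SIZE = 327680
DMA_MASK = 0x7FFFFFFFFFFF

# Assumption: position of the SME encryption bit (C-bit) on this CPU.
SME_C_BIT = 47


def fits_dma_mask(addr, size, mask):
    """True if the last byte of the buffer is still addressable under mask."""
    return addr + size - 1 <= mask


mask_bits = DMA_MASK.bit_length()            # width of the device's DMA mask
c_bit_set = bool((ADDR >> SME_C_BIT) & 1)    # is the assumed C-bit set?
addr_without_c_bit = ADDR & ~(1 << SME_C_BIT)

print(f"DMA mask width: {mask_bits} bits")
print(f"assumed C-bit (bit {SME_C_BIT}) set in address: {c_bit_set}")
print(f"fits as logged: {fits_dma_mask(ADDR, SIZE, DMA_MASK)}")
print(f"fits with C-bit cleared: "
      f"{fits_dma_mask(addr_without_c_bit, SIZE, DMA_MASK)}")
```

With the assumed bit cleared, the same address sits well inside the mask, which matches the driver working once SME is off.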

This can be closed as a duplicate of "Unable to start X. Failed to initialize DMA." (see post #3 by kon14).