Aye, once the DX12 issue is fully resolved only two (?) more critical problems remain:
opened 06:28PM - 25 Feb 25 UTC
I'm using kernel 6.13.4 with nvidia-open 570.86.16 and KDE 6.3.1.
I have a simple Wayland application that does a standard Vulkan render loop, drawing barely anything on the screen. You can see this on the profile capture below, with the steady red ticks being the Vulkan render tasks, and the corresponding green ticks representing render submissions on the CPU. The pink regions are for the texture load thread, and the yellow regions are for the Vulkan object reaper thread.
<img alt="Image" src="https://github.com/user-attachments/assets/cda481b1-0496-415f-83eb-b385f4b469c6" />
When I create or destroy a Vulkan buffer of considerable size (1 GB in this case), the render loop becomes blocked, which results in visible hitching on the display. This can be seen in frames 202, 203, and 225 or 226.
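For anyone who wants to flag such frames from their own timing data rather than by eye, here is a minimal sketch. It assumes you have recorded a start timestamp per frame; the function name `find_hitches`, the 1.5x tolerance, and the idea of comparing against a fixed frame budget are illustrative assumptions and are not taken from the profiler or the application in this report.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices of frames that ran too long.
// frame_start_ms[i] is the start time of frame i in milliseconds. The
// duration of frame i is taken as the gap to the start of frame i + 1,
// so the last recorded frame is never judged.
std::vector<std::size_t> find_hitches(const std::vector<double>& frame_start_ms,
                                      double budget_ms,
                                      double tolerance = 1.5) {
    std::vector<std::size_t> hitches;
    for (std::size_t i = 0; i + 1 < frame_start_ms.size(); ++i) {
        const double duration_ms = frame_start_ms[i + 1] - frame_start_ms[i];
        // A frame counts as a hitch when it overshoots the budget by more
        // than the tolerance factor (assumed value, tune to taste).
        if (duration_ms > budget_ms * tolerance) {
            hitches.push_back(i);
        }
    }
    return hitches;
}
```

With made-up start times of 0.0, 16.7, 33.4, 95.0, 111.7 and 128.4 ms and a 16.7 ms budget, only the frame at index 2 is reported, since it is the one followed by a gap of roughly 61.6 ms.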
Buffer allocations are performed using VMA 3.2.1, with `VMA_MEMORY_USAGE_AUTO_PREFER_HOST` usage and `VMA_ALLOCATION_CREATE_MAPPED_BIT | VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT` flags.
Here's a closeup of frames 202 and 203, with visible kernel stacks for buffer creation:
<img alt="Image" src="https://github.com/user-attachments/assets/69394f17-bb73-4ba5-8113-9dd36c228624" />
Example stacks for buffer creation in frame 202:
```
0. clear_page_erms (<kernel>) <kernel>
1. prep_new_page (<kernel>) <kernel>
2. get_page_from_freelist (<kernel>) <kernel>
3. __alloc_pages_noprof (<kernel>) <kernel>
4. alloc_pages_mpol_noprof (<kernel>) <kernel>
5. get_free_pages_noprof (<kernel>) <kernel>
6. nv_alloc_system_pages (<kernel>) nvidia
7. nv_alloc_pages (<kernel>) nvidia
8. osAllocPagesInternal (<kernel>) nvidia
9. memdescAlloc (<kernel>) nvidia
10. sysmemConstruct_IMPL (<kernel>) nvidia
11. __nvoc_objCreate_SystemMemory (<kernel>) nvidia
12. __nvoc_objCreateDynamic (<kernel>) nvidia
13. resservResourceFactory (<kernel>) nvidia
14. _clientAllocResourceHelper (<kernel>) nvidia
15. serverAllocResourceUnderLock (<kernel>) nvidia
16. serverAllocResource (<kernel>) nvidia
17. rmapiAllocWithSecInfo (<kernel>) nvidia
18. rmapiAllocWithSecInfoTls (<kernel>) nvidia
19. _rmAllocForDeprecatedApi (<kernel>) nvidia
20. _rmVidHeapControlAllocCommon (<kernel>) nvidia
21. _nvos32FunctionAllocSize (<kernel>) nvidia
22. RmDeprecatedVidHeapControl (<kernel>) nvidia
23. Nv04VidHeapControlWithSecInfo (<kernel>) nvidia
24. RmIoctl (<kernel>) nvidia
25. rm_ioctl (<kernel>) nvidia
26. nvidia_unlocked_ioctl (<kernel>) nvidia
27. __x64_sys_ioctl (<kernel>) <kernel>
28. do_syscall_64 (<kernel>) <kernel>
29. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
30. ioctl + 61 ([unknown]) /usr/lib/libc.so.6
31. [unknown] + 129151784076160 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
```
0. find_next_iomem_res (<kernel>) <kernel>
1. walk_system_ram_range (<kernel>) <kernel>
2. pat_pagerange_is_ram (<kernel>) <kernel>
3. memtype_reserve (<kernel>) <kernel>
4. _set_pages_array (<kernel>) <kernel>
5. nv_alloc_system_pages (<kernel>) nvidia
6. nv_alloc_pages (<kernel>) nvidia
7. osAllocPagesInternal (<kernel>) nvidia
8. memdescAlloc (<kernel>) nvidia
9. sysmemConstruct_IMPL (<kernel>) nvidia
10. __nvoc_objCreate_SystemMemory (<kernel>) nvidia
11. __nvoc_objCreateDynamic (<kernel>) nvidia
12. resservResourceFactory (<kernel>) nvidia
13. _clientAllocResourceHelper (<kernel>) nvidia
14. serverAllocResourceUnderLock (<kernel>) nvidia
15. serverAllocResource (<kernel>) nvidia
16. rmapiAllocWithSecInfo (<kernel>) nvidia
17. rmapiAllocWithSecInfoTls (<kernel>) nvidia
18. _rmAllocForDeprecatedApi (<kernel>) nvidia
19. _rmVidHeapControlAllocCommon (<kernel>) nvidia
20. _nvos32FunctionAllocSize (<kernel>) nvidia
21. RmDeprecatedVidHeapControl (<kernel>) nvidia
22. Nv04VidHeapControlWithSecInfo (<kernel>) nvidia
23. RmIoctl (<kernel>) nvidia
24. rm_ioctl (<kernel>) nvidia
25. nvidia_unlocked_ioctl (<kernel>) nvidia
26. __x64_sys_ioctl (<kernel>) <kernel>
27. do_syscall_64 (<kernel>) <kernel>
28. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
29. ioctl + 61 ([unknown]) /usr/lib/libc.so.6
30. [unknown] + 129151784076160 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
Stack of render thread in frame 202:
```
0. __schedule (<kernel>) <kernel>
1. __schedule (<kernel>) <kernel>
2. schedule (<kernel>) <kernel>
3. schedule_hrtimeout_range (<kernel>) <kernel>
4. do_sys_poll (<kernel>) <kernel>
5. __x64_sys_poll (<kernel>) <kernel>
6. do_syscall_64 (<kernel>) <kernel>
7. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
8. [unknown] + 129152391494626 ([unknown]) /usr/lib/libc.so.6
9. [unknown] + 129152391446132 ([unknown]) /usr/lib/libc.so.6
10. poll + 30 ([unknown]) /usr/lib/libc.so.6
11. wl_display_dispatch_queue + 262 ([unknown]) /usr/lib/libwayland-client.so.0
...
```
Example stacks for buffer creation in frame 203:
```
0. nvidia_mmap_helper (<kernel>) nvidia
1. __mmap_region (<kernel>) <kernel>
2. mmap_region (<kernel>) <kernel>
3. do_mmap (<kernel>) <kernel>
4. vm_mmap_pgoff (<kernel>) <kernel>
5. ksys_mmap_pgoff (<kernel>) <kernel>
6. do_syscall_64 (<kernel>) <kernel>
7. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
8. __mmap + 44 ([unknown]) /usr/lib/libc.so.6
9. [unknown] + 129151785104939 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
```
0. kbusUseDirectSysmemMap_GA100 (<kernel>) nvidia
1. memdescGetMapInternalType (<kernel>) nvidia
2. memdescMapInternal (<kernel>) nvidia
3. memmgrMemBeginTransfer_IMPL (<kernel>) nvidia
4. memmgrMemWriteWithTransferType (<kernel>) nvidia
5. memmgrMemWrite_IMPL (<kernel>) nvidia
6. _gmmuWalkCBUpdatePde (<kernel>) nvidia
7. _mmuWalkPdeAcquire (<kernel>) nvidia
8. mmuWalkProcessPdes (<kernel>) nvidia
9. mmuWalkReserveEntries (<kernel>) nvidia
10. gvaspaceAlloc_IMPL (<kernel>) nvidia
11. virtmemAllocResources (<kernel>) nvidia
12. virtmemConstruct_IMPL (<kernel>) nvidia
13. __nvoc_objCreate_VirtualMemory (<kernel>) nvidia
14. __nvoc_objCreateDynamic (<kernel>) nvidia
15. resservResourceFactory (<kernel>) nvidia
16. _clientAllocResourceHelper (<kernel>) nvidia
17. serverAllocResourceUnderLock (<kernel>) nvidia
18. serverAllocResource (<kernel>) nvidia
19. rmapiAllocWithSecInfo (<kernel>) nvidia
20. rmapiAllocWithSecInfoTls (<kernel>) nvidia
21. _rmAllocForDeprecatedApi (<kernel>) nvidia
22. _rmVidHeapControlAllocCommon (<kernel>) nvidia
23. _nvos32FunctionAllocSize (<kernel>) nvidia
24. RmDeprecatedVidHeapControl (<kernel>) nvidia
25. Nv04VidHeapControlWithSecInfo (<kernel>) nvidia
26. RmIoctl (<kernel>) nvidia
27. rm_ioctl (<kernel>) nvidia
28. nvidia_unlocked_ioctl (<kernel>) nvidia
29. __x64_sys_ioctl (<kernel>) <kernel>
30. do_syscall_64 (<kernel>) <kernel>
31. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
32. ioctl + 61 ([unknown]) /usr/lib/libc.so.6
33. [unknown] + 129151784076160 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
```
0. objDynamicCastById_IMPL (<kernel>) nvidia
1. objFindAncestor_IMPL (<kernel>) nvidia
2. kgmmuEncodeSysmemAddrs_GM107 (<kernel>) nvidia
3. kgmmuEncodePhysAddr_IMPL (<kernel>) nvidia
4. _gmmuWalkCBMapNextEntries_Direct.isra.0 (<kernel>) nvidia
5. _gmmuWalkCBMapNextEntries_RmAperture (<kernel>) nvidia
6. _mmuWalkMap (<kernel>) nvidia
7. mmuWalkProcessPdes (<kernel>) nvidia
8. mmuWalkMap (<kernel>) nvidia
9. gvaspaceMap_IMPL (<kernel>) nvidia
10. dmaUpdateVASpace_GF100 (<kernel>) nvidia
11. dmaAllocMapping_GM107 (<kernel>) nvidia
12. dmaAllocMap_IMPL (<kernel>) nvidia
13. virtmemMapTo_IMPL (<kernel>) nvidia
14. rmclientInterMap_IMPL (<kernel>) nvidia
15. serverInterMap (<kernel>) nvidia
16. rmapiMapWithSecInfo (<kernel>) nvidia
17. rmapiMapWithSecInfoTls (<kernel>) nvidia
18. Nv04MapMemoryDmaWithSecInfo (<kernel>) nvidia
19. RmIoctl (<kernel>) nvidia
20. rm_ioctl (<kernel>) nvidia
21. nvidia_unlocked_ioctl (<kernel>) nvidia
22. __x64_sys_ioctl (<kernel>) <kernel>
23. do_syscall_64 (<kernel>) <kernel>
24. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
25. ioctl + 61 ([unknown]) /usr/lib/libc.so.6
26. [unknown] + 129151784076160 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
Stack of render thread in frame 203:
```
0. __schedule (<kernel>) <kernel>
1. __schedule (<kernel>) <kernel>
2. schedule (<kernel>) <kernel>
3. futex_wait_queue (<kernel>) <kernel>
4. __futex_wait (<kernel>) <kernel>
5. futex_wait (<kernel>) <kernel>
6. do_futex (<kernel>) <kernel>
7. __x64_sys_futex (<kernel>) <kernel>
8. do_syscall_64 (<kernel>) <kernel>
9. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
10. [unknown] + 129152391448304 ([unknown]) /usr/lib/libc.so.6
11. [unknown] + 129151778516016 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
Example stacks for buffer destruction in frames 225 / 226:
```
0. kbusFlushSingle_GV100 (<kernel>) nvidia
1. kbusUpdateRmAperture_GM107 (<kernel>) nvidia
2. kbusMapBar2Aperture_VBAR2 (<kernel>) nvidia
3. memdescMapInternal (<kernel>) nvidia
4. memmgrMemBeginTransfer_IMPL (<kernel>) nvidia
5. _gmmuWalkCBFillEntries (<kernel>) nvidia
6. mmuWalkFill (<kernel>) nvidia
7. mmuWalkProcessPdes (<kernel>) nvidia
8. mmuWalkUnmap (<kernel>) nvidia
9. gvaspaceUnmap_IMPL (<kernel>) nvidia
10. dmaUpdateVASpace_GF100 (<kernel>) nvidia
11. dmaFreeMapping_GM107 (<kernel>) nvidia
12. dmaFreeMap_IMPL (<kernel>) nvidia
13. virtmemUnmapFrom_IMPL (<kernel>) nvidia
14. rmclientInterUnmap_IMPL (<kernel>) nvidia
15. serverInterUnmap (<kernel>) nvidia
16. rmapiUnmapWithSecInfo (<kernel>) nvidia
17. rmapiUnmapWithSecInfoTls (<kernel>) nvidia
18. Nv04UnmapMemoryDmaWithSecInfo (<kernel>) nvidia
19. RmIoctl (<kernel>) nvidia
20. rm_ioctl (<kernel>) nvidia
21. nvidia_unlocked_ioctl (<kernel>) nvidia
22. __x64_sys_ioctl (<kernel>) <kernel>
23. do_syscall_64 (<kernel>) <kernel>
24. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
25. ioctl + 61 ([unknown]) /usr/lib/libc.so.6
26. [unknown] + 129151784076160 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
```
0. osDevReadReg032 (<kernel>) nvidia
1. _regRead (<kernel>) nvidia
2. kgmmuCheckPendingInvalidates_TU102 (<kernel>) nvidia
3. kgmmuInvalidateTlb_GM107 (<kernel>) nvidia
4. kbusUpdateRmAperture_GM107 (<kernel>) nvidia
5. kbusMapBar2Aperture_VBAR2 (<kernel>) nvidia
6. memdescMapInternal (<kernel>) nvidia
7. memmgrMemBeginTransfer_IMPL (<kernel>) nvidia
8. _gmmuWalkCBFillEntries (<kernel>) nvidia
9. mmuWalkFill (<kernel>) nvidia
10. mmuWalkProcessPdes (<kernel>) nvidia
11. mmuWalkUnmap (<kernel>) nvidia
12. _gvaspaceReleaseUnreservedPTEs (<kernel>) nvidia
13. _gvaspaceInternalFree.isra.0 (<kernel>) nvidia
14. memmgrFree_IMPL (<kernel>) nvidia
15. virtmemDestruct_IMPL (<kernel>) nvidia
16. __nvoc_dtor_VirtualMemory (<kernel>) nvidia
17. __nvoc_objDelete (<kernel>) nvidia
18. clientFreeResource_IMPL (<kernel>) nvidia
19. rmclientFreeResource_IMPL (<kernel>) nvidia
20. serverFreeResourceTreeUnderLock (<kernel>) nvidia
21. serverFreeResourceTree (<kernel>) nvidia
22. rmapiFreeWithSecInfo (<kernel>) nvidia
23. rmapiFreeWithSecInfoTls (<kernel>) nvidia
24. Nv01FreeWithSecInfo (<kernel>) nvidia
25. RmIoctl (<kernel>) nvidia
26. rm_ioctl (<kernel>) nvidia
27. nvidia_unlocked_ioctl (<kernel>) nvidia
28. __x64_sys_ioctl (<kernel>) <kernel>
29. do_syscall_64 (<kernel>) <kernel>
30. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
31. ioctl + 61 ([unknown]) /usr/lib/libc.so.6
32. [unknown] + 129151784076160 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
```
0. folios_put_refs (<kernel>) <kernel>
1. free_pages_and_swap_cache (<kernel>) <kernel>
2. __tlb_batch_free_encoded_pages (<kernel>) <kernel>
3. tlb_flush_mmu (<kernel>) <kernel>
4. unmap_page_range (<kernel>) <kernel>
5. unmap_vmas (<kernel>) <kernel>
6. vms_clear_ptes (<kernel>) <kernel>
7. vms_complete_munmap_vmas (<kernel>) <kernel>
8. do_vmi_align_munmap (<kernel>) <kernel>
9. do_vmi_munmap (<kernel>) <kernel>
10. __vm_munmap (<kernel>) <kernel>
11. __x64_sys_munmap (<kernel>) <kernel>
12. do_syscall_64 (<kernel>) <kernel>
13. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
14. __munmap + 11 ([unknown]) /usr/lib/libc.so.6
15. [unknown] ([unknown]) [unknown]
```
```
0. find_next_iomem_res (<kernel>) <kernel>
1. walk_system_ram_range (<kernel>) <kernel>
2. pat_pagerange_is_ram (<kernel>) <kernel>
3. memtype_free (<kernel>) <kernel>
4. set_pages_array_wb (<kernel>) <kernel>
5. nv_free_system_pages (<kernel>) nvidia
6. nv_free_pages (<kernel>) nvidia
7. osFreePagesInternal (<kernel>) nvidia
8. memdescFree (<kernel>) nvidia
9. memDestruct_IMPL (<kernel>) nvidia
10. __nvoc_dtor_Memory (<kernel>) nvidia
11. __nvoc_objDelete (<kernel>) nvidia
12. clientFreeResource_IMPL (<kernel>) nvidia
13. rmclientFreeResource_IMPL (<kernel>) nvidia
14. serverFreeResourceTreeUnderLock (<kernel>) nvidia
15. serverFreeResourceTree (<kernel>) nvidia
16. rmapiFreeWithSecInfo (<kernel>) nvidia
17. rmapiFreeWithSecInfoTls (<kernel>) nvidia
18. Nv01FreeWithSecInfo (<kernel>) nvidia
19. RmIoctl (<kernel>) nvidia
20. rm_ioctl (<kernel>) nvidia
21. nvidia_unlocked_ioctl (<kernel>) nvidia
22. __x64_sys_ioctl (<kernel>) <kernel>
23. do_syscall_64 (<kernel>) <kernel>
24. entry_SYSCALL_64_after_hwframe (<kernel>) <kernel>
25. ioctl + 61 ([unknown]) /usr/lib/libc.so.6
26. [unknown] + 129151784076160 ([unknown]) /usr/lib/libnvidia-glcore.so.570.86.16
```
Here's a similar capture for the llvmpipe software renderer:
<img alt="Image" src="https://github.com/user-attachments/assets/3a83308e-b067-4d53-92ea-bcc83a23334a" />
For the Intel driver:
<img alt="Image" src="https://github.com/user-attachments/assets/067e6776-322e-429d-9595-c4ba9bda9a92" />
And for the AMD driver, which never drops the pace of frame updates:
<img alt="Image" src="https://github.com/user-attachments/assets/d245fd9b-4f8c-4263-bd0a-c100895243af" />
With the Nvidia driver, the memory allocation seems to block all other rendering operations running on the system, not just those of the program that is allocating memory. This is visible in the movement of the mouse cursor, as the following video shows:
https://github.com/user-attachments/assets/4fb12c97-87e1-495e-ae62-8e4bc5493510
In the background there's some video playing in mpv. The first run of my application uses the Nvidia driver, and you can see that it stutters twice, with timing generally matching what the profile captures above showed. The second run is rendered with llvmpipe, and you can see the hourglass animation freeze for a moment, before the image is shown, as there is no separate transfer queue with this driver. The llvmpipe scenario never freezes the video running in the background.
Hello,
Lately there seems to be a very serious VRAM allocation issue with DXVK/vkd3d and even native Vulkan applications.
It seems to push VRAM usage towards its absolute limit (much, much higher than anything on Windows), and even enabling DLSS does NOT free up VRAM the way it does on Windows.
To provide as much detail as possible: to start with, the following games seem to be highly affected:
Star Wars Jedi Fallen Order, Ready Or Not, Doom Eternal, Gears 5, COD WWII.
…
These have already been mentioned here before, but I'm linking them again for reference (and hopefully for Nvidia to take another look at them, since they've been a lot more active lately).
I'm still not sure, though, whether these are two different issues or the same/related issue(s)… 🤷🏻