Driver hang during shutdown of Unreal Engine 4.21 with Vulkan RHI

Recently I’ve been experiencing hangs while closing Unreal Editor, or a game built with the engine, when using the Vulkan RHI. The hangs happen roughly a third of the time the engine shuts down. I haven’t noticed any difference in activity before a hang compared to a clean shutdown, but it’s been quite easy to trigger.

When one of these hangs occurs, it becomes impossible to start or kill any graphics-related process or to shut down the system through software, so a hard reset is required.

GTX 1080
Driver 415.22
Linux 4.19.4-arch1-1-ARCH
nvidia-bug-report.log.gz (74 KB)
unreal-engine-4.21-backtrace.log (9.26 KB)

Hi,

Thanks for writing to us about this issue.
We have filed case 2461384 with NVIDIA for tracking purposes and will attempt to reproduce the issue internally.

We would like to know whether you started encountering this issue with 415.22, or whether it is something new with the UE4 Vulkan RHI (Rendering Hardware Interface).

Hi, thanks for the reply. I don’t recall whether this started with a driver update or an engine update; I think I did both around the same time. Tomorrow I can try different combinations of older versions to see if that helps narrow things down a bit.

I tested several driver versions going back as far as early July and saw hangs with all of them. I also tried to reproduce the hang on another system but was unable to. That system has roughly the same software configuration but older hardware (GTX 660M).

I’m also no longer sure how relevant the backtrace I attached is, as during testing I saw the same error in the logs even when a hang didn’t occur.

Please let me know if there’s any other information that would be helpful.

Hi,

Thanks for the information. The OS installation is complete and configuration of UnrealEngine-release is in progress. Once that is done, I will try to reproduce the issue internally and will update accordingly.

I downloaded the Unreal Engine source code from git and followed the guide “Building Unreal Engine from Source” (Unreal Engine 5.0 Documentation) to install it, but compilation failed with the error below. Please let me know whether you used the same source code; if you used a different one, please share where I can download it.
I would also appreciate detailed repro steps.

[root@archlinux UnrealEngine-release]# make
bash "/root/UnrealEngine-release/Engine/Build/BatchFiles/Linux/Build.sh" CrashReportClient Linux Shipping
Fixing inconsistent case in filenames.
which: no mono in (/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl)
Setting up Mono
Building CrashReportClient…
Using 'git status' to determine working set for adaptive non-unity build (/root/UnrealEngine-release).
Creating makefile for CrashReportClient (no existing makefile)
Performing full C++ include scan (no include cache file)
Building UnrealHeaderTool…
Using 'git status' to determine working set for adaptive non-unity build (/root/UnrealEngine-release).
Target is up to date
Total build time: 0.16 seconds (NoActionsToExecute executor: 0.00 seconds)
Parsing headers for CrashReportClient
Running UnrealHeaderTool CrashReportClient "/root/UnrealEngine-release/Engine/Intermediate/Build/Linux/B4D820EA/CrashReportClient/Shipping/CrashReportClient.uhtmanifest" -LogCmds="loginit warning, logexit warning, logdatabase error" -Unattended -WarningsAsErrors -NoEnginePlugins
Refusing to run with the root privileges.
Error: UnrealHeaderTool failed for target 'CrashReportClient' (platform: Linux, module info: /root/UnrealEngine-release/Engine/Intermediate/Build/Linux/B4D820EA/CrashReportClient/Shipping/CrashReportClient.uhtmanifest, exit code: Canceled (1)).
make: *** [Makefile:197: CrashReportClient-Linux-Shipping] Error 1
[root@archlinux UnrealEngine-release]#

We’ll need more details to reproduce the problem.
Did you build the engine using the unreal-engine package from the AUR?
Once it’s built, what do you do exactly to reproduce the problem?
And does it happen with older versions of UE4?
Thanks

I hadn’t thought about this until just now, but I’ve been building the engine in an Ubuntu-based chroot to provide a stable build environment. Building directly on Arch would be easiest via the AUR package, as there are patches that would need to be applied.

Simply starting the editor, waiting until it fully loads (which may take a few minutes the first time), and then closing it is enough to trigger the issue for me. This may need repeating a few times, as the hang doesn’t always occur. Based on what I’ve been experiencing, I’d give it at least 10 tries before declaring it unreproducible on a given system.

I don’t recall experiencing this with older versions of the engine, but I had only briefly used the Vulkan RHI with them. If it would be helpful, I can build an older version and try to reproduce with it.

I’m including a link to a minimal project I’ve built with the engine. For me, running the included “Minimal.sh” triggers the hangs in the same manner as the editor does.
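
In case it helps with the repeated attempts, the start, wait, close loop described above could be sketched roughly like this in Python. This is only a sketch under assumptions: the function name, the wait times, and the use of SIGTERM as a stand-in for closing the window are all hypothetical and are not part of the project or the engine.

```python
import signal
import subprocess
import time


def first_hanging_attempt(cmd, attempts=10, load_wait_s=120.0, exit_timeout_s=60.0):
    """Run cmd repeatedly; return the number of the first attempt that fails
    to exit after being asked to quit, or None if every attempt exits."""
    for attempt in range(1, attempts + 1):
        proc = subprocess.Popen(cmd)
        time.sleep(load_wait_s)  # give the program time to finish loading
        # Assumption: SIGTERM is treated like a normal close request.
        proc.send_signal(signal.SIGTERM)
        try:
            proc.wait(timeout=exit_timeout_s)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: no exit after {exit_timeout_s}s, possible hang")
            proc.kill()  # best effort only; may have no effect during a real hang
            return attempt
        print(f"attempt {attempt}: exited with code {proc.returncode}")
    return None
```

It could be called as `first_hanging_attempt(["./Minimal.sh"])` from the project directory. The loop stops at the first suspected hang, since in my experience the system needs a hard reset at that point anyway.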

Thanks

We’re unable to reproduce the problem with the minimal project, even on different systems.
Can you please install dmidecode, reproduce the problem again, and run nvidia-bug-report.sh again?
Can you please check if you can reproduce the problem on older/newer NVIDIA driver versions?
Thanks

Here’s the new nvidia-bug-report.log.gz after installing dmidecode and updating the kernel and drivers. I was able to reproduce the hang on driver versions going back to a July release (396.45, I think). I wasn’t able to test further back than that because a version mismatch caused the system to fall back to the integrated Intel graphics.
nvidia-bug-report.log.gz (76 KB)

Thanks for sharing the logs. I still have a few questions about the setup; please clarify the following.

According to the logs, it looks like you are using two displays, one driven by the Intel GPU and the other by the NVIDIA GPU. Please confirm.

Secondly, I can see from the configuration below that you are using both the Intel and NVIDIA drivers.
*** /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf
*** ls: -rw-r--r-- 1 root root 362 2018-12-18 16:35:58.000000000 -0500 /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf
Section "OutputClass"
Identifier "intel"
MatchDriver "i915"
Driver "modesetting"
EndSection

Section "OutputClass"
Identifier "nvidia"
MatchDriver "nvidia-drm"
Driver "nvidia"
Option "AllowEmptyInitialConfiguration"
Option "PrimaryGPU" "yes"
ModulePath "/usr/lib/nvidia/xorg"
ModulePath "/usr/lib/xorg/modules"
EndSection

Could you please regenerate the X config file using the nvidia-xconfig command and use it to run the X server or GUI?
Please share the results after trying this change.

You are correct, there are two displays connected to the system: one through the Nvidia GPU, the other through the motherboard and the Intel chip. After running nvidia-xconfig I can no longer reproduce the issue; however, I can also no longer use the second display.
nvidia-bug-report.log.gz (1.03 MB)

@maiself
Please confirm whether you are still seeing this issue with the latest UE and NVIDIA driver combination.

This was a long time ago now. I believe it was fixed a few months after the original issue; I remember seeing a brief mention of it in the driver’s release logs. I’m currently not able to test whether things are still working correctly.

Thanks for the update. I will close the bug internally. If you come across the issue again, please feel free to report it here.